TL;DR:

  • Clone the MCP Memory Service repository
  • Install CPU-only Python dependencies
  • Configure SQLite-vec storage backend
  • Test compilation and basic functionality

Why Compile Without CUDA

MCP Memory Service supports GPU acceleration via CUDA for improved performance, but many systems lack CUDA-compatible GPUs (most laptops, ARM devices, and some servers). This guide focuses on CPU-only compilation and installation, providing full functionality while maintaining reasonable performance through optimized CPU libraries.

Enhanced Fork with CPU Support

Note: For optimized CPU-only compilation without CUDA dependencies, consider using my enhanced fork at jpmrblood/mcp-memory-service, which includes dedicated CPU build profiles and Dockerfile.cpu support, with critically important pyproject.toml optimizations detailed below.

Key improvements vs original doobidoo/mcp-memory-service:

  • Added Dockerfile.cpu: Docker container optimized for CPU-only deployments
  • Added .specify/ directory: Contains templates and scripts for project management:
    • Bash scripts for development automation (check-prerequisites.sh, setup-plan.sh, create-new-feature.sh, update-agent-context.sh)
    • Memory constitution file for project guidelines
    • Templates for plans, specifications, tasks, and agent files
  • Modified .gitignore: Added .specify directory to ignore development templates

pyproject.toml Changes (Most Important)

The most critical improvement in the fork is the enhanced pyproject.toml configuration that enables true CPU-only installations:

Added CPU Dependency Profile

[project.optional-dependencies]
cpu = [
    "sentence-transformers[onnx]>=2.2.2",
    "torch>=2.0.0"
]

This creates an --extra cpu installation option that:

  • Forces CPU-only PyTorch without CUDA dependencies
  • Uses ONNX-optimized sentence transformers for better performance
  • Eliminates GPU package bloat from CPU installations
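
After installation (step 3 below), a quick way to confirm these claims is to list the torch- and ONNX-related packages in the environment; on a correct CPU-only sync no nvidia-* or CUDA packages should appear. A rough check, assuming the environment was created with --extra cpu:

# No nvidia-* / CUDA packages should show up on a CPU-only install
uv pip list | grep -Ei 'torch|onnxruntime|nvidia'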

Astral UV Integration

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[tool.uv.sources]
torch = [
    { index = "pytorch-cpu", extra = "cpu" }
]

This sophisticated UV configuration:

  • Defines a custom PyTorch CPU index that never pulls CUDA packages
  • Automatically switches PyTorch to CPU-only wheels when using --extra cpu
  • Ensures deterministic CPU-only builds regardless of system configuration
  • Prevents accidental CUDA package installation
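
To see this in practice, you can inspect where UV pinned torch after resolving; the torch entry in uv.lock should reference the pytorch-cpu index (https://download.pytorch.org/whl/cpu) rather than PyPI. A rough check (the exact lockfile layout can differ between UV versions):

# Resolve dependencies, then check which index torch was pinned to
uv lock
grep -B 1 -A 4 'name = "torch"' uv.lock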

Usage with UV

# CPU-optimized installation (recommended)
uv sync --extra cpu

Or,

uv pip install ".[cpu]"

This configuration eliminates the common issue where pip install torch automatically pulls CUDA-enabled wheels even on CPU-only systems, saving disk space and preventing compatibility issues.
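
If you are not using UV at all, a roughly equivalent CPU-only install with plain pip points at the same wheel index explicitly. This is only a sketch of the idea, not part of the project's tooling:

# CPU-only PyTorch from the official CPU wheel index
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install "sentence-transformers[onnx]"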

Prerequisites

My system ships with Python 3.13, but I need Python 3.12 or lower for this project, so I install Python 3.11:

sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip git build-essential
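
The steps below also rely on Astral UV. If it is not installed yet, the official installer script (or a plain pip install) gets you there; adjust to your own preference:

# Install Astral UV, then confirm it is on PATH
curl -LsSf https://astral.sh/uv/install.sh | sh
# or: pip install uv
uv --version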

Installation Steps

1. Clone the Repository

Clone the latest MCP Memory Service code:

git clone https://github.com/jpmrblood/mcp-memory-service.git
cd mcp-memory-service

2. Create Virtual Environment

Set up an isolated Python environment:

uv venv
source .venv/bin/activate
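
Note that uv venv creates the environment in .venv by default. Because the system default here is Python 3.13, UV may also pick the wrong interpreter; a sketch that pins the environment to the Python 3.11 installed in the prerequisites:

# Create the virtual environment with a specific interpreter
uv venv --python 3.11
source .venv/bin/activate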

3. Install CPU-Only Dependencies

Install dependencies using Astral UV for optimized CPU-only compilation:

# Install with CPU profile using uv
uv sync --extra cpu

Alternatively, install the package itself with the CPU extra:

# Install the package with CPU-only dependencies
uv pip install ".[cpu]"

4. Verify CPU Installation

Ensure the installation uses CPU-only libraries:

python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'PyTorch version: {torch.__version__}')"

Expected output should show CUDA available: False.
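
A slightly stronger check is to confirm that no CUDA build metadata is present in the installed wheel; on a CPU-only build this prints None:

python3 -c "import torch; print(torch.version.cuda)"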

5. Download the Embedding Model

Let’s install the Hugging Face CLI:

uv pip install -U "huggingface_hub[cli]"

Download the embedding model:

uv run hf download sentence-transformers/all-MiniLM-L6-v2
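
With the model downloaded, a quick smoke test confirms that CPU embeddings work end to end. This is a minimal sketch using the model name downloaded above; it should print a (1, 384) array shape:

python3 -c "from sentence_transformers import SentenceTransformer; m = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2', device='cpu'); print(m.encode(['hello world']).shape)"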

6. Create a Run Script

I want to run this as a service because I have multiple free clients that burn through tokens quickly.

Create ~/.local/bin/run-mcp-memory.sh:

#!/bin/bash
# run-mcp-memory
export MCP_HTTP_ENABLED=true
export MCP_SERVER_HOST=0.0.0.0
export MCP_SERVER_PORT=8000
export MCP_MEMORY_STORAGE_BACKEND=sqlite_vec
export MCP_COMMAND="uv --directory=$HOME/mcp-memory-service run memory server"

npx -y supergateway --stdio "${MCP_COMMAND}" --outputTransport streamableHttp --port 8000

Yes, I run this through supergateway, a gateway that exposes stdio MCP servers over streamable HTTP and SSE.

Make the script executable:

chmod +x ~/.local/bin/run-mcp-memory.sh

The script is now ready to run.

7. Run as a Server

run-mcp-memory.sh &
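
Backgrounding with & ties the gateway to the current shell. If you want it to survive the terminal closing without setting up a full service manager, nohup is a simple option (the log path here is just an example):

# Keep the gateway running after logout, logging to a file
nohup ~/.local/bin/run-mcp-memory.sh > ~/mcp-memory.log 2>&1 &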

8. Add to the Editors

gemini mcp add -t sse -s user memory http://localhost:8000/sse
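
To confirm the registration took effect, the Gemini CLI's mcp subcommand should be able to list configured servers; the exact output format may vary between CLI versions:

# List the MCP servers registered for this user
gemini mcp list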

Conclusion

Compiling MCP Memory Service without CUDA provides a lightweight, accessible solution for CPU-only environments while maintaining full compatibility with the Model Context Protocol (MCP).