Models

Download and manage GGUF models for local inference

Models are stored in ~/.cache/llama-runner/models/. Use the Hugging Face CLI (huggingface-cli) to download GGUF files into that directory; this page detects them automatically.
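The page does not describe how detection works. As a rough sketch, assuming it simply scans the models directory (including subdirectories) for files ending in `.gguf`, a shell equivalent could look like this. `list_gguf_models` is a hypothetical helper, not part of llama-runner.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists *.gguf files under the models directory.
# Assumes "detection" means "any file whose name ends in .gguf".
list_gguf_models() {
  local dir="${1:-$HOME/.cache/llama-runner/models}"
  [ -d "$dir" ] || return 0   # nothing downloaded yet: print nothing
  # Match regular files and symlinks (older huggingface-cli versions could
  # symlink into the Hugging Face cache instead of copying the file).
  find "$dir" \( -type f -o -type l \) -name '*.gguf' |
    while IFS= read -r path; do
      printf '%s\n' "${path##*/}"   # keep only the file name
    done | sort                     # stable, alphabetical output
}
```

Call it with no argument to scan the default directory, or pass another directory to scan that instead.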

Recommended · 8-bit · 67.1 GB

The largest MLX model available here. Uses about 70 GB of memory at 8-bit precision, so it fits on a 128 GB Mac with headroom for long context windows.
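How much context fits in that headroom depends mostly on the KV cache, which grows linearly with context length. A common estimate is 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens. The sketch below is illustrative only: it assumes an architecture shaped like Llama 3 70B's (80 layers, 8 KV heads, head dimension 128) and a 16-bit cache, which are not the published specs of the model listed above. `kv_cache_mib` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# kv_cache_mib LAYERS KV_HEADS HEAD_DIM BYTES_PER_ELEM CONTEXT_TOKENS
# Prints the approximate KV-cache size in MiB (integer, rounded down).
# Ignores activations, runtime buffers and any cache quantization.
kv_cache_mib() {
  local layers=$1 kv_heads=$2 head_dim=$3 bytes=$4 ctx=$5
  # Factor 2: one key and one value vector per KV head, per layer, per token.
  local per_token=$(( 2 * layers * kv_heads * head_dim * bytes ))
  echo $(( per_token * ctx / 1024 / 1024 ))
}

# Assumed Llama-3-70B-like shape with an fp16 cache (2 bytes per element):
#   kv_cache_mib 80 8 128 2 8192     # 8K-token context
#   kv_cache_mib 80 8 128 2 131072   # 128K-token context
```

Under these assumptions each token costs 320 KiB, so an 8K context needs about 2.5 GiB (2560 MiB) and a 128K context about 40 GiB (40960 MiB).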


Recommended · 8-bit · 39.1 GB

MLX-optimized 70B model at 8-bit precision. Its ~42 GB footprint leaves about 86 GB for context and the system on a 128 GB Mac.
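The headroom figure is total unified memory minus the model's footprint: 128 − 42 = 86 GB. A minimal sketch of that check follows; both helpers are hypothetical, and `total_ram_gb` assumes macOS, where `sysctl -n hw.memsize` reports installed memory in bytes.

```shell
#!/usr/bin/env bash
# headroom_gb TOTAL_GB FOOTPRINT_GB: memory left for context and the OS.
headroom_gb() {
  awk -v total="$1" -v used="$2" 'BEGIN { printf "%.1f\n", total - used }'
}

# total_ram_gb: installed memory (macOS only). bytes / 1024^3, so a
# 128 GB Mac reports 128.
total_ram_gb() {
  awk -v b="$(sysctl -n hw.memsize)" 'BEGIN { printf "%.0f\n", b / 1024 / 1024 / 1024 }'
}

# Examples:
#   headroom_gb 128 42                 # the arithmetic from this card
#   headroom_gb "$(total_ram_gb)" 42   # the same check against this Mac
```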


8-bit · 4.7 GB

Fast 8B MLX model that runs well on any Apple Silicon Mac.


4-bit · 4.1 GB

Lightweight 7B MLX model at 4-bit precision, with strong reasoning.


4-bit · 61.5 GB

Cohere's 120B model in MLX format, well suited to RAG and tool use. Uses about 66 GB at 4-bit.
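As a rule of thumb, quantized weights take about parameters × bits ÷ 8 bytes. For a 120B model at 4-bit that is 60 GB, close to the listed 61.5 GB. Quantization scales and metadata add a little on top, and the in-use figure (~66 GB) also includes the KV cache and runtime buffers. A sketch of the estimate, using a hypothetical helper:

```shell
#!/usr/bin/env bash
# estimate_weights_gb PARAMS_BILLIONS BITS_PER_WEIGHT
# Rough size of the weights alone: params * bits / 8 bytes, in GB (1e9 bytes).
# Ignores quantization scales, metadata, KV cache and runtime overhead.
estimate_weights_gb() {
  awk -v p="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", p * bits / 8 }'
}

#   estimate_weights_gb 120 4   # 120B at 4-bit
#   estimate_weights_gb 7 4     # 7B at 4-bit
```

Fractional bit widths are accepted, which helps with mixed-precision formats whose average bits per weight is not a whole number.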


Q4_K_M · 38.4 GB

Meta's flagship 70B model, run via llama.cpp. Requires about 42 GB of RAM.
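This page does not say how llama-runner invokes llama.cpp. As a sketch, assuming the stock `llama-cli` binary is on your PATH, a GGUF file from the models directory can be loaded with its `-m` (model path), `-c` (context size in tokens) and `-ngl` (number of layers to offload to the GPU) flags. The `run_gguf` wrapper and its `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# run_gguf MODEL_FILE [extra llama-cli args...]
# Hypothetical wrapper: MODEL_FILE is a file name inside the models directory.
# Set DRY_RUN=1 to print the command instead of running it.
run_gguf() {
  local model="$HOME/.cache/llama-runner/models/$1"
  shift
  # -c 8192: context window; -ngl 99: offload up to 99 layers (all of them here)
  set -- llama-cli -m "$model" -c 8192 -ngl 99 "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Example: DRY_RUN=1 run_gguf llama-3-70b-instruct.Q4_K_M.gguf -p "Hello" -n 64
```

Extra arguments such as `-p` (prompt) and `-n` (tokens to generate) are passed straight through to `llama-cli`.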


Q4_K_M · 4.6 GB

Fast, capable 8B model, run via llama.cpp. Runs on any Apple Silicon Mac.


Download GGUF models and place them in the models directory, ~/.cache/llama-runner/models/.

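# Example: download Llama 3 70B with huggingface-cli.
# The second argument names one file in the repository; it must exist there
# as a GGUF build, otherwise the download fails.
huggingface-cli download casperhansen/llama-3-70b-instruct-awq \
  llama-3-70b-instruct.Q4_K_M.gguf \
  --local-dir ~/.cache/llama-runner/models/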
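After a download finishes, a quick sanity check is to confirm that the file really is GGUF: the format begins with the four ASCII bytes `GGUF`. This catches a repository that ships another format (for example safetensors) or an error page saved under a `.gguf` name, but it does not validate the rest of the file. `is_gguf` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# is_gguf FILE: succeeds only if FILE exists and starts with the magic "GGUF".
is_gguf() {
  [ -f "$1" ] || return 1
  [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Example: check every .gguf file in the models directory.
#   for f in ~/.cache/llama-runner/models/*.gguf; do
#     if is_gguf "$f"; then echo "ok   $f"; else echo "BAD  $f"; fi
#   done
```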