Models
Download and manage GGUF and MLX models for local inference
Models are downloaded to ~/.cache/llama-runner/models/. Use the Hugging Face CLI (huggingface-cli) to download GGUF files; this page detects them automatically.
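To check which files a scan of the models directory would pick up, you can list every .gguf file under it. This is a minimal sketch, assuming detection is based on the .gguf extension. It is not necessarily how this page implements detection, and the MODELS_DIR override and list_gguf_models function are hypothetical names for illustration.

```shell
#!/bin/sh
# Sketch: enumerate GGUF files under the models directory.
# MODELS_DIR is a hypothetical override; the default matches the documented path.
MODELS_DIR="${MODELS_DIR:-$HOME/.cache/llama-runner/models}"

# Print every *.gguf file under the given directory (default: MODELS_DIR),
# one path per line, sorted. Prints nothing if the directory does not exist.
list_gguf_models() {
  dir="${1:-$MODELS_DIR}"
  [ -d "$dir" ] || return 0
  find "$dir" -type f -name '*.gguf' | sort
}

list_gguf_models
```

Subdirectories are searched too, so files that land in a nested folder of the models directory are still listed.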
Llama 3 120B (MLX)
Largest available MLX model. Uses ~70GB at 4-bit precision, which fits a 128GB Mac with headroom for long context windows.
Llama 3 70B (MLX)
MLX-optimized 70B model at 4-bit precision. Its ~42GB footprint leaves about 86GB of a 128GB Mac for context and the system.
Llama 3 8B (MLX)
Fast 8B MLX model. Runs on any Apple Silicon Mac with excellent performance.
Mistral 7B (MLX)
Lightweight 7B MLX model at 4-bit precision, with strong reasoning.
C4AI 120B (MLX)
Cohere's 120B model in MLX format. Excellent for RAG and tool use. ~66GB at 4-bit.
Llama 3 70B (GGUF)
Meta's flagship 70B model via llama.cpp. Requires ~42GB RAM.
Llama 3 8B (GGUF)
Fast and capable 8B model via llama.cpp. Runs on any Apple Silicon Mac.
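The memory figures above follow from simple arithmetic: parameters (in billions) times bits per weight, divided by 8, gives the raw weight size in GB. A 70B model is about 35GB at a flat 4 bits and about 70GB at 8 bits; llama.cpp's Q4_K_M averages roughly 4.8 bits per weight (an approximation), which is where ~42GB comes from. This is a back-of-the-envelope sketch only; weights_gb and headroom_gb are hypothetical helpers, and real memory use also includes the KV cache, which grows with context length.

```shell
#!/bin/sh
# Approximate weight size in GB: params (billions) * bits per weight / 8.
# Uses awk for fractional bit widths such as ~4.8 for Q4_K_M (approximate).
weights_gb() {
  # $1 = parameters in billions, $2 = average bits per weight
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Unified memory left for context, the OS, and other apps (whole GB).
headroom_gb() {
  # $1 = total memory in GB, $2 = model footprint in GB
  echo $(( $1 - $2 ))
}

echo "70B at 4-bit:  ~$(weights_gb 70 4) GB"
echo "70B at 8-bit:  ~$(weights_gb 70 8) GB"
echo "70B at ~4.8 bits (Q4_K_M): ~$(weights_gb 70 4.8) GB"
echo "120B at 4-bit: ~$(weights_gb 120 4) GB"
echo "Headroom for a 42 GB model on a 128 GB Mac: $(headroom_gb 128 42) GB"
```

The same calculation shows why a 120B model at 8-bit (~120GB of weights alone) would not leave useful headroom on a 128GB machine.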
Manual download
Download GGUF models and place them in the models directory (~/.cache/llama-runner/models/).
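# Example: download a 4-bit (Q4_K_M) GGUF build of Llama 3 70B Instruct with huggingface-cli.
# The repo must contain .gguf files; AWQ repos (e.g. casperhansen/llama-3-70b-instruct-awq)
# ship safetensors instead. Substitute any GGUF repo and filename you prefer.
huggingface-cli download bartowski/Meta-Llama-3-70B-Instruct-GGUF \
  Meta-Llama-3-70B-Instruct-Q4_K_M.gguf \
  --local-dir ~/.cache/llama-runner/models/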
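After a download finishes, a quick sanity check is to confirm each file really is GGUF: GGUF files begin with the 4-byte ASCII magic "GGUF", so a file from a non-GGUF repo or a truncated download will fail the check. This is a minimal sketch; is_gguf is a hypothetical helper, and it checks only the header, not the integrity of the whole file.

```shell
#!/bin/sh
# Return success if $1 is a regular file whose first 4 bytes are "GGUF".
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Check every .gguf file in the models directory.
for f in "$HOME"/.cache/llama-runner/models/*.gguf; do
  [ -e "$f" ] || continue   # glob matched nothing: no .gguf files yet
  if is_gguf "$f"; then
    echo "OK: $f"
  else
    echo "Not a GGUF file: $f" >&2
  fi
done
```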