
Models

Download and manage GGUF and MLX models for local inference.

Models are stored in ~/.cache/llama-runner/models/. Use the Hugging Face CLI (huggingface-cli) to download GGUF files into that directory; this page detects them automatically.
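To see which GGUF files are currently in the models directory, a plain `find` is enough. This is a sketch, not the page's actual detection logic; `list_gguf_models` is a hypothetical helper, and it assumes files sit at most one subdirectory below the models directory.

```bash
#!/usr/bin/env bash
# List *.gguf files in the llama-runner models directory (or a directory
# passed as $1). Hypothetical helper; the page's own detection may differ.
list_gguf_models() {
  local dir="${1:-$HOME/.cache/llama-runner/models}"
  # -maxdepth 2: the directory itself plus one level of subfolders.
  find "$dir" -maxdepth 2 -type f -name '*.gguf' | sort
}

# Usage: list_gguf_models
```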

Recommended: Llama 3 120B (MLX)

4-bit, 67.1 GB

The largest MLX model listed here. Uses ~70 GB at 4-bit precision, so it fits on a 128 GB Mac with headroom for long context windows.
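Listed download sizes follow roughly from parameter count times bits per weight, divided by 8. The sketch below is a back-of-the-envelope estimate that assumes MLX's default quantization layout (group size 64, with a 16-bit scale and bias per group, adding about 0.5 bit per weight). It ignores the KV cache and runtime overhead, and `weights_gb` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Estimate weight storage in GB (1 GB = 1e9 bytes):
#   params_billion * 1e9 * bits_per_weight / 8 / 1e9 = params_billion * bits / 8
# Assumption: MLX 4-bit ~= 4.5 effective bits, 8-bit ~= 8.5 (group scales/biases).
weights_gb() {
  local params_billion="$1" bits_per_weight="$2"
  awk -v p="$params_billion" -v b="$bits_per_weight" \
    'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 120 4.5   # prints 67.5
weights_gb 120 8.5   # prints 127.5
```

At 8 bits per weight a 120B model would need well over 100 GB for weights alone, which is why the quantization level matters so much on a 128 GB machine.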

Recommended: Llama 3 70B (MLX)

4-bit, 39.1 GB

MLX-optimized 70B model at 4-bit precision. Its ~42 GB footprint leaves about 86 GB of a 128 GB Mac for context and the system.
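The 86 GB figure is 128 GB minus the ~42 GB footprint; how much context that buys depends on the KV cache, which grows linearly with context length. A minimal sketch, assuming Llama 3 70B's published shape (80 layers, 8 key-value heads via grouped-query attention, head dimension 128) and an fp16 cache; `kv_cache_gb` is a hypothetical helper, and runtimes that quantize the cache will use less.

```bash
#!/usr/bin/env bash
# KV cache size in GB (1e9 bytes):
#   2 (K and V) * layers * kv_heads * head_dim * bytes_per_element * tokens
# Defaults assume Llama 3 70B with an fp16 (2-byte) cache.
kv_cache_gb() {
  local tokens="$1" layers="${2:-80}" kv_heads="${3:-8}" head_dim="${4:-128}" bytes="${5:-2}"
  awk -v t="$tokens" -v l="$layers" -v h="$kv_heads" -v d="$head_dim" -v b="$bytes" \
    'BEGIN { printf "%.2f\n", 2 * l * h * d * b * t / 1e9 }'
}

kv_cache_gb 8192   # prints 2.68 (about 0.33 MB per token)
```

For an 8B model with 32 layers and the same head layout, pass the shape explicitly, e.g. `kv_cache_gb 8192 32 8 128 2`.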


Llama 3 8B (MLX)

4-bit, 4.7 GB

Fast 8B MLX model that runs well on any Apple Silicon Mac.
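Whether a model fits can be checked against the Mac's unified memory, which `sysctl -n hw.memsize` reports in bytes on macOS. A minimal sketch; the helper names `total_mem_gb` and `fits_in_memory` and the 25% reserve for macOS and other apps are illustrative assumptions, not llama-runner behavior.

```bash
#!/usr/bin/env bash
# Total physical memory in GB (1e9 bytes); macOS only.
total_mem_gb() {
  sysctl -n hw.memsize | awk '{ printf "%.1f\n", $1 / 1e9 }'
}

# Succeeds if model_gb fits in 75% of memory (assumed 25% reserve).
# Pass mem_gb explicitly to skip the sysctl query.
fits_in_memory() {
  local model_gb="$1" mem_gb="${2:-$(total_mem_gb)}"
  awk -v m="$model_gb" -v t="$mem_gb" \
    'BEGIN { if (m <= t * 0.75) exit 0; exit 1 }'
}

# Usage: fits_in_memory 4.7 && echo "Llama 3 8B (MLX) fits"
```

The smallest Apple Silicon configurations have 8 GB of unified memory; `fits_in_memory 4.7 8` succeeds because 4.7 GB is below the 6 GB left after the assumed reserve.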


Mistral 7B (MLX)

4-bit, 4.1 GB

Lightweight 7B MLX model at 4-bit precision with strong reasoning.


C4AI 120B (MLX)

4-bit, 61.5 GB

Cohere's 120B model in MLX format, well suited to RAG and tool use. Uses ~66 GB at 4-bit precision.


Llama 3 70B (GGUF)

Q4_K_M, 38.4 GB

Meta's flagship 70B model, run via llama.cpp. Requires ~42 GB of RAM.


Llama 3 8B (GGUF)

Q4_K_M, 4.6 GB

Fast, capable 8B model, run via llama.cpp. Runs on any Apple Silicon Mac.


Manual download

Download GGUF models and place them in the models directory.

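# Example: download a Llama 3 70B Q4_K_M GGUF file with huggingface-cli.
# <gguf-repo-id> is a placeholder: use a repository that publishes .gguf files
# (the AWQ repository casperhansen/llama-3-70b-instruct-awq does not), and
# match the file name to one listed in that repository.
huggingface-cli download <gguf-repo-id> \
  llama-3-70b-instruct.Q4_K_M.gguf \
  --local-dir ~/.cache/llama-runner/models/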
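After a download, a quick way to confirm the file really is GGUF (and not an AWQ or safetensors shard, or an HTML error page) is to check its first four bytes, which in the GGUF format are the ASCII magic `GGUF`. A minimal sketch; `is_gguf` is a hypothetical helper, and the check does not detect truncated or otherwise corrupt files.

```bash
#!/usr/bin/env bash
# Succeeds if $1 is a regular file whose first 4 bytes are "GGUF".
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Usage:
# is_gguf ~/.cache/llama-runner/models/llama-3-70b-instruct.Q4_K_M.gguf && echo ok
```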