Dashboard
System overview and MLX inference engine status
Backend
MLX
Idle
Recommended model
Llama 3 120B (MLX)
120B parameters
Server status
Idle
Configure in Settings
API endpoint
Offline
Start engine in Settings
System resources
Memory (model + system)
0 GB / 128 GB total
Inference speed
Waiting...
Send a chat message
Weight load time
Waiting...
Measured on first inference
Model size
Unknown
No model loaded
Metal acceleration
Active (MLX)
Apple GPU via unified memory
MLX server not detected
Start the MLX server to enable model loading, chat, and the API endpoint. Head to Settings for step-by-step instructions.
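As a rough illustration of what "not detected" could mean in practice, the sketch below checks whether anything is accepting TCP connections on a local port. The function name, host, and port 8080 are assumptions made for this example; the dashboard does not state which address it probes, so adjust them to whatever Settings shows.

```python
import socket


def server_reachable(host: str = "127.0.0.1", port: int = 8080,
                     timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    host and port are assumed defaults, not values taken from the dashboard.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is listening at the address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "detected" if server_reachable() else "not detected"
    print(f"MLX server {state}")
```

A successful connection only shows that something is listening on that port, not that it is the MLX server or that a model is loaded.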
Memory footprint by model scale (128 GB workstation)
7B-8B      6 GB    Runs easily on any Apple Silicon
13B-14B    10 GB   Comfortable on 16GB+ Macs
30B-34B    22 GB   Great fit for 32GB-64GB Macs
70B        42 GB   Ideal for 128GB workstations
120B       70 GB   Fits 128GB with room for context
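The table above can be turned into a small lookup to see how much memory remains for context at each scale. This is only a sketch: the footprint values are copied from the table, while the function names and the 4 GB `reserve_gb` default are hypothetical choices for illustration, not settings from the dashboard.

```python
# Footprints in GB, copied from the dashboard table.
FOOTPRINT_GB = {
    "7B-8B": 6,
    "13B-14B": 10,
    "30B-34B": 22,
    "70B": 42,
    "120B": 70,
}


def headroom_gb(scale: str, total_memory_gb: float) -> float:
    """Memory left after loading the model; negative means it does not fit."""
    return total_memory_gb - FOOTPRINT_GB[scale]


def scales_that_fit(total_memory_gb: float, reserve_gb: float = 4.0) -> list[str]:
    """Scales whose footprint leaves at least reserve_gb free.

    reserve_gb is a hypothetical margin for the system and context.
    """
    return [
        scale
        for scale, footprint in FOOTPRINT_GB.items()
        if total_memory_gb - footprint >= reserve_gb
    ]


if __name__ == "__main__":
    print(headroom_gb("120B", 128))
    print(scales_that_fit(32))
```

On the 128 GB workstation shown here, the 120B row leaves 58 GB of headroom, which is what "room for context" refers to.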