Dashboard

System overview and MLX inference engine status

Backend

MLX

Idle

Recommended model

Llama 3 120B (MLX)

120B parameters

Server status

Idle

Configure in Settings

API endpoint

Offline

Start engine in Settings

System resources

Memory (model + system)

0 GB / 128 GB total

Inference speed

Waiting...

Send a chat message

Weight load time

Waiting...

Measured on first inference

Model size

Unknown

No model loaded

Metal acceleration

Active (MLX)

Apple GPU via unified memory

Start the MLX server to enable model loading, chat, and the API endpoint. Head to Settings for step-by-step instructions.

7B-8B

6 GB

Runs easily on any Apple Silicon

13B-14B

10 GB

Comfortable on 16GB+ Macs

30B-34B

22 GB

Great fit for 32GB-64GB Macs

70B

42 GB

Ideal for 128GB workstations

120B

70 GB

Fits 128GB with room for context
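
The sizing guide above can be read as a simple lookup. The sketch below is illustrative only and not part of the dashboard: the function name `largest_tier_that_fits` and the 6 GB `headroom_gb` default (memory left for the system and context) are assumptions, while the per-tier sizes are the ones listed above.

```python
# Approximate weight footprints in GB, copied from the sizing guide above.
MODEL_TIERS = [
    ("7B-8B", 6),
    ("13B-14B", 10),
    ("30B-34B", 22),
    ("70B", 42),
    ("120B", 70),
]


def largest_tier_that_fits(total_memory_gb, headroom_gb=6.0):
    """Return the largest tier whose weights fit in memory, or None.

    headroom_gb is an assumed reserve for the OS and context,
    not a value reported by the dashboard.
    """
    budget = total_memory_gb - headroom_gb
    best = None
    # Tiers are ordered smallest to largest, so the last match wins.
    for label, size_gb in MODEL_TIERS:
        if size_gb <= budget:
            best = label
    return best


if __name__ == "__main__":
    for memory_gb in (8, 16, 32, 128):
        print(memory_gb, "GB ->", largest_tier_that_fits(memory_gb))
```

This is only a fit check. The notes in the guide are more conservative about which tier is comfortable on a given Mac, so treat the result as an upper bound.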