Dashboard

System overview and MLX inference engine status

Backend

MLX

Idle

Recommended model

Llama 3 120B (MLX)

120B parameters

Server status

Idle

Configure in Settings

API endpoint

Offline

Start engine in Settings

System resources

Memory (model + system)
0 GB / 128 GB total

Inference speed

Waiting...

Send a chat message

Weight load time

Waiting...

Measured on first inference

Model size

Unknown

No model loaded

Metal acceleration

Active (MLX)

Apple GPU via unified memory

MLX server not detected

Start the MLX server to enable model loading, chat, and the API endpoint. Head to Settings for step-by-step instructions.
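As a rough illustration of what "server not detected" could mean in practice, the sketch below checks whether anything accepts TCP connections at a given address. The function name `server_reachable`, the host `127.0.0.1`, and the port `8080` are assumptions made for this example; the dashboard does not state which address it probes or how, so substitute the values shown in Settings.

```python
import socket


def server_reachable(host: str = "127.0.0.1", port: int = 8080,
                     timeout: float = 1.0) -> bool:
    """Return True if something accepts a TCP connection at host:port.

    The default host and port are placeholders (assumptions), not values
    taken from the dashboard.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Mirror the "API endpoint" card: Online if the port answers, Offline otherwise.
status = "Online" if server_reachable() else "Offline"
print(f"API endpoint: {status}")
```

A successful connection only shows that some process is listening on that port. It does not confirm that the listener is the MLX server or that a model is loaded.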

Memory footprint by model scale (128 GB workstation)

7B-8B: 6 GB
13B-14B: 10 GB
30B-34B: 22 GB
70B: 42 GB
120B: 70 GB
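
The footprint figures above can be turned into a quick fit check. The following is a minimal sketch, not the dashboard's own logic: the helper names `scales_that_fit` and `headroom_gb` and the `reserved_gb` parameter (memory set aside for the OS and other applications) are hypothetical. Only the per-scale figures and the 128 GB total come from the page.

```python
# Footprints in GB, copied from the dashboard's chart.
FOOTPRINT_GB = {
    "7B-8B": 6,
    "13B-14B": 10,
    "30B-34B": 22,
    "70B": 42,
    "120B": 70,
}


def scales_that_fit(total_gb: float, reserved_gb: float = 0.0) -> list[str]:
    """List the model scales whose footprint fits in the available memory.

    reserved_gb is a hypothetical allowance for non-model memory use.
    """
    budget = total_gb - reserved_gb
    return [scale for scale, gb in FOOTPRINT_GB.items() if gb <= budget]


def headroom_gb(scale: str, total_gb: float = 128) -> float:
    """Memory left over after loading a model of the given scale."""
    return total_gb - FOOTPRINT_GB[scale]


print(scales_that_fit(128))   # every listed scale fits on the 128 GB workstation
print(headroom_gb("120B"))    # 58
```

On a smaller machine, for example 64 GB with 16 GB reserved, the same check would exclude the 120B scale and keep the rest.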