Run DeepSeek R1 or V3 with MLX Distributed

Setup

On every machine in the cluster, install Open MPI and mlx-lm:

conda install conda-forge::openmpi
pip install -U mlx-lm
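
To quickly confirm the installs on each machine, an optional sanity check is:

mpirun --version
python -c "import mlx_lm, mlx.core as mx; print(mx.__version__)"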

Next, download the pipeline-parallel run script to the same path on every machine:

curl -O https://raw.githubusercontent.com/ml-explore/mlx-examples/refs/heads/main/llms/mlx_lm/examples/pipeline_generate.py
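
For orientation, pipeline_generate.py shards the model across hosts by layer (pipeline parallelism). The snippet below is only an illustrative sketch, not the actual script; it assumes an even split and shows how each rank could be assigned a contiguous slice of DeepSeek's 61 decoder layers:

# pipeline_sketch.py -- illustrative only, not the actual pipeline_generate.py
import mlx.core as mx

group = mx.distributed.init()          # backend is chosen by mlx.launch (MPI here)
rank, size = group.rank(), group.size()

num_layers = 61                        # DeepSeek V3 / R1 decoder layer count
per_rank = (num_layers + size - 1) // size
start = rank * per_rank
end = min(start + per_rank, num_layers)
print(f"rank {rank}/{size} would hold layers [{start}, {end})")

During generation, activations flow from one rank's slice of layers to the next, so the hosts form a pipeline rather than each holding a full copy of the model.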

Make a hosts.json file on the machine you plan to launch the generation from. For two machines it should look like this:

[
  {"ssh": "hostname1"},
  {"ssh": "hostname2"}
]

Also make sure every machine can ssh to every other machine using the hostnames in hosts.json, ideally without a password prompt. Check out the MLX documentation for more information on setting up and testing MPI.
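
Before loading a 300GB+ model, it can help to sanity-check the cluster with a tiny all-reduce. The script below is a hypothetical test_ring.py (not part of mlx-lm); if MPI and SSH are set up correctly, every rank should print the world size:

# test_ring.py -- minimal connectivity check (a sketch, not part of mlx-lm)
import mlx.core as mx

group = mx.distributed.init()
# Every host contributes 1; the all-reduce result should equal the world size.
total = mx.distributed.all_sum(mx.ones(1))
print(f"rank {group.rank()} of {group.size()}: all_sum -> {total.item()}")

Launch it the same way as the generation script:

mlx.launch --hostfile path/to/hosts.json --backend mpi path/to/test_ring.py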

Raise the wired memory limit on each machine so the GPU can use more of the unified memory. For example, on a 192 GB M2 Ultra, setting it to 180000 MB (about 180 GB) leaves roughly 12 GB for the rest of the system:

sudo sysctl iogpu.wired_limit_mb=180000

Run

Run the generation with a command like the following:

mlx.launch \
  --hostfile path/to/hosts.json \
  --backend mpi \
  path/to/pipeline_generate.py \
  --prompt "What number is larger 6.9 or 6.11?" \
  --max-tokens 128 \
  --model mlx-community/DeepSeek-R1-4bit

For DeepSeek R1 quantized to 3-bit you need roughly 350 GB of RAM in aggregate across the cluster of machines, e.g. two 192 GB M2 Ultras. To run the model quantized to 4-bit you need about 450 GB of aggregate RAM, e.g. three 192 GB M2 Ultras.
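
Those figures include headroom beyond the packed weights (quantization scales, the KV cache, and runtime overhead). A rough back-of-envelope check, assuming DeepSeek R1's 671B total parameters and counting only the packed weights:

# rough_memory.py -- back-of-envelope weight sizes only (sketch)
PARAMS = 671e9                       # DeepSeek V3 / R1 total parameter count

for bits in (3, 4):
    gb = PARAMS * bits / 8 / 1e9     # bits -> bytes -> GB
    print(f"{bits}-bit packed weights alone: ~{gb:.0f} GB")

That comes to roughly 252 GB for 3-bit and 336 GB for 4-bit in weights alone, which is why the aggregate RAM recommendations above are noticeably larger.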

@awni (author) commented Nov 5, 2025

Which models did you have in mind? If you file an issue in mlx-lm we can look into adding it.

@georgiedekker commented

Not one particular model; it would just be great to know how to use this example with any model from any provider. I'm currently running a small model, mlx-community/Qwen3-1.7B-8bit, on a single M4 Mac mini with 16 GB, and it would be great to test out many different models (from different providers) on simple, cheap hardware.
