@saforem2
Created September 11, 2024 19:40

🐍 Setup @ ALCF

The easiest way to get set up on any of {Polaris, Aurora, Sunspot}¹ is to use 🍋 ezpz:

# ezpz
git clone https://github.com/saforem2/ezpz deps/ezpz

# ezpz: setup
export PBS_O_WORKDIR=$(pwd) && source deps/ezpz/src/ezpz/bin/utils.sh
ezpz_setup_python
ezpz_setup_job

# ezpz: install
python3 -m pip install -e deps/ezpz --require-virtualenv

# ezpz: test
launch python3 -m ezpz.test_dist
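The `pip install` step passes `--require-virtualenv`, so pip will refuse to install anything unless a virtual environment is active. The sketch below is a quick pre-check. It assumes that `ezpz_setup_python` has left a virtual environment activated in the current shell. `in_venv` is a hypothetical helper name and is not part of ezpz. It tests `sys.prefix != sys.base_prefix`, which is roughly the check pip uses for standard (PEP 405) virtual environments.

```bash
# Hypothetical helper (not part of ezpz): succeed if python3 runs inside a
# virtual environment. In a venv, sys.prefix points at the venv directory
# while sys.base_prefix still points at the base interpreter.
in_venv() {
  python3 -c 'import sys; sys.exit(0 if sys.prefix != sys.base_prefix else 1)'
}

if in_venv; then
  echo "virtualenv active: $(python3 -c 'import sys; print(sys.prefix)')"
else
  echo "no virtualenv active; 'pip install --require-virtualenv' will refuse to run" >&2
fi
```

Run it after `ezpz_setup_python` and before the `pip install` line. If no environment is active, you get a readable message up front instead of pip's refusal.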

For Megatron-DeepSpeed specifically:

# clone repo + navigate into it
git clone https://github.com/argonne-lcf/Megatron-DeepSpeed
cd Megatron-DeepSpeed

# clone saforem2/ezpz, microsoft/DeepSpeed into ./deps/
mkdir -p deps
git clone https://github.com/saforem2/ezpz deps/ezpz
git clone https://github.com/microsoft/DeepSpeed deps/DeepSpeed

# ezpz
export PBS_O_WORKDIR=$(pwd) && source deps/ezpz/src/ezpz/bin/utils.sh
ezpz_setup_python
ezpz_setup_job
python3 -m pip install -e deps/ezpz --require-virtualenv

# deepspeed
cd deps/DeepSpeed && bash install.sh |& tee install.log && cd -

# upgrade W&B (needed on Polaris)
python3 -m pip install --upgrade wandb

# launch:
PBS_O_WORKDIR=$(pwd) bash train_aGPT_7B.sh
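The DeepSpeed install line, `bash install.sh |& tee install.log && cd -`, reports the exit status of `tee`, not of `install.sh`. A failed install can therefore go unnoticed. The sketch below keeps the log and still surfaces failures by reading bash's `PIPESTATUS` array. `run_logged` is a hypothetical helper and is not provided by ezpz or DeepSpeed.

```bash
# Hypothetical helper (not part of ezpz or DeepSpeed): run a command with
# stdout+stderr copied to a log file, returning the command's own exit status.
run_logged() {
  local log="$1"
  shift
  # `2>&1 |` is equivalent to bash's `|&` shorthand
  "$@" 2>&1 | tee "$log"
  # Without `set -o pipefail`, $? here would be tee's status;
  # PIPESTATUS[0] holds the status of the first command in the pipeline.
  return "${PIPESTATUS[0]}"
}

# Example for the DeepSpeed step, run from the Megatron-DeepSpeed root.
# The subshell keeps the `cd` local, so no `cd -` is needed:
# (cd deps/DeepSpeed && run_logged install.log bash install.sh) \
#   || echo "DeepSpeed install failed; see deps/DeepSpeed/install.log" >&2
```

An alternative is `set -o pipefail`, which makes every pipeline report the last non-zero status. That changes behavior shell-wide, though, while the helper affects only the one command it wraps.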

Footnotes

  1. ezpz works with any of the {cpu, cuda, amd, xpu, mps} backends.
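To see which of these backends a given node is likely to use, one rough heuristic is to probe for each vendor's management tool. The sketch below is hypothetical: `detect_backend` is an illustrative name, and this is not how ezpz itself selects a device. It only checks which CLIs are on `PATH`. It does not confirm that PyTorch was built for that backend.

```bash
# Hypothetical heuristic (not ezpz's own device selection): guess the
# accelerator backend from which vendor tools are on PATH.
detect_backend() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo cuda   # NVIDIA driver tools present
  elif command -v rocm-smi >/dev/null 2>&1; then
    echo amd    # AMD ROCm tools present
  elif command -v xpu-smi >/dev/null 2>&1; then
    echo xpu    # Intel GPU tools present
  elif [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
    echo mps    # Apple silicon Mac
  else
    echo cpu    # fallback when no accelerator tools are found
  fi
}

echo "likely backend: $(detect_backend)"
```

On Polaris (NVIDIA A100 nodes) this would be expected to print `cuda`. On Aurora or Sunspot (Intel GPUs) it would be expected to print `xpu`, provided the vendor tools are on `PATH`.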
