The easiest way to get set up on any of {Polaris, Aurora, Sunspot}[^1] is to use [`ezpz`](https://github.com/saforem2/ezpz):

```bash
# ezpz
git clone https://github.com/saforem2/ezpz deps/ezpz
# ezpz: setup
export PBS_O_WORKDIR=$(pwd) && source deps/ezpz/src/ezpz/bin/utils.sh
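# (roughly) loads the base conda module and creates / activates a virtualenv on top of it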
ezpz_setup_python
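# (roughly) detects the active {PBS, Slurm} job and defines the `launch` alias used below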
ezpz_setup_job
# ezpz: install
python3 -m pip install -e deps/ezpz --require-virtualenv
# ezpz: test
launch python3 -m ezpz.test_dist
```
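Here, `launch` is the alias that `ezpz_setup_job` builds from the active job. As a rough illustration only (the exact command is machine-specific and determined by `ezpz` at runtime), on a PBS system like Polaris it expands to something along these lines:

```bash
# hypothetical manual equivalent of `launch` on a PBS machine;
# ezpz derives all of this from the active job automatically
NHOSTS=$(wc -l < "${PBS_NODEFILE}")
NGPU_PER_HOST=4                          # e.g. 4x A100 per node on Polaris
NGPUS=$(( NHOSTS * NGPU_PER_HOST ))
mpiexec --verbose --envall \
    -n "${NGPUS}" \
    --ppn "${NGPU_PER_HOST}" \
    --hostfile "${PBS_NODEFILE}" \
    python3 -m ezpz.test_dist
```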
For Megatron-DeepSpeed specifically:

```bash
# clone repo + navigate into it
git clone https://github.com/argonne-lcf/Megatron-DeepSpeed
cd Megatron-DeepSpeed
# clone saforem2/ezpz, microsoft/DeepSpeed into ./deps/
mkdir deps
git clone https://github.com/saforem2/ezpz deps/ezpz
git clone https://github.com/microsoft/DeepSpeed deps/DeepSpeed
# ezpz
export PBS_O_WORKDIR=$(pwd) && source deps/ezpz/src/ezpz/bin/utils.sh
ezpz_setup_python
ezpz_setup_job
python3 -m pip install -e deps/ezpz --require-virtualenv
# deepspeed
cd deps/DeepSpeed && bash install.sh |& tee install.log && cd -
# upgrade W&B (needed on Polaris)
python3 -m pip install --upgrade wandb
# launch:
PBS_O_WORKDIR=$(pwd) bash train_aGPT_7B.sh
```
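Note that `train_aGPT_7B.sh` assumes it is running from inside an active (PBS) job. A minimal sketch of grabbing an interactive allocation on Polaris first, assuming the `debug` queue and a placeholder project name (queue, walltime, and filesystems will vary by machine):

```bash
# hypothetical interactive job on Polaris; adjust queue / project / walltime
qsub -I -q debug \
    -l select=2 \
    -l walltime=01:00:00 \
    -l filesystems=home:eagle \
    -A YourProjectName
# then, from inside the job:
cd Megatron-DeepSpeed
PBS_O_WORKDIR=$(pwd) bash train_aGPT_7B.sh
```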
[^1]: Works on any of {cpu, cuda, amd, xpu, mps}