I've included the full series of commands (and their outputs) from a fresh attempt this morning (2025-03-12), in case it's helpful:
#[08:54:16 AM][x4716c2s4b0n0][/f/d/f/p/a/Megatron-DeepSpeed][🌱 main][✓]
$ source <(curl 'https://raw.githubusercontent.com/saforem2/ezpz/refs/heads/main/src/ezpz/bin/utils.sh') && ezpz_setup_env
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 54998  100 54998    0     0  2944k      0 --:--:-- --:--:-- --:--:-- 2983k
Unable to detect PBS or SLURM working directory info...
Using /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed as working directory...
Using WORKING_DIR: /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed
No conda_prefix OR virtual_env found in environment...
Setting up conda...
Due to MODULEPATH changes, the following have been reloaded:
1) hwloc/master-git.1793e43-level-zero 2) mpich/opt/4.3.0rc3
The following have been reloaded with a version change:
1) oneapi/eng-compiler/2024.07.30.002 => oneapi/release/2024.2.1 2) yaksa/0.3-aw2kkvy => yaksa/0.3-euoqglg
Found conda at: /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1
No VIRTUAL_ENV found in environment!
- Trying to setup from /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1
- Using VENV_DIR=/lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1
- Found existing venv, activating from /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1
[python] Using /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3
[🍋 ezpz/bin/utils.sh]
• USER=foremans
• MACHINE=aurora
• HOST=x4716c2s4b0n0
• TSTAMP=2025-03-12-085419
[ezpz_setup_host_pbs]
• Using hostfile: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
• Found in environment:
• HOSTFILE: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
• Writing PBS vars to: /home/foremans/.pbsenv
[ezpz_save_pbs_env]
• Setting:
• HOSTFILE: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
• JOBENV_FILE: /home/foremans/.pbsenv
[HOSTS]
• [host:0] - x4716c2s3b0n0.hostmgmt2716.cm.aurora.alcf.anl.gov
• [host:1] - x4716c2s4b0n0.hostmgmt2716.cm.aurora.alcf.anl.gov
[DIST INFO]
• NGPUS=24
• NHOSTS=2
• NGPU_PER_HOST=12
• HOSTFILE=/var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
• DIST_LAUNCH=mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni
[LAUNCH]:
• To launch across all available GPUs, use: launch
launch = mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni
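For reference, once ezpz_setup_env has defined it, the launch alias can stand in for the full mpiexec invocation above. A minimal smoke test of the distributed setup (a sketch, assuming ezpz's bundled ezpz.test_dist entry point; the 24-rank / 12-per-host layout comes from the PBS job detected above):

$ # quick distributed smoke test across both nodes (24 ranks, 12 per host)
$ launch python3 -m ezpz.test_dist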
took: 0h:00m:06s
#[🐍 aurora_nre_models_frameworks-2024.2.1_u1](👻 aurora_nre_models_frameworks-2024.2.1_u
#[08:54:24 AM][x4716c2s4b0n0][/f/d/f/p/a/Megatron-DeepSpeed][🌱 main][✓] [⏱️ 6s]
$ python3 -m pip install "git+https://github.com/saforem2/ezpz"
Collecting git+https://github.com/saforem2/ezpz
Cloning https://github.com/saforem2/ezpz to /tmp/pip-req-build-5w2m90yj
Running command git clone --filter=blob:none --quiet https://github.com/saforem2/ezpz /tmp/pip-req-build-5w2m90yj
Resolved https://github.com/saforem2/ezpz to commit c45fb19353c9f06575e0ecb12ba7377321bb2f71
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting ambivalent@ git+https://github.com/saforem2/ambivalent
Cloning https://github.com/saforem2/ambivalent to /tmp/pip-install-ruitfkv9/ambivalent_aa683df8399d4aeb915c2d2b0071a645
Running command git clone --filter=blob:none --quiet https://github.com/saforem2/ambivalent /tmp/pip-install-ruitfkv9/ambivalent_aa683df8399d4aeb915c2d2b0071a645
Resolved https://github.com/saforem2/ambivalent to commit 9063fda7d139416f141c5259f945c76bf1b85ed3
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: ml-dtypes in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.5.1)
Requirement already satisfied: sh in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (2.2.1)
Requirement already satisfied: omegaconf in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (2.3.0)
Requirement already satisfied: tensorboard in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (2.15.2)
Requirement already satisfied: hydra-core in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (1.3.2)
Requirement already satisfied: torch in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (2.3.1+cxx11.abi)
Requirement already satisfied: tqdm in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (4.67.1)
Requirement already satisfied: jax in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.5.0)
Requirement already satisfied: h5py in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (3.12.1)
Requirement already satisfied: jaxtyping in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.2.37)
Requirement already satisfied: jaxlib in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.5.0)
Requirement already satisfied: sentencepiece in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.2.0)
Requirement already satisfied: mpi4py in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (3.1.6)
Requirement already satisfied: joblib in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (1.4.2)
Requirement already satisfied: xarray in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (2025.1.2)
Requirement already satisfied: ipython in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (8.31.0)
Requirement already satisfied: seaborn in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.13.2)
Requirement already satisfied: rich in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (13.9.4)
Requirement already satisfied: plotext in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (5.3.2)
Requirement already satisfied: pyinstrument in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (5.0.1)
Requirement already satisfied: hydra-colorlog in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (1.2.0)
Requirement already satisfied: wandb in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.19.6)
Requirement already satisfied: matplotlib in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (3.5.3)
Requirement already satisfied: requests in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (2.32.3)
Requirement already satisfied: colormaps in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (0.4.2)
Requirement already satisfied: numpy>=1.19.3 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from h5py->ezpz==0.3) (1.26.4)
Requirement already satisfied: colorlog in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from hydra-colorlog->ezpz==0.3) (6.9.0)
Requirement already satisfied: packaging in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from hydra-core->ezpz==0.3) (24.0)
Requirement already satisfied: antlr4-python3-runtime==4.9.* in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from hydra-core->ezpz==0.3) (4.9.3)
Requirement already satisfied: PyYAML>=5.1.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from omegaconf->ezpz==0.3) (6.0.2)
Requirement already satisfied: matplotlib-inline in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (0.1.7)
Requirement already satisfied: pygments>=2.4.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (2.19.1)
Requirement already satisfied: typing_extensions>=4.6 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (4.12.2)
Requirement already satisfied: exceptiongroup in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (1.2.2)
Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (3.0.50)
Requirement already satisfied: decorator in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (5.1.1)
Requirement already satisfied: stack_data in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (0.6.3)
Requirement already satisfied: jedi>=0.16 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (0.19.2)
Requirement already satisfied: pexpect>4.3 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (4.9.0)
Requirement already satisfied: traitlets>=5.13.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (5.14.3)
Requirement already satisfied: opt_einsum in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from jax->ezpz==0.3) (3.4.0)
Requirement already satisfied: scipy>=1.11.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from jax->ezpz==0.3) (1.12.0)
Requirement already satisfied: wadler-lindig>=0.1.3 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from jaxtyping->ezpz==0.3) (0.1.3)
Requirement already satisfied: markdown-it-py>=2.2.0 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from rich->ezpz==0.3) (3.0.0)
Requirement already satisfied: pandas>=1.2 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from seaborn->ezpz==0.3) (2.2.3)
Requirement already satisfied: absl-py>=0.4 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (2.1.0)
Requirement already satisfied: setuptools>=41.0.0 in ./venvs/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (65.5.0)
Requirement already satisfied: werkzeug>=1.0.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (3.1.3)
Requirement already satisfied: google-auth-oauthlib<2,>=0.5 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (1.2.1)
Requirement already satisfied: google-auth<3,>=1.6.3 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (2.37.0)
Requirement already satisfied: markdown>=2.6.8 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (3.7)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (0.7.2)
Requirement already satisfied: protobuf!=4.24.0,>=3.19.6 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (4.25.5)
Requirement already satisfied: grpcio>=1.48.2 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (1.69.0)
Requirement already satisfied: six>1.9 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (1.16.0)
Requirement already satisfied: filelock in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from torch->ezpz==0.3) (3.16.1)
Requirement already satisfied: networkx in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from torch->ezpz==0.3) (3.4.2)
Requirement already satisfied: sympy in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from torch->ezpz==0.3) (1.13.3)
Requirement already satisfied: jinja2 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from torch->ezpz==0.3) (3.1.5)
Requirement already satisfied: fsspec in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from torch->ezpz==0.3) (2024.12.0)
Requirement already satisfied: platformdirs in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (4.2.2)
Requirement already satisfied: gitpython!=3.1.29,>=1.0.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (3.1.44)
Requirement already satisfied: click!=8.0.0,>=7.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (8.1.8)
Requirement already satisfied: setproctitle in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (1.3.4)
Requirement already satisfied: psutil>=5.0.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (6.1.1)
Requirement already satisfied: docker-pycreds>=0.4.0 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (0.4.0)
Requirement already satisfied: pydantic<3,>=2.6 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (2.10.5)
Requirement already satisfied: sentry-sdk>=2.0.0 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (2.20.0)
Requirement already satisfied: gitdb<5,>=4.0.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from gitpython!=3.1.29,>=1.0.0->wandb->ezpz==0.3) (4.0.12)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard->ezpz==0.3) (0.4.1)
Requirement already satisfied: rsa<5,>=3.1.4 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard->ezpz==0.3) (4.9)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard->ezpz==0.3) (5.5.0)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from google-auth-oauthlib<2,>=0.5->tensorboard->ezpz==0.3) (2.0.0)
Requirement already satisfied: parso<0.9.0,>=0.8.4 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from jedi>=0.16->ipython->ezpz==0.3) (0.8.4)
Requirement already satisfied: mdurl~=0.1 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->ezpz==0.3) (0.1.2)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (1.4.8)
Requirement already satisfied: cycler>=0.10 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (0.12.1)
Requirement already satisfied: pyparsing>=2.2.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (3.2.1)
Requirement already satisfied: pillow>=6.2.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (11.1.0)
Requirement already satisfied: python-dateutil>=2.7 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (2.9.0)
Requirement already satisfied: fonttools>=4.22.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (4.55.4)
Requirement already satisfied: tzdata>=2022.7 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from pandas>=1.2->seaborn->ezpz==0.3) (2025.1)
Requirement already satisfied: pytz>=2020.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from pandas>=1.2->seaborn->ezpz==0.3) (2024.1)
Requirement already satisfied: ptyprocess>=0.5 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from pexpect>4.3->ipython->ezpz==0.3) (0.7.0)
Requirement already satisfied: wcwidth in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython->ezpz==0.3) (0.2.13)
Requirement already satisfied: pydantic-core==2.27.2 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from pydantic<3,>=2.6->wandb->ezpz==0.3) (2.27.2)
Requirement already satisfied: annotated-types>=0.6.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from pydantic<3,>=2.6->wandb->ezpz==0.3) (0.7.0)
Requirement already satisfied: idna<4,>=2.5 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from requests->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from requests->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from requests->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (2024.12.14)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from requests->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (3.3.2)
Requirement already satisfied: MarkupSafe>=2.1.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from werkzeug>=1.0.1->tensorboard->ezpz==0.3) (3.0.2)
Requirement already satisfied: pure-eval in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from stack_data->ipython->ezpz==0.3) (0.2.3)
Requirement already satisfied: asttokens>=2.1.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from stack_data->ipython->ezpz==0.3) (3.0.0)
Requirement already satisfied: executing>=1.2.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from stack_data->ipython->ezpz==0.3) (2.1.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from sympy->torch->ezpz==0.3) (1.3.0)
Requirement already satisfied: smmap<6,>=3.0.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from gitdb<5,>=4.0.1->gitpython!=3.1.29,>=1.0.0->wandb->ezpz==0.3) (5.0.2)
Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->ezpz==0.3) (0.6.1)
Requirement already satisfied: oauthlib>=3.0.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<2,>=0.5->tensorboard->ezpz==0.3) (3.2.2)
[notice] A new release of pip is available: 23.0.1 -> 25.0.1
[notice] To update, run: pip install --upgrade pip
took: 0h:01m:18s
#[🐍 aurora_nre_models_frameworks-2024.2.1_u1](👻 aurora_nre_models_frameworks-2024.2.1_u
#[09:01:28 AM][x4716c2s4b0n0][/f/d/f/p/a/Megatron-DeepSpeed][🌱 main][✓]
$ PBS_O_WORKDIR=$(pwd) bash train_aGPT_7B.sh
Using WORKING_DIR: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed
Running on: aurora
Found ezpz in /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/deps/ezpz
Using WORKING_DIR: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed
Using virtual_env: /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1 on top of conda from: /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1
[python] Using /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3
[🍋 ezpz/bin/utils.sh]
• USER=foremans
• MACHINE=aurora
• HOST=x4716c2s4b0n0
• TSTAMP=2025-03-12-090157
[ezpz_setup_host_pbs]
• Using hostfile: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
• Found in environment:
• HOSTFILE: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
• Writing PBS vars to: /home/foremans/.pbsenv
[ezpz_save_pbs_env]
• Setting:
• HOSTFILE: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
• JOBENV_FILE: /home/foremans/.pbsenv
[HOSTS]
• [host:0] - x4716c2s3b0n0.hostmgmt2716.cm.aurora.alcf.anl.gov
• [host:1] - x4716c2s4b0n0.hostmgmt2716.cm.aurora.alcf.anl.gov
[DIST INFO]
• NGPUS=24
• NHOSTS=2
• NGPU_PER_HOST=12
• HOSTFILE=/var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
• DIST_LAUNCH=mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni
[LAUNCH]:
• To launch across all available GPUs, use: launch
launch = mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni
[notice] A new release of pip is available: 23.0.1 -> 25.0.1
[notice] To update, run: pip install --upgrade pip
[ezpz_install] Found ezpz @ 0.3
[install_dependencies] Ensuring all dependencies from /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/requirements/requirements.txt installed...
[notice] A new release of pip is available: 23.0.1 -> 25.0.1
[notice] To update, run: pip install --upgrade pip
[install_dependencies] No 'deepspeed' command found on aurora
[install_dependencies] !! No deepsepeed in /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3
[setParams] Using GRAD_ACC_STEPS: 16
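Since no deepspeed is found in the venv at this point (the log further down shows it resolved from deps/DeepSpeed), a quick way to confirm what the venv's python actually imports (a minimal check; deepspeed exposes the standard __version__ and __file__ attributes):

$ # verify which DeepSpeed, if any, the venv's python resolves
$ python3 -c 'import deepspeed; print(deepspeed.__version__, deepspeed.__file__)'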
TRAIN_TOKENS=2000000000000 (=2000B tokens)
TRAIN_ITERS=1271565
DS_CONFIG: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ds-configs/ds_stage1_mb1_gb384_pp1_bf16.json
ZS=1, MB=1, GB=384, PP=1, DTYPE=bf16
{
"train_batch_size": 384,
"train_micro_batch_size_per_gpu": 1,
"gradient_clipping": 1,
"steps_per_print": 1,
"gradient_accumulation_steps": 16,
"zero_force_ds_cpu_optimizer": false,
"zero_allow_untested_optimizer": true,
"wall_clock_breakdown": false,
"zero_optimization": {
"stage": 1
},
"fp16": {
"enabled": false,
"loss_scale": 0,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
},
"bfloat16": {
"enabled": true,
"loss_scale": 1
},
"comms_logger": {
"enabled": false,
"verbose": false,
"debug": false
},
"flops_profiler": {
"enabled": true,
"profile_step": 2,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
}
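As a sanity check on the numbers above: the global batch size is consistent with micro-batch x gradient-accumulation x data-parallel ranks, and TRAIN_ITERS matches TRAIN_TOKENS divided by tokens per global step (assuming the 4096-token sequence length from the training args below; integer division):

$ echo $(( 1 * 16 * 24 ))                   # MB x GRAD_ACC_STEPS x DP ranks = global batch
384
$ echo $(( 2000000000000 / (384 * 4096) ))  # TRAIN_TOKENS / (GB x SEQ_LEN) = TRAIN_ITERS
1271565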
Checkpoints will be saved to: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
Please see logs at: logs/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/20250312-090212_24_x4716c2s4b0n0
Setting up tokenizer with Llama2Tokenizer
Using data_file_list: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
Using tokenizer: Llama2Tokenizer. Setting up data with
Calling: setData() with /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
--------------------
Updated environment:
DATA_FILE_LIST: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
NUM_DOCS: 2419
WEIGHT_SUM: 1.0
DFL_STEM: dolma
DATA_CACHE_PATH: .cache/dolma/index-cache
DATA_FLAGS:
--------------------
[setData] DATA_FLAGS:
[setData] TOKENIZER_FLAGS: --tokenizer-type Llama2Tokenizer --tokenizer-model /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/tokenizer.model
Requirement already satisfied: pybind11 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (2.13.6)
[notice] A new release of pip is available: 23.0.1 -> 25.0.1
[notice] To update, run: pip install --upgrade pip
make: Nothing to be done for 'default'.
/flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed
++++++++++++++++++++++++++++++++++++++++++++++++++
- MPICH_DIR=/opt/aurora/24.180.3/spack/unified/0.8.0/install/linux-sles15-x86_64/oneapi-2024.2.1/mpich-4.3.0rc3-hipyfz6
- Using /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3
- WORLD_SIZE:24
- BACKEND: ccl
- MODEL_TYPE: llama-gb384-seq4096-pp1-tp1-32layers-32heads-4096hidden
- Using DATA_FILE_LIST: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
++++++++++++++++++++++++++++++++++++++++++++++++++
Currently Loaded Modules:
1) gcc-runtime/12.2.0-267awrk 5) gcc/12.2.0 9) oneapi/release/2024.2.1 13) yaksa/0.3-euoqglg
2) gmp/6.2.1-yctcuid 6) libfabric/1.20.1 10) pti-gpu/d3639de 14) mpich/opt/4.3.0rc3
3) mpfr/4.2.1-fhgnwe7 7) cray-pals/1.4.0 11) frameworks/2024.2.1_u1
4) mpc/1.3.1-ygprpb4 8) cray-libpals/1.4.0 12) hwloc/master-git.1793e43-level-zero
Saving environment to checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.env
Not currently running. Continuing!
Launching with: MPICH
mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni --pmi=pmix --genvall /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3 -Wignore /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/pretrain_gpt_alcf.py
Using data_cache_path: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache
Training Arguments:
--accumulate-allreduce-grads-in-fp32
--adam-beta1=0.9
--adam-beta2=0.95
--adam-eps=0.00001
--attention-dropout 0
--bf16
--blend-sample-in-corpus
--clip-grad=1.0
--data-cache-path=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache
--data-file-list=/flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
--deepspeed
--deepspeed_config=/flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ds-configs/ds_stage1_mb1_gb384_pp1_bf16.json
--disable-bias-linear
--distributed-backend=ccl
--ds-sequence-parallel-size=1
--eval-interval=100
--eval-iters=20
--ffn-hidden-size 11008
--global-batch-size=384
--hidden-dropout 0
--hidden-size=4096
--load=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
--log-interval=1
--log-optimizer-states-to-tensorboard
--log-timers-to-tensorboard
--lr 0.0002
--lr-decay-style cosine
--lr-warmup-fraction 0.05
--max-position-embeddings=4096
--micro-batch-size=1
--no-bias-dropout-fusion
--no-bias-gelu-fusion
--no-gradient-accumulation-fusion
--no-masked-softmax-fusion
--no-pipeline-parallel
--no-query-key-layer-scaling
--normalization rmsnorm
--num-attention-heads=32
--num-key-value-heads 8
--num-layers=32
--optimizer=adamw
--pipeline-model-parallel-size=1
--save=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
--save-interval=50
--seq-length=4096
--shuffle-sample-in-corpus
--split=990,10,0
--swiglu
--tensorboard-dir checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/tensorboard
--tensor-model-parallel-size=1
--timing-log-level=1
--tokenizer-type Llama2Tokenizer --tokenizer-model /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/tokenizer.model
--train-iters=1271565
--untie-embeddings-and-output-weights
--use-checkpoint-opt_param-scheduler
--use-flash-attn-builder
--use-rotary-position-embeddings
--weight-decay=0.1
--zero-stage=1
mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni --pmi=pmix --genvall /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3 -Wignore /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/pretrain_gpt_alcf.py --use-checkpoint-opt_param-scheduler --lr 0.0002 --lr-decay-style cosine --lr-warmup-fraction 0.05 --swiglu --hidden-dropout 0 --attention-dropout 0 --normalization rmsnorm --disable-bias-linear --no-query-key-layer-scaling --use-rotary-position-embeddings --untie-embeddings-and-output-weights --num-key-value-heads 8 --ffn-hidden-size 11008 --use-flash-attn-builder --tokenizer-type Llama2Tokenizer --tokenizer-model /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/tokenizer.model --log-timers-to-tensorboard --log-optimizer-states-to-tensorboard --tensorboard-dir checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/tensorboard --deepspeed --no-pipeline-parallel --deepspeed_config=/flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ds-configs/ds_stage1_mb1_gb384_pp1_bf16.json --zero-stage=1 --bf16 --shuffle-sample-in-corpus --blend-sample-in-corpus --accumulate-allreduce-grads-in-fp32 --no-bias-gelu-fusion --no-bias-dropout-fusion --no-masked-softmax-fusion --no-gradient-accumulation-fusion --optimizer=adamw --tensor-model-parallel-size=1 --pipeline-model-parallel-size=1 --max-position-embeddings=4096 --micro-batch-size=1 --ds-sequence-parallel-size=1 --global-batch-size=384 --split=990,10,0 --timing-log-level=1 --eval-interval=100 --eval-iters=20 --save-interval=50 --log-interval=1 --save=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash --load=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash --seq-length=4096 --num-layers=32 --hidden-size=4096 --train-iters=1271565 --distributed-backend=ccl --weight-decay=0.1 --adam-beta1=0.9 --adam-beta2=0.95 --adam-eps=0.00001 --clip-grad=1.0 --num-attention-heads=32 --data-cache-path=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache --data-file-list=/flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
[!! NOTE] View output at:
logs/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/20250312-090212_24_x4716c2s4b0n0/output.log
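Since stdout from the launch is redirected to that file, the run can be followed live from a second shell, e.g.:

$ tail -f logs/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/20250312-090212_24_x4716c2s4b0n0/output.log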
Disabling local launch: multi-node application
Connected to tcp://x4716c2s3b0n0.hostmgmt2716.cm.aurora.alcf.anl.gov:7919
Launching application 9419d72d-4156-4bcd-a54e-8cd39acce81a
[2025-03-12 09:02:39,280] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,280] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,285] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,285] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,285] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,289] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,310] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,310] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,311] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,311] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,311] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,311] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,356] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,360] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,370] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,370] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,372] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,372] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,372] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,373] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,374] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,374] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,374] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,374] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,644] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,645] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,645] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,645] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,645] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,645] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,996] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,997] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,997] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,997] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,997] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,997] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,998] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,998] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,371] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,371] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,371] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=19, local_rank=7, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=13, local_rank=1, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=14, local_rank=2, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=15, local_rank=3, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=16, local_rank=4, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=17, local_rank=5, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=10, local_rank=10, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=18, local_rank=6, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=21, local_rank=9, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=22, local_rank=10, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=23, local_rank=11, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=12, local_rank=0, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=20, local_rank=8, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:689:init_distributed] Initializing TorchBackend in DeepSpeed with backend ccl
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=2, local_rank=2, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=3, local_rank=3, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=4, local_rank=4, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=5, local_rank=5, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=6, local_rank=6, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=7, local_rank=7, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=8, local_rank=8, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=9, local_rank=9, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=11, local_rank=11, world_size=24, master_addr=10.115.81.149, master_port=29500
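The mpi_discovery lines above show DeepSpeed inferring world_rank/local_rank/world_size from the MPI environment rather than from torchrun-style variables. The same layout can be probed directly with mpi4py (already installed per the pip output earlier); a minimal sketch, assuming PBS_NODEFILE points at the same hostfile:

$ # print each rank's coordinates, mirroring what mpi_discovery derives
$ mpiexec -n 24 -ppn 12 --hostfile "${PBS_NODEFILE}" python3 -c \
    'from mpi4py import MPI; c = MPI.COMM_WORLD; print(f"world_rank={c.Get_rank()} world_size={c.Get_size()}")'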
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][10/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][20/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 5/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][15/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][22/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 1/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 7/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][11/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][14/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 9/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 4/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][23/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][19/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 6/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][21/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][12/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][13/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][16/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][17/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][18/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 2/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 8/23]
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 3/23]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
flash_attn ............. [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
pack_bits .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages/torch']
torch version .................... 2.3.1+cxx11.abi
deepspeed install path ........... ['/lus/flare/projects/Aurora_deployment/foremans/projects/deepspeedai/Megatron-DeepSpeed/deps/DeepSpeed/deepspeed']
deepspeed info ................... 0.16.4+9f1ac32c, 9f1ac32c, saforem2/ucp-bug
deepspeed wheel compiled w. ...... torch 2.3
shared memory (/dev/shm) size .... 503.18 GB
[2025-03-12 09:12:54][I][ezpz/configs:286] **** Git info for DeepSpeed: git_hash=8098a708 git_branch=main ****
[2025-03-12 09:12:54][I][ezpz/dist:845] Using device='xpu' with backend='deepspeed' + 'ccl' for distributed training.
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 0/23]
[2025-03-12 09:12:54][I][Megatron-DeepSpeed/pretrain_gpt_alcf:69:__main__] Import python modules in 603.5460715293884 seconds
[2025-03-12 09:12:54][I][Megatron-DeepSpeed/pretrain_gpt_alcf:70:__main__] ez.setup_torch time: 22.705841779708862 seconds
[2025-03-12 09:12:54][I][Megatron-DeepSpeed/pretrain_gpt_alcf:80:__main__] Setting up W&B from: 0 with AuroraGPT
[2025-03-12 09:12:54][I][ezpz/dist:1071] Setting up wandb from rank=0
[2025-03-12 09:12:54][I][ezpz/dist:1072] Using=WB PROJECT=AuroraGPT
wandb: Currently logged in as: foremans (aurora_gpt) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
2025-03-12 09:12:56.509920: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-03-12 09:12:56.509946: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-03-12 09:12:56.578118: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Tracking run with wandb version 0.19.6
wandb: Run data is saved locally in /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/wandb/run-20250312_091254-by2ozrz3
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run misty-deluge-1402
wandb: ⭐️ View project at https://wandb.ai/aurora_gpt/AuroraGPT
wandb: 🚀 View run at https://wandb.ai/aurora_gpt/AuroraGPT/runs/by2ozrz
2025-03-12 09:12:57.924088: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2025-03-12 09:12:58][I][ezpz/dist:1097] W&B RUN=[misty-deluge-1402](https://wandb.ai/aurora_gpt/AuroraGPT/runs/by2ozrz3)
[2025-03-12 09:12:58][I][ezpz/dist:301] Updating wandb.run: misty-deluge-1402 config with "DIST_INFO"
[2025-03-12 09:12:58][I][ezpz/dist:1142] Running on machine='Aurora'
using world size: 24, data-parallel-size: 24, sequence-parallel size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
using torch.bfloat16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. True
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.95
adam_eps ........................................ 1e-05
add_bias_linear ................................. False
add_position_embedding .......................... False
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
aml_data_download_path .......................... None
apply_layernorm_1p .............................. False
apply_query_key_layer_scaling ................... False
apply_residual_connection_post_layernorm ........ False
async_tensor_model_parallel_allreduce ........... False
attention_dropout ............................... 0.0
attention_softmax_in_fp32 ....................... False
barrier_with_L1_time ............................ True
bert_binary_head ................................ True
bert_embedder_type .............................. megatron
bert_load ....................................... None
bf16 ............................................ True
bias_dropout_fusion ............................. False
bias_gelu_fusion ................................ False
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
blend_sample_in_corpus .......................... True
block_data_path ................................. None
checkpoint_activations .......................... False
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
classes_fraction ................................ 1.0
clip_grad ....................................... 1.0
compression_training ............................ False
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
create_moe_param_group .......................... False
curriculum_learning_legacy ...................... False
data_cache_path ................................. checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache
data_efficiency_curriculum_learning ............. False
data_file_list .................................. /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
data_impl ....................................... infer
data_parallel_random_init ....................... False
data_parallel_size .............................. 24
data_path ....................................... None
data_per_class_fraction ......................... 1.0
data_sharding ................................... True
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_num_layers .............................. None
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. False
deepspeed_config ................................ /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ds-configs/ds_stage1_mb1_gb384_pp1_bf16.json
dino_bottleneck_size ............................ 256
dino_freeze_last_layer .......................... 1
dino_head_hidden_size ........................... 2048
dino_local_crops_number ......................... 10
dino_local_img_size ............................. 96
dino_norm_last_layer ............................ False
dino_teacher_temp ............................... 0.07
dino_warmup_teacher_temp ........................ 0.04
dino_warmup_teacher_temp_epochs ................. 30
distribute_checkpointed_activations ............. False
distribute_saved_activations .................... False
distributed_backend ............................. ccl
distributed_timeout_minutes ..................... 10
ds_fused_adam ................................... False
ds_inference .................................... False
ds_pipeline_enabled ............................. False
ds_sequence_parallel_size ....................... 1
embedding_path .................................. None
embedding_weights_in_fp32 ....................... False
empty_unused_memory_level ....................... 0
enable_expert_tensor_parallelism ................ False
enable_zbh1_exact_semantics ..................... False
enable_zbh1_pipeline ............................ False
encoder_num_layers .............................. 32
encoder_seq_length .............................. 4096
end_weight_decay ................................ 0.1
eod_mask_loss ................................... False
eval_interval ................................... 100
eval_iters ...................................... 20
evidence_data_path .............................. None
exit_duration_in_mins ........................... None
exit_interval ................................... None
exit_on_missing_checkpoint ...................... False
exit_signal_handler ............................. False
expert_interval ................................. 2
ffn_hidden_size ................................. 11008
finetune ........................................ False
force_ds_sequence_parallel ...................... False
fp16 ............................................ False
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
fp8_amax_compute_algo ........................... most_recent
fp8_amax_history_len ............................ 1
fp8_e4m3 ........................................ False
fp8_hybrid ...................................... False
fp8_interval .................................... 1
fp8_margin ...................................... 0
fp8_wgrad ....................................... True
global_batch_size ............................... 384
gradient_accumulation_fusion .................... False
head_lr_mult .................................... 1.0
hidden_dropout .................................. 0.0
hidden_size ..................................... 4096
hidden_size_teacher ............................. None
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_h ........................................... 224
img_w ........................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
inference ....................................... False
inference_batch_times_seqlen_threshold .......... 512
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
iter_per_epoch .................................. 1250
kd .............................................. False
kd_alpha_ce ..................................... 1
kd_beta_ce ...................................... 1
kd_temp ......................................... 1.0
kill_switch_file ................................ None
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
load_tag ........................................ None
load_teacher .................................... None
local_rank ...................................... None
log_batch_size_to_tensorboard ................... False
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_memory_to_tensorboard ....................... False
log_num_zeros_in_grad ........................... False
log_optimizer_states_to_tensorboard ............. True
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... False
log_world_size_to_tensorboard ................... False
loss_scale ...................................... None
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ None
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. 0.05
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 0
lr_warmup_tokens ................................ None
make_vocab_size_divisible_by .................... 128
mask_factor ..................................... 1.0
mask_prob ....................................... 0.15
mask_type ....................................... random
masked_softmax_fusion ........................... False
max_position_embeddings ......................... 4096
max_tokens_to_oom ............................... 12000
mem_efficient_ln ................................ True
memory_centric_tiled_linear ..................... False
merge_file ...................................... None
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 0.0
mlp_type ........................................ standard
mmap_warmup ..................................... False
moe_eval_capacity_factor ........................ 1.0
moe_expert_parallel_size ........................ 1
moe_loss_coeff .................................. 0.1
moe_min_capacity ................................ 4
moe_token_dropping .............................. True
moe_top2_2nd_expert_sampling .................... True
moe_train_capacity_factor ....................... 1.0
mos ............................................. False
multiprocessing_context ......................... fork
no_load_lr_state ................................ False
no_load_optim ................................... None
no_load_rng ..................................... None
no_persist_layer_norm ........................... False
no_pipeline_parallel ............................ True
no_save_optim ................................... None
no_save_rng ..................................... None
normalization ................................... rmsnorm
num_attention_heads ............................. 32
num_attention_heads_teacher ..................... None
num_channels .................................... 3
num_classes ..................................... 1000
num_experts ..................................... [1]
num_experts_switch .............................. None
num_experts_teacher ............................. [1]
num_key_value_heads ............................. 8
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_layers_teacher .............................. None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adamw
output_bert_embeddings .......................... False
overlap_p2p_comm ................................ False
override_opt_param_scheduler .................... False
params_dtype .................................... torch.bfloat16
partition_activations ........................... False
patch_dim ....................................... 16
perform_initialization .......................... True
pipeline_model_parallel_size .................... 1
pipeline_model_parallel_split_rank .............. None
profile ......................................... None
profile_backward ................................ False
profile_ranks ................................... None
profile_steps ................................... 2,3
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... None
random_ltd ...................................... False
rank ............................................ 0
recompute_granularity ........................... None
recompute_method ................................ None
recompute_num_layers ............................ 1
remote_device ................................... none
repeated_dataloader ............................. False
reset_attention_mask ............................ False
reset_iteration ................................. False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
retro_add_retriever ............................. False
retro_cyclic_train_iters ........................ None
retro_encoder_attention_dropout ................. 0.1
retro_encoder_hidden_dropout .................... 0.1
retro_encoder_layers ............................ 2
retro_num_neighbors ............................. 2
retro_num_retrieved_chunks ...................... 2
retro_return_doc_ids ............................ False
retro_workdir ................................... None
return_data_index ............................... False
rope_theta ...................................... 10000
rotary_percent .................................. 1.0
sample_rate ..................................... 1.0
save ............................................ checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
save_interval ................................... 50
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
schedulefree_for_each ........................... False
seed ............................................ 1234
seq_length ...................................... 4096
sequence_parallel ............................... False
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
shuffle_sample_in_corpus ........................ True
skip_train ...................................... False
sophiag_beta1 ................................... 0.9
sophiag_beta2 ................................... 0.95
sophiag_rho ..................................... 0.01
split ........................................... 990,10,0
split_transformers .............................. False
squared_relu .................................... False
standalone_embedding_stage ...................... False
start_weight_decay .............................. 0.1
swiglu .......................................... True
swin_backbone_type .............................. tiny
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 1
tensorboard_dir ................................. checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 1000
test_data_path .................................. None
tile_factor ..................................... 1
timing_log_level ................................ 1
timing_log_option ............................... minmax
titles_data_path ................................ None
tokenizer_model ................................. /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/tokenizer.model
tokenizer_type .................................. Llama2Tokenizer
topk ............................................ 1
trace_dir ....................................... ./trace/
train_data_exact_num_epochs ..................... None
train_data_path ................................. None
train_desc_path ................................. None
train_doc_idx_path .............................. None
train_idx_path .................................. None
train_iters ..................................... 1271565
train_iters_to_skip ............................. None
train_range_to_skip ............................. None
train_sample_idx_path ........................... None
train_samples ................................... None
train_shuffle_idx_path .......................... None
train_tokens .................................... None
transformer_impl ................................ local
transformer_pipeline_model_parallel_size ........ 1
trust_remote_code ............................... False
universal_checkpoint ............................ False
untie_embeddings_and_output_weights ............. True
use_checkpoint_args ............................. False
use_checkpoint_opt_param_scheduler .............. True
use_contiguous_buffers_in_local_ddp ............. True
use_cpu_initialization .......................... None
use_dataset_only ................................ False
use_distributed_optimizer ....................... False
use_flash_attn .................................. True
use_flash_attn_builder .......................... True
use_flash_attn_triton ........................... False
use_flash_attn_v1 ............................... False
use_flash_attn_v2 ............................... False
use_mics ........................................ False
use_one_sent_docs ............................... False
use_pin_memory .................................. False
use_ring_exchange_p2p ........................... False
use_rotary_position_embeddings .................. True
use_tutel ....................................... False
valid_data_path ................................. None
variable_seq_lengths ............................ False
virtual_pipeline_model_parallel_size ............ None
vision_backbone_type ............................ vit
vision_pretraining .............................. False
vision_pretraining_type ......................... classify
vocab_extra_ids ................................. 0
vocab_file ...................................... None
vocab_size ...................................... None
wandb_exp_name ..................................
wandb_project ...................................
wandb_save_dir ..................................
weight_decay .................................... 0.1
weight_decay_incr_style ......................... constant
world_size ...................................... 24
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 16
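(The constant 16 here is just global_batch_size / (micro_batch_size × data_parallel_size) = 384 / (1 × 24) = 16, and it reappears below as DeepSpeed's gradient_accumulation_steps:)
$ python3 -c 'print(384 // (1 * 24))'
16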
> building Llama2Tokenizer tokenizer ...
> padded vocab (size: 32000) with 0 dummy tokens (new size: 32000)
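(No dummy tokens are needed here since the Llama2 vocab is already divisible by make_vocab_size_divisible_by: 32000 / 128 = 250.)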
torch distributed is already initialized, skipping initialization ...
> initialized tensor model parallel with size 1
> initialized pipeline model parallel with size 1
> setting random seeds to 1234 ...
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
make: Entering directory '/lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/megatron/data'
> compiling dataset index builder ...
>>> done with dataset index builder. Compilation time: 0.146 seconds
>fused kernel is only supported in cuda, skip loading fused kernel
[2025-03-12 09:12:58][I][megatron/training:185] time to finish initialize_megatron: 26.979411840438843 seconds
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_KVS_MODE changed to be mpi (default:pmi)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_KVS_CONNECTION_TIMEOUT changed to be 3600 (default:120)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_BCAST changed to be double_tree (default:)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_ENABLE_SYCL_KERNELS changed to be 1 (default:0)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_SYCL_ESIMD changed to be 1 (default:0)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_PROCESS_LAUNCHER changed to be pmix (default:hydra)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_ZE_CACHE_OPEN_IPC_HANDLES_THRESHOLD changed to be 32768 (default:1000)
2025:03:12-09:12:58:(18380) |CCL_WARN| CCL_ALLGATHERV_MEDIUM_SIZE_THRESHOLD=0 is unknown to and unused by oneCCL code but is present in the environment, check if it is not mistyped.
2025:03:12-09:12:58:(18380) |CCL_WARN| CCL_SKIP_SCHEDULER=1 is unknown to and unused by oneCCL code but is present in the environment, check if it is not mistyped.
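(The |CCL_WARN| lines above are just oneCCL recording which CCL_* environment variables our launch scripts changed from their defaults; the two variables flagged as "unknown to and unused by oneCCL code" look like leftovers that this oneCCL build simply ignores. The full set can be inspected with:)
$ env | grep '^CCL_'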
2025-03-12 09:13:00.634801: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /tensorflow/core/bfc_allocator_delay. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:00.658209: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /xla/service/gpu/compiled_programs_count. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:00.680012: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /jax/pjrt/pjrt_executable_executions. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:00.680028: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /jax/pjrt/pjrt_executable_execution_time_usecs. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:03.431071: I itex/core/wrapper/itex_gpu_wrapper.cc:38] Intel Extension for Tensorflow* GPU backend is loaded.
2025-03-12 09:13:03.507480: I itex/core/devices/gpu/itex_gpu_runtime.cc:130] Selected platform: Intel(R) Level-Zero
2025-03-12 09:13:03.507832: I itex/core/devices/gpu/itex_gpu_runtime.cc:155] number of sub-devices is zero, expose root device.
(previous line repeated 12 times in total, once per XPU device; duplicates trimmed)
> setting tensorboard ...
WARNING: W&B writing was requested, but no valid wandb project or experiment name was provided; no W&B logs will be written under a randomly generated project or experiment name.
[2025-03-12 09:13:13][I][megatron/training:193] allreduce call time: 14.624404430389404 seconds
[2025-03-12 09:13:13][I][megatron/training:195] time to initialize megatron (seconds)=41.767
[2025-03-12 09:13:13][I][megatron/training:96] [after megatron is initialized] datetime=2025-03-12 09:13:13
[2025-03-12 09:13:13][I][Megatron-DeepSpeed/pretrain_gpt_alcf:87:__main__] building GPT model ...
[2025-03-12 09:13:13,668] [INFO] [utils.py:781:see_memory_usage] Before Building Model
[2025-03-12 09:13:13,668] [INFO] [utils.py:782:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2025-03-12 09:13:13,668] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 38.34 GB, percent = 3.4%
>fused kernel is only supported in cuda, skip loading fused kernel
[2025-03-12 09:13:13,975] [INFO] [config.py:734:__init__] Config mesh_device None world_size = 24
[2025-03-12 09:13:13][I][Megatron-DeepSpeed/pretrain_gpt_alcf:147:__main__] --------------------------------------------------------------------------------
[2025-03-12 09:13:13][I][Megatron-DeepSpeed/pretrain_gpt_alcf:148:__main__] Number of parameters in model: 5933109248
[2025-03-12 09:13:13][I][Megatron-DeepSpeed/pretrain_gpt_alcf:149:__main__] --------------------------------------------------------------------------------
[2025-03-12 09:13:14,183] [INFO] [utils.py:781:see_memory_usage] After Building Model
[2025-03-12 09:13:14,184] [INFO] [utils.py:782:see_memory_usage] MA 11.05 GB Max_MA 11.05 GB CA 11.05 GB Max_CA 11 GB
[2025-03-12 09:13:14,184] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 38.41 GB, percent = 3.4%
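(Both of the numbers above can be reproduced by hand. A minimal sketch for this Llama-style config, assuming untied embeddings, a SwiGLU MLP, grouped-query attention with 8 KV heads of head-dim 128, and two RMSNorms per layer plus a final one:)
$ python3 - <<'EOF'
v, h, L, f, kv, hd = 32000, 4096, 32, 11008, 8, 128
attn = h*h + 2*h*(kv*hd) + h*h      # Q, K, V (grouped-query) and O projections
mlp  = 2*h*f + f*h                  # SwiGLU gate + up projections, then down
emb  = 2*v*h                        # untied input and output embeddings
n = L*(attn + mlp + 2*h) + emb + h  # per-layer RMSNorms + final RMSNorm
print(n, round(2 * n / 2**30, 2))   # param count and bf16 footprint in GiB
EOF
5933109248 11.05
i.e. exactly the 5933109248 parameters reported above, and 2 bytes/param of bf16 weights accounts for the 11.05 GB "After Building Model" allocation.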
[2025-03-12 09:13:14][I][Megatron-DeepSpeed/pretrain_gpt_alcf:157:__main__] Patching tensorboard from checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/tensorboard
>fused kernel is only supported in cuda, skip loading fused kernel
[2025-03-12 09:13:14,226] [INFO] [config.py:734:__init__] Config mesh_device None world_size = 24
(the previous two lines repeat 21 more times with new timestamps, once per remaining rank; duplicates trimmed)
(the cuDNN/cuFFT/cuBLAS factory-registration errors, TF-TRT warning, metric-registration warnings, and ITEX Level-Zero device lines from above repeat here verbatim with new timestamps; duplicates trimmed)
[2025-03-12 09:13:23][I][Megatron-DeepSpeed/pretrain_gpt_alcf:164:__main__] Updating WandB run.config: [misty-deluge-1402](https://wandb.ai/aurora_gpt/AuroraGPT/runs/by2ozrz3)
[2025-03-12 09:13:23][I][ezpz/dist:125] `model_provider`, {'pre_process': True, 'post_process': True}) took: dt=9.6073s
> number of parameters on (tensor, pipeline) model parallel rank (0, 0)=5933109248
[2025-03-12 09:13:23][I][ezpz/dist:125] `get_model`((<function model_provider at 0x152f6ca89ea0>, <ModelType.encoder_or_decoder: 1>)) took: dt=9.6094s
[2025-03-12 09:13:23][I][megatron/utils:368] > learning rate decay style: cosine
[2025-03-12 09:13:23][I][ezpz/dist:125] `get_optimizer_param_scheduler`((AdamW (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.95)
capturable: False
differentiable: False
eps: 1e-05
foreach: None
fused: None
lr: 0.0
lr_mult: 1.0
maximize: False
name: wd_no_scale_lr
wd_mult: 1.0
weight_decay: 0.1
Parameter Group 1
amsgrad: False
betas: (0.9, 0.95)
capturable: False
differentiable: False
eps: 1e-05
foreach: None
fused: None
lr: 0.0
lr_mult: 1.0
maximize: False
name: no_wd_no_scale_lr
wd_mult: 0.0
weight_decay: 0.0
),)) took: dt=0.0005s
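(The two parameter groups are the usual weight-decay split: group 0 (wd_no_scale_lr) carries weight_decay=0.1, while group 1 (no_wd_no_scale_lr) has weight_decay=0.0 and presumably holds the biases and RMSNorm weights, which are conventionally excluded from decay. Both start at lr=0.0 because of the warmup schedule, noted below.)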
[2025-03-12 09:13:23][I][megatron/training:692] DeepSpeed is enabled.
[2025-03-12 09:13:23][I][megatron/training:747] Did NOT catch: ('args.data_efficiency_curriculum_learning' and 'build_train_valid_test_datasets_provider is not None')
[2025-03-12 09:13:23][I][megatron/training:756] Calling 'deepspeed.initialize'...
[2025-03-12 09:13:23][I][megatron/training:757] Wrapped with: profiler=<megatron.utils.Profile object at 0x152e64716b00>
[2025-03-12 09:13:23,094] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed info: version=0.16.4+9f1ac32c, git-hash=9f1ac32c, git-branch=saforem2/ucp-bug
[2025-03-12 09:13:23,095] [INFO] [config.py:734:__init__] Config mesh_device None world_size = 24
[2025-03-12 09:13:30,773] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: True
[2025-03-12 09:13:30,774] [INFO] [logging.py:128:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2025-03-12 09:13:30,775] [INFO] [logging.py:128:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2025-03-12 09:13:30,778] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2025-03-12 09:13:30,778] [INFO] [utils.py:59:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2025-03-12 09:13:30,778] [INFO] [logging.py:128:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 1 optimizer
[2025-03-12 09:13:30,778] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 500000000
[2025-03-12 09:13:30,778] [INFO] [stage_1_and_2.py:150:__init__] Allgather bucket size 500000000
[2025-03-12 09:13:30,778] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False
[2025-03-12 09:13:30,779] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: False
[2025-03-12 09:13:34,939] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2025-03-12 09:13:34,940] [INFO] [utils.py:782:see_memory_usage] MA 11.97 GB Max_MA 11.97 GB CA 11.97 GB Max_CA 12 GB
[2025-03-12 09:13:34,940] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 51.36 GB, percent = 4.5%
[2025-03-12 09:13:35,132] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2025-03-12 09:13:35,132] [INFO] [utils.py:782:see_memory_usage] MA 11.97 GB Max_MA 12.9 GB CA 12.9 GB Max_CA 13 GB
[2025-03-12 09:13:35,132] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 51.36 GB, percent = 4.5%
[2025-03-12 09:13:35,132] [INFO] [stage_1_and_2.py:550:__init__] optimizer state initialized
[2025-03-12 09:13:35,305] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2025-03-12 09:13:35,306] [INFO] [utils.py:782:see_memory_usage] MA 11.97 GB Max_MA 11.97 GB CA 12.9 GB Max_CA 13 GB
[2025-03-12 09:13:35,306] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 51.36 GB, percent = 4.5%
[2025-03-12 09:13:35,307] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer
[2025-03-12 09:13:35,307] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2025-03-12 09:13:35,307] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed LR Scheduler = <megatron.optimizer_param_scheduler.OptimizerParamScheduler object at 0x152e64716bc0>
[2025-03-12 09:13:35,307] [INFO] [logging.py:128:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
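(The lr=[0.0, 0.0] at step=0 is just the start of warmup: with lr_warmup_fraction=0.05 and lr_decay_iters defaulting to train_iters=1271565, the learning rate should ramp linearly from 0 to the configured 0.0002 over roughly 0.05 × 1271565 ≈ 63578 iterations before the cosine decay kicks in.)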
[2025-03-12 09:13:35,308] [INFO] [config.py:1001:print] DeepSpeedEngine configuration:
[2025-03-12 09:13:35,308] [INFO] [config.py:1005:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2025-03-12 09:13:35,308] [INFO] [config.py:1005:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True, 'use_gds': False}
[2025-03-12 09:13:35,308] [INFO] [config.py:1005:print] amp_enabled .................. False
[2025-03-12 09:13:35,308] [INFO] [config.py:1005:print] amp_params ................... False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] bfloat16_enabled ............. True
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] bfloat16_immediate_grad_update False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] checkpoint_parallel_write_pipeline False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] checkpoint_tag_validation_enabled True
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] checkpoint_tag_validation_fail False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x152e64745780>
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] communication_data_type ...... None
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] curriculum_enabled_legacy .... False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] curriculum_params_legacy ..... False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] data_efficiency_enabled ...... False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] dataloader_drop_last ......... False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print] disable_allgather ............ False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] dump_state ................... False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] dynamic_loss_scale_args ...... None
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] eigenvalue_enabled ........... False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] eigenvalue_gas_boundary_resolution 1
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] eigenvalue_layer_name ........ bert.encoder.layer
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] eigenvalue_layer_num ......... 0
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] eigenvalue_max_iter .......... 100
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] eigenvalue_stability ......... 1e-06
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] eigenvalue_tol ............... 0.01
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] eigenvalue_verbose ........... False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] elasticity_enabled ........... False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] flops_profiler_config ........ {
"enabled": true,
"recompute_fwd_factor": 0.0,
"profile_step": 2,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] fp16_auto_cast ............... None
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] fp16_enabled ................. False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] fp16_master_weights_and_gradients False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] global_rank .................. 0
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print] grad_accum_dtype ............. None
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] gradient_accumulation_steps .. 16
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] gradient_clipping ............ 1.0
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] gradient_predivide_factor .... 1.0
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] graph_harvesting ............. False
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] initial_dynamic_scale ........ 1
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] load_universal_checkpoint .... False
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] loss_scale ................... 1.0
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] memory_breakdown ............. False
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] mics_hierarchial_params_gather False
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] mics_shard_size .............. -1
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName')
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] optimizer_legacy_fusion ...... False
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] optimizer_name ............... None
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] optimizer_params ............. None
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] pld_enabled .................. False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] pld_params ................... False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] prescale_gradients ........... False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] scheduler_name ............... None
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] scheduler_params ............. None
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] seq_parallel_communication_data_type torch.float32
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] sparse_attention ............. None
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] sparse_gradients_enabled ..... False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] steps_per_print .............. 1
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] tensor_parallel_config ....... dtype=torch.float16 autotp_size=0 tensor_parallel=TPConfig(tp_size=1, tp_grain_size=1, mpu=None, tp_group=None) injection_policy_tuple=None keep_module_on_host=False replace_with_kernel_inject=False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] timers_config ................ enabled=True synchronized=True
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] train_batch_size ............. 384
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] train_micro_batch_size_per_gpu 1
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] use_data_before_expert_parallel_ False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] use_node_local_storage ....... False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] wall_clock_breakdown ......... True
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print] weight_quantization_config ... None
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print] world_size ................... 24
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print] zero_allow_untested_optimizer True
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print] zero_config .................. stage=1 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False module_granularity_threshold=0 use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False zeropp_loco_param=None mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print] zero_enabled ................. True
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print] zero_force_ds_cpu_optimizer .. False
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print] zero_optimization_stage ...... 1
[2025-03-12 09:13:35,313] [INFO] [config.py:991:print_user_config] json = {
"train_batch_size": 384,
"train_micro_batch_size_per_gpu": 1,
"gradient_clipping": 1.0,
"steps_per_print": 1,
"gradient_accumulation_steps": 16,
"zero_force_ds_cpu_optimizer": false,
"zero_allow_untested_optimizer": true,
"wall_clock_breakdown": false,
"zero_optimization": {
"stage": 1
},
"fp16": {
"enabled": false,
"loss_scale": 0,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
},
"bfloat16": {
"enabled": true,
"loss_scale": 1.0
},
"comms_logger": {
"enabled": false,
"verbose": false,
"debug": false
},
"flops_profiler": {
"enabled": true,
"profile_step": 2,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
}
[2025-03-12 09:13:35][I][megatron/training:767] 'deepspeed.initialize' took: 12.21954s
[2025-03-12 09:13:35][I][megatron/checkpointing:568] Unable to load lr_state_dict from lr_state_dict_fp=PosixPath('checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/lr_state_dict_0_of_24.yaml'), but strict=False. Returning empty dictionary: lr_state_dict={}
[2025-03-12 09:13:35,320] [WARNING] [engine.py:2909:load_checkpoint] Unable to find latest file at checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
(the previous load_checkpoint warning is repeated once per rank; duplicates trimmed)
[2025-03-12 09:13:35][I][megatron/utils:368] WARNING: could not find the metadata file checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
[2025-03-12 09:13:35][I][megatron/utils:368] will not load any checkpoints and will start from random
[2025-03-12 09:13:35,327] [WARNING] [engine.py:2909:load_checkpoint] Unable to find latest file at checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2025-03-12 09:13:35,328] [WARNING] [engine.py:2909:load_checkpoint] Unable to find latest file at checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2025-03-12 09:13:35,329] [WARNING] [engine.py:2909:load_checkpoint] Unable to find latest file at checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
(min, max) time across ranks (ms):
load-checkpoint ................................: (15.33, 15.42)
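Those `Unable to find latest file` warnings are expected on a fresh run: DeepSpeed's `load_checkpoint` looks for a plain-text file named `latest` in the checkpoint directory whose contents name the tag to restore (e.g. `global_step1000`); since nothing has been saved yet, every rank falls through to random initialization. A minimal sketch of that lookup (the helper name is mine, not DeepSpeed's API):

```python
import os

# Sketch of the "latest"-tag lookup that deepspeed/runtime/engine.py performs;
# get_latest_tag is a hypothetical helper, not part of DeepSpeed's API.
def get_latest_tag(load_dir: str):
    latest_path = os.path.join(load_dir, "latest")  # plain-text file holding a tag
    if os.path.isfile(latest_path):
        with open(latest_path) as f:
            return f.read().strip()  # e.g. "global_step1000"
    return None  # missing on a fresh run -> the warnings above

ckpt_dir = (
    "checkpoints/"
    "ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash"
)
print(get_latest_tag(ckpt_dir))  # None here, so training starts from scratch
```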
[2025-03-12 09:13:44][I][ezpz/dist:125] `setup_model_and_optimizer`((<function model_provider at 0x152f6ca89ea0>, <ModelType.encoder_or_decoder: 1>), {'teacher': False, 'data_post_process': <function data_post_process at 0x152f6ca8a290>, 'build_train_valid_test_datasets_provider': <function train_valid_test_datasets_provider at 0x152f6ca8ab90>}) took: dt=31.3965s
[2025-03-12 09:13:44][I][megatron/training:96] [after model, optimizer, and learning rate scheduler are built] datetime=2025-03-12 09:13:44
[2025-03-12 09:13:44][I][megatron/training:1510] > building train, validation, and test datasets ...
[2025-03-12 09:13:44][I][megatron/training:1493] > datasets target sizes (minimum size):
[2025-03-12 09:13:44][I][megatron/training:1494] train: 488280960
[2025-03-12 09:13:44][I][megatron/training:1495] validation: 97658880
[2025-03-12 09:13:44][I][megatron/training:1496] test: 7680
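These targets are reproducible from the run configuration. With global batch size 384 (the `gb384` in the checkpoint name) and sequence length 4096, and assuming `train_iters=1271565` (roughly a 2T-token budget), `eval_interval=100`, and `eval_iters=20` — assumptions, since those flags aren't shown in this excerpt — Megatron-LM's bookkeeping gives exactly the numbers above:

```python
# Worked check of the dataset target sizes; train_iters / eval_interval /
# eval_iters are assumed values that happen to reproduce the logged numbers.
global_batch = 384
train_iters = 1_271_565   # ~2T tokens / (4096 tokens/sample * 384 samples/iter)
eval_interval = 100
eval_iters = 20

print(train_iters * global_batch)                                      # 488280960
print((train_iters // eval_interval + 1) * eval_iters * global_batch)  # 97658880
print(eval_iters * global_batch)                                       # 7680
```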
[2025-03-12 09:13:44][I][Megatron-DeepSpeed/pretrain_gpt_alcf:465:__main__] > building train, validation, and test datasets for GPT ...
[2025-03-12 09:13:44][I][Megatron-DeepSpeed/pretrain_gpt_alcf:468:__main__] Reading datasets from /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
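The dolma.txt data-list enumerates the component corpora that get blended below. Assuming the common one-corpus-per-line `<weight> <path-prefix>` layout (the file itself isn't shown here, so treat the format as an assumption), a reader for it is just:

```python
# Hypothetical parser for a data-list file like dolma.txt, assuming each
# non-comment line is "<weight> <path-prefix>"; the real layout may differ.
def read_data_list(path: str) -> list[tuple[float, str]]:
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            weight, prefix = line.split(maxsplit=1)
            pairs.append((float(weight), prefix))
    return pairs
```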
[2025-03-12 09:13:44][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 76719
number of epochs: 3
sequence length: 4096
total number of samples: 1076724
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 16107
number of epochs: 3
sequence length: 4096
total number of samples: 230638
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13889
number of epochs: 3
sequence length: 4096
total number of samples: 202946
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 12255
number of epochs: 3
sequence length: 4096
total number of samples: 183947
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13559
number of epochs: 3
sequence length: 4096
total number of samples: 191776
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1535650
number of epochs: 1
sequence length: 4096
total number of samples: 232658
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1536175
number of epochs: 1
sequence length: 4096
total number of samples: 232428
[2025-03-12 09:13:46][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1485173
number of epochs: 1
sequence length: 4096
total number of samples: 226616
[2025-03-12 09:13:46][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1452918
number of epochs: 1
sequence length: 4096
total number of samples: 221729
[2025-03-12 09:13:46][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1427747
number of epochs: 1
sequence length: 4096
total number of samples: 218369
[2025-03-12 09:13:46][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1418426
number of epochs: 1
sequence length: 4096
total number of samples: 216980
[2025-03-12 09:13:46][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1394724
number of epochs: 1
sequence length: 4096
total number of samples: 214265
[2025-03-12 09:13:47][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1377335
number of epochs: 1
sequence length: 4096
total number of samples: 211248
[2025-03-12 09:13:47][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1950844
number of epochs: 1
sequence length: 4096
total number of samples: 429672
[2025-03-12 09:13:47][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1386132
number of epochs: 1
sequence length: 4096
total number of samples: 300551
[2025-03-12 09:13:47][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1452549
number of epochs: 1
sequence length: 4096
total number of samples: 297764
[2025-03-12 09:13:48][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1202980
number of epochs: 1
sequence length: 4096
total number of samples: 243814
[2025-03-12 09:13:48][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 2283343
number of epochs: 1
sequence length: 4096
total number of samples: 475304
[2025-03-12 09:13:48][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1524141
number of epochs: 1
sequence length: 4096
total number of samples: 296513
[2025-03-12 09:13:48][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1567022
number of epochs: 1
sequence length: 4096
total number of samples: 324782
[2025-03-12 09:13:49][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1284147
number of epochs: 1
sequence length: 4096
total number of samples: 254471
[2025-03-12 09:13:49][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1923644
number of epochs: 1
sequence length: 4096
total number of samples: 396586
[2025-03-12 09:13:49][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1673241
number of epochs: 1
sequence length: 4096
total number of samples: 336782
[2025-03-12 09:13:49][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 2103473
number of epochs: 1
sequence length: 4096
total number of samples: 429258
[2025-03-12 09:13:50][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1987726
number of epochs: 1
sequence length: 4096
total number of samples: 437762
[2025-03-12 09:13:50][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1940226
number of epochs: 1
sequence length: 4096
total number of samples: 419521
[2025-03-12 09:13:50][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1810202
number of epochs: 1
sequence length: 4096
total number of samples: 387484
[2025-03-12 09:13:51][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1834404
number of epochs: 1
sequence length: 4096
total number of samples: 405380
[2025-03-12 09:13:51][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1529006
number of epochs: 1
sequence length: 4096
total number of samples: 291698
[2025-03-12 09:13:51][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1439716
number of epochs: 1
sequence length: 4096
total number of samples: 283551
[2025-03-12 09:13:51][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1952651
number of epochs: 1
sequence length: 4096
total number of samples: 361801
[2025-03-12 09:13:52][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1938814
number of epochs: 1
sequence length: 4096
total number of samples: 371649
[2025-03-12 09:13:52][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1225643
number of epochs: 1
sequence length: 4096
total number of samples: 263051
[2025-03-12 09:13:52][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1189447
number of epochs: 1
sequence length: 4096
total number of samples: 253377
[2025-03-12 09:13:52][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1216006
number of epochs: 1
sequence length: 4096
total number of samples: 241829
[2025-03-12 09:13:52][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1500532
number of epochs: 1
sequence length: 4096
total number of samples: 296694
[2025-03-12 09:13:53][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1485856
number of epochs: 1
sequence length: 4096
total number of samples: 290219
[2025-03-12 09:13:53][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1738037
number of epochs: 1
sequence length: 4096
total number of samples: 328277
[2025-03-12 09:13:53][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1584844
number of epochs: 1
sequence length: 4096
total number of samples: 308502
[2025-03-12 09:13:53][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1504078
number of epochs: 1
sequence length: 4096
total number of samples: 304730
[2025-03-12 09:13:54][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1932012
number of epochs: 1
sequence length: 4096
total number of samples: 278333
[2025-03-12 09:13:54][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1326699
number of epochs: 1
sequence length: 4096
total number of samples: 188164
[2025-03-12 09:13:54][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1478427
number of epochs: 1
sequence length: 4096
total number of samples: 216844
[2025-03-12 09:13:54][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1870780
number of epochs: 1
sequence length: 4096
total number of samples: 292973
[2025-03-12 09:13:55][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1346694
number of epochs: 1
sequence length: 4096
total number of samples: 197333
[2025-03-12 09:13:55][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1867538
number of epochs: 1
sequence length: 4096
total number of samples: 285221
[2025-03-12 09:13:55][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 2180198
number of epochs: 1
sequence length: 4096
total number of samples: 344117
[2025-03-12 09:13:55][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1878140
number of epochs: 1
sequence length: 4096
total number of samples: 319221
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1083117
number of epochs: 1
sequence length: 4096
total number of samples: 181387
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1047206
number of epochs: 1
sequence length: 4096
total number of samples: 200809
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 907049
number of epochs: 1
sequence length: 4096
total number of samples: 188732
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 992535
number of epochs: 1
sequence length: 4096
total number of samples: 181178
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1351495
number of epochs: 1
sequence length: 4096
total number of samples: 223210
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1639484
number of epochs: 1
sequence length: 4096
total number of samples: 296497
[2025-03-12 09:13:57][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1701336
number of epochs: 1
sequence length: 4096
total number of samples: 274972
[2025-03-12 09:13:57][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1080033
number of epochs: 1
sequence length: 4096
total number of samples: 175149
[2025-03-12 09:13:57][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1892876
number of epochs: 1
sequence length: 4096
total number of samples: 331007
[2025-03-12 09:13:57][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1898716
number of epochs: 1
sequence length: 4096
total number of samples: 328440
[2025-03-12 09:13:58][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 2323171
number of epochs: 3
sequence length: 4096
total number of samples: 1234953
[2025-03-12 09:13:58][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1338962
number of epochs: 1
sequence length: 4096
total number of samples: 251413
[2025-03-12 09:13:58][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1344419
number of epochs: 1
sequence length: 4096
total number of samples: 255855
[2025-03-12 09:13:59][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1340310
number of epochs: 1
sequence length: 4096
total number of samples: 254109
[2025-03-12 09:13:59][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1335427
number of epochs: 1
sequence length: 4096
total number of samples: 249938
[2025-03-12 09:13:59][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1341637
number of epochs: 1
sequence length: 4096
total number of samples: 254494
[2025-03-12 09:13:59][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1318060
number of epochs: 1
sequence length: 4096
total number of samples: 249232
[2025-03-12 09:14:00][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1325614
number of epochs: 1
sequence length: 4096
total number of samples: 252013
[2025-03-12 09:14:00][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1319093
number of epochs: 1
sequence length: 4096
total number of samples: 253530
[2025-03-12 09:14:00][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1314383
number of epochs: 1
sequence length: 4096
total number of samples: 250341
[2025-03-12 09:14:00][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1323808
number of epochs: 1
sequence length: 4096
total number of samples: 253309
[2025-03-12 09:14:00][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1299771
number of epochs: 1
sequence length: 4096
total number of samples: 246119
[2025-03-12 09:14:01][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1302356
number of epochs: 1
sequence length: 4096
total number of samples: 252107
[2025-03-12 09:14:01][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1306591
number of epochs: 1
sequence length: 4096
total number of samples: 250995
[2025-03-12 09:14:01][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1303414
number of epochs: 1
sequence length: 4096
total number of samples: 248234
[2025-03-12 09:14:01][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1298642
number of epochs: 1
sequence length: 4096
total number of samples: 250193
[2025-03-12 09:14:01][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1309217
number of epochs: 1
sequence length: 4096
total number of samples: 250386
[2025-03-12 09:14:02][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1284962
number of epochs: 1
sequence length: 4096
total number of samples: 247510
[2025-03-12 09:14:02][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1290348
number of epochs: 1
sequence length: 4096
total number of samples: 247609
[2025-03-12 09:14:02][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1298179
number of epochs: 1
sequence length: 4096
total number of samples: 251943
[2025-03-12 09:14:02][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1291877
number of epochs: 1
sequence length: 4096
total number of samples: 248300
[2025-03-12 09:14:02][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1299821
number of epochs: 1
sequence length: 4096
total number of samples: 256596
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 6151
number of epochs: 3
sequence length: 4096
total number of samples: 6638
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 6321
number of epochs: 3
sequence length: 4096
total number of samples: 7220
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 23226
number of epochs: 3
sequence length: 4096
total number of samples: 29261
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 26873
number of epochs: 3
sequence length: 4096
total number of samples: 35083
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 12268
number of epochs: 3
sequence length: 4096
total number of samples: 15612
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 8603
number of epochs: 3
sequence length: 4096
total number of samples: 9954
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 4452
number of epochs: 3
sequence length: 4096
total number of samples: 5007
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 9868
number of epochs: 3
sequence length: 4096
total number of samples: 10321
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 7576
number of epochs: 3
sequence length: 4096
total number of samples: 8769
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 3397
number of epochs: 3
sequence length: 4096
total number of samples: 3502
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 493755
number of epochs: 3
sequence length: 4096
total number of samples: 841284
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 488661
number of epochs: 3
sequence length: 4096
total number of samples: 2550288
[2025-03-12 09:14:04][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 4974886
number of epochs: 2
sequence length: 4096
total number of samples: 549391
[2025-03-12 09:14:05][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 4992910
number of epochs: 2
sequence length: 4096
total number of samples: 551776
[2025-03-12 09:14:05][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 4937663
number of epochs: 2
sequence length: 4096
total number of samples: 542815
[2025-03-12 09:14:06][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 967341
number of epochs: 3
sequence length: 4096
total number of samples: 521329
[2025-03-12 09:14:06][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 3264225
number of epochs: 2
sequence length: 4096
total number of samples: 3815008
[2025-03-12 09:14:07][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 5060412
number of epochs: 2
sequence length: 4096
total number of samples: 2823789
[2025-03-12 09:14:08][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 494774
number of epochs: 3
sequence length: 4096
total number of samples: 207587
[2025-03-12 09:14:08][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 1390327
number of epochs: 3
sequence length: 4096
total number of samples: 177457
[2025-03-12 09:14:08][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 198000
number of epochs: 3
sequence length: 4096
total number of samples: 156655
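Each `could not find index map files` block above covers one corpus: on first use, rank 0 builds the document/sample/shuffle index arrays and caches them as `.npy` files alongside the tokenized data, so later runs skip this entire step. The `total number of samples` it reports is a simple function of corpus size; a sketch of the relationship from megatron/data/gpt_dataset.py (the token count below is illustrative, not taken from the log):

```python
# Relationship between corpus size and "total number of samples" (sketch of
# the arithmetic in megatron/data/gpt_dataset.py's index-mapping builder).
def total_samples(tokens_per_epoch: int, num_epochs: int, seq_length: int) -> int:
    # One token of lookahead is needed for the shifted labels, so the final
    # partial sequence is dropped.
    return (tokens_per_epoch * num_epochs - 1) // seq_length

# Illustrative: ~1.47B tokens/epoch, 3 epochs, seq 4096 -> ~1.08M samples,
# the same order as the first block above (76719 docs, 1076724 samples).
print(total_samples(1_470_000_000, 3, 4096))
```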
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.108499, achieved: 0.108499
dataset 1, input: 0.103053, achieved: 0.103053
dataset 2, input: 0.085475, achieved: 0.085475
dataset 3, input: 0.0433843, achieved: 0.0433843
dataset 4, input: 0.0113768, achieved: 0.0113768
dataset 5, input: 0.0527751, achieved: 0.0527751
dataset 6, input: 0.00885526, achieved: 0.00885526
dataset 7, input: 0.0852543, achieved: 0.0852543
dataset 8, input: 0.0730516, achieved: 0.0730516
dataset 9, input: 0.0799137, achieved: 0.0799137
dataset 10, input: 0.0413844, achieved: 0.0413844
dataset 11, input: 0.0496325, achieved: 0.0496325
dataset 12, input: 0.011625, achieved: 0.011625
dataset 13, input: 0.032061, achieved: 0.032061
dataset 14, input: 0.106373, achieved: 0.106373
dataset 15, input: 0.107286, achieved: 0.107286
[2025-03-12 09:14:09][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 8376602 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.00859629, achieved: 0.00859629
dataset 1, input: 0.00880482, achieved: 0.00880482
dataset 2, input: 0.0105313, achieved: 0.0105313
dataset 3, input: 0.0097168, achieved: 0.0097168
dataset 4, input: 0.00944725, achieved: 0.00944725
dataset 5, input: 0.0101287, achieved: 0.0101287
dataset 6, input: 0.0105225, achieved: 0.0105225
dataset 7, input: 0.0102593, achieved: 0.0102593
dataset 8, input: 0.0105674, achieved: 0.0105674
dataset 9, input: 0.00843697, achieved: 0.00843697
dataset 10, input: 0.0102051, achieved: 0.0102051
dataset 11, input: 0.00864062, achieved: 0.00864062
dataset 12, input: 0.012604, achieved: 0.012604
dataset 13, input: 0.0093038, achieved: 0.0093038
dataset 14, input: 0.0111696, achieved: 0.0111696
dataset 15, input: 0.0101548, achieved: 0.0101548
dataset 16, input: 0.0107164, achieved: 0.0107164
dataset 17, input: 0.0110761, achieved: 0.0110761
dataset 18, input: 0.0103199, achieved: 0.0103199
dataset 19, input: 0.0107663, achieved: 0.0107663
dataset 20, input: 0.0116292, achieved: 0.0116292
dataset 21, input: 0.00938725, achieved: 0.00938725
dataset 22, input: 0.0101135, achieved: 0.0101135
dataset 23, input: 0.00983307, achieved: 0.00983307
dataset 24, input: 0.00962867, achieved: 0.00962867
dataset 25, input: 0.00957125, achieved: 0.00957125
dataset 26, input: 0.0097747, achieved: 0.0097747
dataset 27, input: 0.00901967, achieved: 0.00901967
dataset 28, input: 0.0103566, achieved: 0.0103566
dataset 29, input: 0.00999056, achieved: 0.00999056
dataset 30, input: 0.0124184, achieved: 0.0124184
dataset 31, input: 0.00891079, achieved: 0.00891079
dataset 32, input: 0.00931397, achieved: 0.00931397
dataset 33, input: 0.0114225, achieved: 0.0114225
dataset 34, input: 0.0119184, achieved: 0.0119184
dataset 35, input: 0.0103449, achieved: 0.0103449
dataset 36, input: 0.00920292, achieved: 0.00920292
dataset 37, input: 0.0100794, achieved: 0.0100794
dataset 38, input: 0.00899384, achieved: 0.00899384
dataset 39, input: 0.0100108, achieved: 0.0100108
dataset 40, input: 0.0094962, achieved: 0.0094962
dataset 41, input: 0.00916875, achieved: 0.00916875
dataset 42, input: 0.0105867, achieved: 0.0105867
dataset 43, input: 0.0110166, achieved: 0.0110166
dataset 44, input: 0.00956528, achieved: 0.00956528
dataset 45, input: 0.0100959, achieved: 0.0100959
dataset 46, input: 0.0111119, achieved: 0.0111119
dataset 47, input: 0.00861405, achieved: 0.00861405
dataset 48, input: 0.00969287, achieved: 0.00969287
dataset 49, input: 0.00888462, achieved: 0.00888462
dataset 50, input: 0.0106551, achieved: 0.0106551
dataset 51, input: 0.0107086, achieved: 0.0107086
dataset 52, input: 0.0105182, achieved: 0.0105182
dataset 53, input: 0.0105936, achieved: 0.0105936
dataset 54, input: 0.0101075, achieved: 0.0101075
dataset 55, input: 0.0106141, achieved: 0.0106141
dataset 56, input: 0.00844348, achieved: 0.00844348
dataset 57, input: 0.0100399, achieved: 0.0100399
dataset 58, input: 0.00954325, achieved: 0.00954325
dataset 59, input: 0.0104015, achieved: 0.0104015
dataset 60, input: 0.011547, achieved: 0.011547
dataset 61, input: 0.00886638, achieved: 0.00886638
dataset 62, input: 0.0115073, achieved: 0.0115073
dataset 63, input: 0.00804098, achieved: 0.00804098
dataset 64, input: 0.0102777, achieved: 0.0102777
dataset 65, input: 0.00969355, achieved: 0.00969355
dataset 66, input: 0.00880428, achieved: 0.00880428
dataset 67, input: 0.0101621, achieved: 0.0101621
dataset 68, input: 0.0106685, achieved: 0.0106685
dataset 69, input: 0.010303, achieved: 0.010303
dataset 70, input: 0.00776017, achieved: 0.00776017
dataset 71, input: 0.0101559, achieved: 0.0101559
dataset 72, input: 0.0117694, achieved: 0.0117694
dataset 73, input: 0.00965538, achieved: 0.00965538
dataset 74, input: 0.00980263, achieved: 0.00980263
dataset 75, input: 0.00957104, achieved: 0.00957104
dataset 76, input: 0.0102657, achieved: 0.0102657
dataset 77, input: 0.0101735, achieved: 0.0101735
dataset 78, input: 0.00952515, achieved: 0.00952515
dataset 79, input: 0.0095482, achieved: 0.0095482
dataset 80, input: 0.00878903, achieved: 0.00878903
dataset 81, input: 0.00989416, achieved: 0.00989416
dataset 82, input: 0.0107253, achieved: 0.0107253
dataset 83, input: 0.0105408, achieved: 0.0105408
dataset 84, input: 0.0103657, achieved: 0.0103657
dataset 85, input: 0.0113504, achieved: 0.0113504
dataset 86, input: 0.00890882, achieved: 0.00890882
dataset 87, input: 0.0101038, achieved: 0.0101038
dataset 88, input: 0.00923078, achieved: 0.00923078
dataset 89, input: 0.00903187, achieved: 0.00903187
dataset 90, input: 0.00932956, achieved: 0.00932956
dataset 91, input: 0.0107644, achieved: 0.0107644
dataset 92, input: 0.010269, achieved: 0.010269
dataset 93, input: 0.0113082, achieved: 0.0113082
dataset 94, input: 0.010295, achieved: 0.010295
dataset 95, input: 0.00908536, achieved: 0.00908536
dataset 96, input: 0.00956054, achieved: 0.00956054
dataset 97, input: 0.0103095, achieved: 0.0103095
dataset 98, input: 0.00869723, achieved: 0.00869723
dataset 99, input: 0.00959599, achieved: 0.00959599
[2025-03-12 09:14:11][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 14750323 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.430653, achieved: 0.430653
dataset 1, input: 0.430584, achieved: 0.430584
dataset 2, input: 0.138763, achieved: 0.138763
[2025-03-12 09:14:11][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 3535268 samples
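In each blend, `input` is the requested mixing weight and `achieved` is the realized fraction after index construction; matching columns confirm the blend reproduces the requested mixture at this precision. A minimal sketch of the greedy index construction (approximating the Cython helper megatron.data.helpers.build_blending_indices; the three weights are rounded from the blend just above):

```python
# Greedy blendable-index construction: at each step pick the dataset whose
# achieved count lags its target share the most (sketch of the Cython helper
# megatron.data.helpers.build_blending_indices, not the real implementation).
def build_blending_indices(weights, total_size):
    counts = [0] * len(weights)
    picks = []
    for i in range(total_size):
        j = max(range(len(weights)), key=lambda k: weights[k] * (i + 1) - counts[k])
        counts[j] += 1
        picks.append(j)
    return picks, [c / total_size for c in counts]

# Weights rounded from the 3-dataset (test split) blend above.
_, achieved = build_blending_indices([0.4307, 0.4306, 0.1388], total_size=100_000)
print(achieved)  # ~[0.4307, 0.4306, 0.1388]: "achieved" tracks "input"
```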
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.00616816, achieved: 0.00616816
dataset 1, input: 0.00616445, achieved: 0.00616445
dataset 2, input: 0.00616806, achieved: 0.00616806
dataset 3, input: 0.00616984, achieved: 0.00616984
dataset 4, input: 0.00616639, achieved: 0.00616639
dataset 5, input: 0.00616795, achieved: 0.00616795
dataset 6, input: 0.00617955, achieved: 0.00617955
dataset 7, input: 0.00615845, achieved: 0.00615845
dataset 8, input: 0.0061702, achieved: 0.0061702
dataset 9, input: 0.00617057, achieved: 0.00617057
dataset 10, input: 0.00615918, achieved: 0.00615918
dataset 11, input: 0.00617542, achieved: 0.00617542
dataset 12, input: 0.00617307, achieved: 0.00617307
dataset 13, input: 0.00617412, achieved: 0.00617412
dataset 14, input: 0.00617736, achieved: 0.00617736
dataset 15, input: 0.00616968, achieved: 0.00616968
dataset 16, input: 0.00618812, achieved: 0.00618812
dataset 17, input: 0.00618927, achieved: 0.00618927
dataset 18, input: 0.00615944, achieved: 0.00615944
dataset 19, input: 0.00615667, achieved: 0.00615667
dataset 20, input: 0.00614784, achieved: 0.00614784
dataset 21, input: 0.0061538, achieved: 0.0061538
dataset 22, input: 0.0061561, achieved: 0.0061561
dataset 23, input: 0.00617475, achieved: 0.00617475
dataset 24, input: 0.00617266, achieved: 0.00617266
dataset 25, input: 0.00615751, achieved: 0.00615751
dataset 26, input: 0.00617198, achieved: 0.00617198
dataset 27, input: 0.00617448, achieved: 0.00617448
dataset 28, input: 0.00617276, achieved: 0.00617276
dataset 29, input: 0.00616289, achieved: 0.00616289
dataset 30, input: 0.00618148, achieved: 0.00618148
dataset 31, input: 0.00605089, achieved: 0.00605089
dataset 32, input: 0.00601652, achieved: 0.00601652
dataset 33, input: 0.00600649, achieved: 0.00600649
dataset 34, input: 0.00600017, achieved: 0.00600017
dataset 35, input: 0.0060207, achieved: 0.0060207
dataset 36, input: 0.00600299, achieved: 0.00600299
dataset 37, input: 0.00600388, achieved: 0.00600388
dataset 38, input: 0.00600984, achieved: 0.00600984
dataset 39, input: 0.00599234, achieved: 0.00599234
dataset 40, input: 0.00601438, achieved: 0.00601438
dataset 41, input: 0.00599558, achieved: 0.00599558
dataset 42, input: 0.00599923, achieved: 0.00599923
dataset 43, input: 0.0059997, achieved: 0.0059997
dataset 44, input: 0.00598523, achieved: 0.00598523
dataset 45, input: 0.00599343, achieved: 0.00599343
dataset 46, input: 0.00599537, achieved: 0.00599537
dataset 47, input: 0.00598878, achieved: 0.00598878
dataset 48, input: 0.00600367, achieved: 0.00600367
dataset 49, input: 0.00600351, achieved: 0.00600351
dataset 50, input: 0.00598993, achieved: 0.00598993
dataset 51, input: 0.00598414, achieved: 0.00598414
dataset 52, input: 0.00599113, achieved: 0.00599113
dataset 53, input: 0.00599808, achieved: 0.00599808
dataset 54, input: 0.00598748, achieved: 0.00598748
dataset 55, input: 0.00598225, achieved: 0.00598225
dataset 56, input: 0.00599443, achieved: 0.00599443
dataset 57, input: 0.00597301, achieved: 0.00597301
dataset 58, input: 0.0059926, achieved: 0.0059926
dataset 59, input: 0.0059787, achieved: 0.0059787
dataset 60, input: 0.00597416, achieved: 0.00597416
dataset 61, input: 0.00598325, achieved: 0.00598325
dataset 62, input: 0.00594673, achieved: 0.00594673
dataset 63, input: 0.00590181, achieved: 0.00590181
dataset 64, input: 0.00589204, achieved: 0.00589204
dataset 65, input: 0.00587679, achieved: 0.00587679
dataset 66, input: 0.00587689, achieved: 0.00587689
dataset 67, input: 0.00587595, achieved: 0.00587595
dataset 68, input: 0.00586963, achieved: 0.00586963
dataset 69, input: 0.00587642, achieved: 0.00587642
dataset 70, input: 0.00586509, achieved: 0.00586509
dataset 71, input: 0.00586128, achieved: 0.00586128
dataset 72, input: 0.00587972, achieved: 0.00587972
dataset 73, input: 0.00587454, achieved: 0.00587454
dataset 74, input: 0.00587433, achieved: 0.00587433
dataset 75, input: 0.00587214, achieved: 0.00587214
dataset 76, input: 0.00588196, achieved: 0.00588196
dataset 77, input: 0.00587125, achieved: 0.00587125
dataset 78, input: 0.00588123, achieved: 0.00588123
dataset 79, input: 0.00588619, achieved: 0.00588619
dataset 80, input: 0.00585851, achieved: 0.00585851
dataset 81, input: 0.00587601, achieved: 0.00587601
dataset 82, input: 0.00585788, achieved: 0.00585788
dataset 83, input: 0.00585673, achieved: 0.00585673
dataset 84, input: 0.00586911, achieved: 0.00586911
dataset 85, input: 0.00585354, achieved: 0.00585354
dataset 86, input: 0.00586791, achieved: 0.00586791
dataset 87, input: 0.00584618, achieved: 0.00584618
dataset 88, input: 0.00585119, achieved: 0.00585119
dataset 89, input: 0.00587183, achieved: 0.00587183
dataset 90, input: 0.00586404, achieved: 0.00586404
dataset 91, input: 0.0058513, achieved: 0.0058513
dataset 92, input: 0.00586222, achieved: 0.00586222
dataset 93, input: 0.00584843, achieved: 0.00584843
dataset 94, input: 0.00579258, achieved: 0.00579258
dataset 95, input: 0.00578355, achieved: 0.00578355
dataset 96, input: 0.00579081, achieved: 0.00579081
dataset 97, input: 0.00578491, achieved: 0.00578491
dataset 98, input: 0.00578632, achieved: 0.00578632
dataset 99, input: 0.00576976, achieved: 0.00576976
dataset 100, input: 0.00578412, achieved: 0.00578412
dataset 101, input: 0.00578376, achieved: 0.00578376
dataset 102, input: 0.00576871, achieved: 0.00576871
dataset 103, input: 0.00577383, achieved: 0.00577383
dataset 104, input: 0.00577571, achieved: 0.00577571
dataset 105, input: 0.00575341, achieved: 0.00575341
dataset 106, input: 0.00575743, achieved: 0.00575743
dataset 107, input: 0.00575581, achieved: 0.00575581
dataset 108, input: 0.00575414, achieved: 0.00575414
dataset 109, input: 0.00576798, achieved: 0.00576798
dataset 110, input: 0.00575571, achieved: 0.00575571
dataset 111, input: 0.00576359, achieved: 0.00576359
dataset 112, input: 0.00575879, achieved: 0.00575879
dataset 113, input: 0.00575555, achieved: 0.00575555
dataset 114, input: 0.00576056, achieved: 0.00576056
dataset 115, input: 0.00575879, achieved: 0.00575879
dataset 116, input: 0.00574986, achieved: 0.00574986
dataset 117, input: 0.0057614, achieved: 0.0057614
dataset 118, input: 0.00575926, achieved: 0.00575926
dataset 119, input: 0.00573267, achieved: 0.00573267
dataset 120, input: 0.00575701, achieved: 0.00575701
dataset 121, input: 0.00574986, achieved: 0.00574986
dataset 122, input: 0.00575999, achieved: 0.00575999
dataset 123, input: 0.0057555, achieved: 0.0057555
dataset 124, input: 0.00575644, achieved: 0.00575644
dataset 125, input: 0.00571449, achieved: 0.00571449
dataset 126, input: 0.00570007, achieved: 0.00570007
dataset 127, input: 0.00568456, achieved: 0.00568456
dataset 128, input: 0.0057041, achieved: 0.0057041
dataset 129, input: 0.00567411, achieved: 0.00567411
dataset 130, input: 0.00567672, achieved: 0.00567672
dataset 131, input: 0.00568346, achieved: 0.00568346
dataset 132, input: 0.00567636, achieved: 0.00567636
dataset 133, input: 0.00566508, achieved: 0.00566508
dataset 134, input: 0.00567949, achieved: 0.00567949
dataset 135, input: 0.00567396, achieved: 0.00567396
dataset 136, input: 0.00568007, achieved: 0.00568007
dataset 137, input: 0.00567605, achieved: 0.00567605
dataset 138, input: 0.0056704, achieved: 0.0056704
dataset 139, input: 0.00566962, achieved: 0.00566962
dataset 140, input: 0.00565917, achieved: 0.00565917
dataset 141, input: 0.0056633, achieved: 0.0056633
dataset 142, input: 0.00566278, achieved: 0.00566278
dataset 143, input: 0.00565437, achieved: 0.00565437
dataset 144, input: 0.0056667, achieved: 0.0056667
dataset 145, input: 0.00567589, achieved: 0.00567589
dataset 146, input: 0.0056609, achieved: 0.0056609
dataset 147, input: 0.00565562, achieved: 0.00565562
dataset 148, input: 0.00565609, achieved: 0.00565609
dataset 149, input: 0.00565985, achieved: 0.00565985
dataset 150, input: 0.00566283, achieved: 0.00566283
dataset 151, input: 0.00566377, achieved: 0.00566377
dataset 152, input: 0.00566027, achieved: 0.00566027
dataset 153, input: 0.00566137, achieved: 0.00566137
dataset 154, input: 0.00565834, achieved: 0.00565834
dataset 155, input: 0.00565202, achieved: 0.00565202
dataset 156, input: 0.00566074, achieved: 0.00566074
dataset 157, input: 0.00563488, achieved: 0.00563488
dataset 158, input: 0.00561373, achieved: 0.00561373
dataset 159, input: 0.00561127, achieved: 0.00561127
dataset 160, input: 0.00560751, achieved: 0.00560751
dataset 161, input: 0.0056038, achieved: 0.0056038
dataset 162, input: 0.00560271, achieved: 0.00560271
dataset 163, input: 0.00561707, achieved: 0.00561707
dataset 164, input: 0.00561383, achieved: 0.00561383
dataset 165, input: 0.00560652, achieved: 0.00560652
dataset 166, input: 0.00558949, achieved: 0.00558949
dataset 167, input: 0.00560913, achieved: 0.00560913
dataset 168, input: 0.0056013, achieved: 0.0056013
dataset 169, input: 0.00559644, achieved: 0.00559644
dataset 170, input: 0.00186901, achieved: 0.00186901
[2025-03-12 09:14:16][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 19143785 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.00105726, achieved: 0.00105726
dataset 1, input: 0.0010793, achieved: 0.0010793
dataset 2, input: 0.00109673, achieved: 0.00109673
dataset 3, input: 0.00109395, achieved: 0.00109395
dataset 4, input: 0.00109939, achieved: 0.00109939
dataset 5, input: 0.00107491, achieved: 0.00107491
dataset 6, input: 0.00108898, achieved: 0.00108898
dataset 7, input: 0.000683736, achieved: 0.000683736
dataset 8, input: 0.00110611, achieved: 0.00110611
dataset 9, input: 0.0011095, achieved: 0.0011095
dataset 10, input: 0.00108457, achieved: 0.00108457
dataset 11, input: 0.0011302, achieved: 0.0011302
dataset 12, input: 0.00113902, achieved: 0.00113902
dataset 13, input: 0.00108227, achieved: 0.00108227
dataset 14, input: 0.00110647, achieved: 0.00110647
dataset 15, input: 0.00112896, achieved: 0.00112896
dataset 16, input: 0.00111692, achieved: 0.00111692
dataset 17, input: 0.00118773, achieved: 0.00118773
dataset 18, input: 0.00117348, achieved: 0.00117348
dataset 19, input: 0.00110704, achieved: 0.00110704
dataset 20, input: 0.00110023, achieved: 0.00110023
dataset 21, input: 0.0011074, achieved: 0.0011074
dataset 22, input: 0.00125367, achieved: 0.00125367
dataset 23, input: 0.0012452, achieved: 0.0012452
dataset 24, input: 0.000702815, achieved: 0.000702815
dataset 25, input: 0.00111163, achieved: 0.00111163
dataset 26, input: 0.00116457, achieved: 0.00116457
dataset 27, input: 0.00114034, achieved: 0.00114034
dataset 28, input: 0.00108085, achieved: 0.00108085
dataset 29, input: 0.00114094, achieved: 0.00114094
dataset 30, input: 0.000691019, achieved: 0.000691019
dataset 31, input: 0.000760015, achieved: 0.000760015
dataset 32, input: 0.0010096, achieved: 0.0010096
dataset 33, input: 0.00129976, achieved: 0.00129976
dataset 34, input: 0.00073602, achieved: 0.00073602
dataset 35, input: 0.00109682, achieved: 0.00109682
dataset 36, input: 0.000696862, achieved: 0.000696862
dataset 37, input: 0.001137, achieved: 0.001137
dataset 38, input: 0.00115319, achieved: 0.00115319
dataset 39, input: 0.000760245, achieved: 0.000760245
dataset 40, input: 0.000760406, achieved: 0.000760406
dataset 41, input: 0.00118783, achieved: 0.00118783
dataset 42, input: 0.00119724, achieved: 0.00119724
dataset 43, input: 0.00127187, achieved: 0.00127187
dataset 44, input: 0.000751944, achieved: 0.000751944
dataset 45, input: 0.000882526, achieved: 0.000882526
dataset 46, input: 0.000878318, achieved: 0.000878318
dataset 47, input: 0.00129593, achieved: 0.00129593
dataset 48, input: 0.000647071, achieved: 0.000647071
dataset 49, input: 0.00107732, achieved: 0.00107732
dataset 50, input: 0.00110707, achieved: 0.00110707
dataset 51, input: 0.00127477, achieved: 0.00127477
dataset 52, input: 0.00135993, achieved: 0.00135993
dataset 53, input: 0.00111424, achieved: 0.00111424
dataset 54, input: 0.00112836, achieved: 0.00112836
dataset 55, input: 0.00107217, achieved: 0.00107217
dataset 56, input: 0.00111444, achieved: 0.00111444
dataset 57, input: 0.00109946, achieved: 0.00109946
dataset 58, input: 0.00109523, achieved: 0.00109523
dataset 59, input: 0.00110735, achieved: 0.00110735
dataset 60, input: 0.00110198, achieved: 0.00110198
dataset 61, input: 0.00112875, achieved: 0.00112875
dataset 62, input: 0.00110829, achieved: 0.00110829
dataset 63, input: 0.00111008, achieved: 0.00111008
dataset 64, input: 0.00110857, achieved: 0.00110857
dataset 65, input: 0.00109657, achieved: 0.00109657
dataset 66, input: 0.00110117, achieved: 0.00110117
dataset 67, input: 0.0010552, achieved: 0.0010552
dataset 68, input: 0.00102554, achieved: 0.00102554
dataset 69, input: 0.000947303, achieved: 0.000947303
dataset 70, input: 0.000869004, achieved: 0.000869004
dataset 71, input: 0.00111486, achieved: 0.00111486
dataset 72, input: 0.000803387, achieved: 0.000803387
dataset 73, input: 0.000821849, achieved: 0.000821849
dataset 74, input: 0.000815505, achieved: 0.000815505
dataset 75, input: 0.000801925, achieved: 0.000801925
dataset 76, input: 0.000820047, achieved: 0.000820047
dataset 77, input: 0.000797037, achieved: 0.000797037
dataset 78, input: 0.000825752, achieved: 0.000825752
dataset 79, input: 0.000809921, achieved: 0.000809921
dataset 80, input: 0.000809449, achieved: 0.000809449
dataset 81, input: 0.000808309, achieved: 0.000808309
dataset 82, input: 0.000798545, achieved: 0.000798545
dataset 83, input: 0.000805062, achieved: 0.000805062
dataset 84, input: 0.000795799, achieved: 0.000795799
dataset 85, input: 0.000741719, achieved: 0.000741719
dataset 86, input: 0.000721202, achieved: 0.000721202
dataset 87, input: 0.00073423, achieved: 0.00073423
dataset 88, input: 0.000713056, achieved: 0.000713056
dataset 89, input: 0.000717155, achieved: 0.000717155
dataset 90, input: 0.000723608, achieved: 0.000723608
dataset 91, input: 0.000740332, achieved: 0.000740332
dataset 92, input: 0.000739256, achieved: 0.000739256
dataset 93, input: 0.000734875, achieved: 0.000734875
dataset 94, input: 0.000711933, achieved: 0.000711933
dataset 95, input: 0.000912468, achieved: 0.000912468
dataset 96, input: 0.00105344, achieved: 0.00105344
dataset 97, input: 0.00107572, achieved: 0.00107572
dataset 98, input: 0.00102162, achieved: 0.00102162
dataset 99, input: 0.00103541, achieved: 0.00103541
dataset 100, input: 0.00104079, achieved: 0.00104079
dataset 101, input: 0.001014, achieved: 0.001014
dataset 102, input: 0.00102785, achieved: 0.00102785
dataset 103, input: 0.00101656, achieved: 0.00101656
dataset 104, input: 0.00102919, achieved: 0.00102919
dataset 105, input: 0.00103638, achieved: 0.00103638
dataset 106, input: 0.00102483, achieved: 0.00102483
dataset 107, input: 0.000989293, achieved: 0.000989293
dataset 108, input: 0.000995545, achieved: 0.000995545
dataset 109, input: 0.00100035, achieved: 0.00100035
dataset 110, input: 0.00102496, achieved: 0.00102496
dataset 111, input: 0.00106011, achieved: 0.00106011
dataset 112, input: 0.00107074, achieved: 0.00107074
dataset 113, input: 0.00106728, achieved: 0.00106728
dataset 114, input: 0.0010624, achieved: 0.0010624
dataset 115, input: 0.00096181, achieved: 0.00096181
dataset 116, input: 0.000943233, achieved: 0.000943233
dataset 117, input: 0.000940579, achieved: 0.000940579
dataset 118, input: 0.00138766, achieved: 0.00138766
dataset 119, input: 0.00090583, achieved: 0.00090583
dataset 120, input: 0.000908363, achieved: 0.000908363
dataset 121, input: 0.000912756, achieved: 0.000912756
dataset 122, input: 0.000880874, achieved: 0.000880874
dataset 123, input: 0.000877063, achieved: 0.000877063
dataset 124, input: 0.000879999, achieved: 0.000879999
dataset 125, input: 0.000859182, achieved: 0.000859182
dataset 126, input: 0.000854525, achieved: 0.000854525
dataset 127, input: 0.000853143, achieved: 0.000853143
dataset 128, input: 0.000825839, achieved: 0.000825839
dataset 129, input: 0.000809956, achieved: 0.000809956
dataset 130, input: 0.000803542, achieved: 0.000803542
dataset 131, input: 0.000804066, achieved: 0.000804066
dataset 132, input: 0.000789185, achieved: 0.000789185
dataset 133, input: 0.000771419, achieved: 0.000771419
dataset 134, input: 0.000765501, achieved: 0.000765501
dataset 135, input: 0.000777711, achieved: 0.000777711
dataset 136, input: 0.00121595, achieved: 0.00121595
dataset 137, input: 0.00134176, achieved: 0.00134176
dataset 138, input: 0.00134909, achieved: 0.00134909
dataset 139, input: 0.00132973, achieved: 0.00132973
dataset 140, input: 0.00131878, achieved: 0.00131878
dataset 141, input: 0.00130425, achieved: 0.00130425
dataset 142, input: 0.000865716, achieved: 0.000865716
dataset 143, input: 0.000821941, achieved: 0.000821941
dataset 144, input: 0.00077044, achieved: 0.00077044
dataset 145, input: 0.00115856, achieved: 0.00115856
dataset 146, input: 0.00105343, achieved: 0.00105343
dataset 147, input: 0.00103246, achieved: 0.00103246
dataset 148, input: 0.00103677, achieved: 0.00103677
dataset 149, input: 0.00104975, achieved: 0.00104975
dataset 150, input: 0.00101242, achieved: 0.00101242
dataset 151, input: 0.00100947, achieved: 0.00100947
dataset 152, input: 0.00100396, achieved: 0.00100396
dataset 153, input: 0.0013901, achieved: 0.0013901
dataset 154, input: 0.00128076, achieved: 0.00128076
dataset 155, input: 0.00127316, achieved: 0.00127316
dataset 156, input: 0.00125422, achieved: 0.00125422
dataset 157, input: 0.00122036, achieved: 0.00122036
dataset 158, input: 0.00121491, achieved: 0.00121491
dataset 159, input: 0.00118219, achieved: 0.00118219
dataset 160, input: 0.00121341, achieved: 0.00121341
dataset 161, input: 0.00122664, achieved: 0.00122664
dataset 162, input: 0.000977779, achieved: 0.000977779
dataset 163, input: 0.000962933, achieved: 0.000962933
dataset 164, input: 0.000937136, achieved: 0.000937136
dataset 165, input: 0.000958408, achieved: 0.000958408
dataset 166, input: 0.000948598, achieved: 0.000948598
dataset 167, input: 0.000971585, achieved: 0.000971585
dataset 168, input: 0.000976306, achieved: 0.000976306
dataset 169, input: 0.000953802, achieved: 0.000953802
dataset 170, input: 0.000938339, achieved: 0.000938339
dataset 171, input: 0.000944223, achieved: 0.000944223
dataset 172, input: 0.00140216, achieved: 0.00140216
dataset 173, input: 0.00141163, achieved: 0.00141163
dataset 174, input: 0.00141267, achieved: 0.00141267
dataset 175, input: 0.0014137, achieved: 0.0014137
dataset 176, input: 0.000784476, achieved: 0.000784476
dataset 177, input: 0.000802195, achieved: 0.000802195
dataset 178, input: 0.00078637, achieved: 0.00078637
dataset 179, input: 0.000774079, achieved: 0.000774079
dataset 180, input: 0.000788701, achieved: 0.000788701
dataset 181, input: 0.000790123, achieved: 0.000790123
dataset 182, input: 0.000754212, achieved: 0.000754212
dataset 183, input: 0.000732871, achieved: 0.000732871
dataset 184, input: 0.00106774, achieved: 0.00106774
dataset 185, input: 0.00118187, achieved: 0.00118187
dataset 186, input: 0.00123703, achieved: 0.00123703
dataset 187, input: 0.000771724, achieved: 0.000771724
dataset 188, input: 0.000780573, achieved: 0.000780573
dataset 189, input: 0.00076416, achieved: 0.00076416
dataset 190, input: 0.000742882, achieved: 0.000742882
dataset 191, input: 0.000734708, achieved: 0.000734708
dataset 192, input: 0.000724944, achieved: 0.000724944
dataset 193, input: 0.000728709, achieved: 0.000728709
dataset 194, input: 0.00071461, achieved: 0.00071461
dataset 195, input: 0.00107015, achieved: 0.00107015
dataset 196, input: 0.00137927, achieved: 0.00137927
dataset 197, input: 0.000925288, achieved: 0.000925288
dataset 198, input: 0.00135697, achieved: 0.00135697
dataset 199, input: 0.00131986, achieved: 0.00131986
dataset 200, input: 0.00122967, achieved: 0.00122967
dataset 201, input: 0.00124034, achieved: 0.00124034
dataset 202, input: 0.00137788, achieved: 0.00137788
dataset 203, input: 0.00136676, achieved: 0.00136676
dataset 204, input: 0.00135008, achieved: 0.00135008
dataset 205, input: 0.00130673, achieved: 0.00130673
dataset 206, input: 0.00127487, achieved: 0.00127487
dataset 207, input: 0.00127236, achieved: 0.00127236
dataset 208, input: 0.00125718, achieved: 0.00125718
dataset 209, input: 0.00126082, achieved: 0.00126082
dataset 210, input: 0.00125219, achieved: 0.00125219
dataset 211, input: 0.00120455, achieved: 0.00120455
dataset 212, input: 0.00119146, achieved: 0.00119146
dataset 213, input: 0.00117272, achieved: 0.00117272
dataset 214, input: 0.0011579, achieved: 0.0011579
dataset 215, input: 0.000945489, achieved: 0.000945489
dataset 216, input: 0.000947257, achieved: 0.000947257
dataset 217, input: 0.0013656, achieved: 0.0013656
dataset 218, input: 0.00133327, achieved: 0.00133327
dataset 219, input: 0.00131316, achieved: 0.00131316
dataset 220, input: 0.00128887, achieved: 0.00128887
dataset 221, input: 0.00139364, achieved: 0.00139364
dataset 222, input: 0.000997284, achieved: 0.000997284
dataset 223, input: 0.000999172, achieved: 0.000999172
dataset 224, input: 0.00137653, achieved: 0.00137653
dataset 225, input: 0.00136431, achieved: 0.00136431
dataset 226, input: 0.00135423, achieved: 0.00135423
dataset 227, input: 0.00135096, achieved: 0.00135096
dataset 228, input: 0.00131663, achieved: 0.00131663
dataset 229, input: 0.00111499, achieved: 0.00111499
dataset 230, input: 0.00110642, achieved: 0.00110642
dataset 231, input: 0.00110372, achieved: 0.00110372
dataset 232, input: 0.00107562, achieved: 0.00107562
dataset 233, input: 0.00104146, achieved: 0.00104146
dataset 234, input: 0.00101229, achieved: 0.00101229
dataset 235, input: 0.001015, achieved: 0.001015
dataset 236, input: 0.000999558, achieved: 0.000999558
dataset 237, input: 0.00101254, achieved: 0.00101254
dataset 238, input: 0.000983254, achieved: 0.000983254
dataset 239, input: 0.000964383, achieved: 0.000964383
dataset 240, input: 0.000960549, achieved: 0.000960549
dataset 241, input: 0.000944424, achieved: 0.000944424
dataset 242, input: 0.00131418, achieved: 0.00131418
dataset 243, input: 0.000830243, achieved: 0.000830243
dataset 244, input: 0.000810105, achieved: 0.000810105
dataset 245, input: 0.000771126, achieved: 0.000771126
dataset 246, input: 0.000749013, achieved: 0.000749013
dataset 247, input: 0.000757131, achieved: 0.000757131
dataset 248, input: 0.000729739, achieved: 0.000729739
dataset 249, input: 0.000752784, achieved: 0.000752784
dataset 250, input: 0.000713528, achieved: 0.000713528
dataset 251, input: 0.000729751, achieved: 0.000729751
dataset 252, input: 0.00120029, achieved: 0.00120029
dataset 253, input: 0.00139873, achieved: 0.00139873
dataset 254, input: 0.00135716, achieved: 0.00135716
dataset 255, input: 0.00131714, achieved: 0.00131714
dataset 256, input: 0.00128543, achieved: 0.00128543
dataset 257, input: 0.00125699, achieved: 0.00125699
dataset 258, input: 0.000819005, achieved: 0.000819005
dataset 259, input: 0.00123535, achieved: 0.00123535
dataset 260, input: 0.00127962, achieved: 0.00127962
dataset 261, input: 0.00127487, achieved: 0.00127487
dataset 262, input: 0.00125334, achieved: 0.00125334
dataset 263, input: 0.00124844, achieved: 0.00124844
dataset 264, input: 0.00122773, achieved: 0.00122773
dataset 265, input: 0.000823599, achieved: 0.000823599
dataset 266, input: 0.00121828, achieved: 0.00121828
dataset 267, input: 0.000811965, achieved: 0.000811965
dataset 268, input: 0.00132362, achieved: 0.00132362
dataset 269, input: 0.00139815, achieved: 0.00139815
dataset 270, input: 0.001256, achieved: 0.001256
dataset 271, input: 0.00108371, achieved: 0.00108371
dataset 272, input: 0.00107062, achieved: 0.00107062
dataset 273, input: 0.00105916, achieved: 0.00105916
dataset 274, input: 0.00102508, achieved: 0.00102508
dataset 275, input: 0.00111499, achieved: 0.00111499
dataset 276, input: 0.00109822, achieved: 0.00109822
dataset 277, input: 0.00117478, achieved: 0.00117478
dataset 278, input: 0.0011967, achieved: 0.0011967
dataset 279, input: 0.00114093, achieved: 0.00114093
dataset 280, input: 0.000779571, achieved: 0.000779571
dataset 281, input: 0.00123281, achieved: 0.00123281
dataset 282, input: 0.00062679, achieved: 0.00062679
dataset 283, input: 0.00125363, achieved: 0.00125363
dataset 284, input: 0.00109893, achieved: 0.00109893
dataset 285, input: 0.0012276, achieved: 0.0012276
dataset 286, input: 0.00127764, achieved: 0.00127764
dataset 287, input: 0.00117289, achieved: 0.00117289
dataset 288, input: 0.000738565, achieved: 0.000738565
dataset 289, input: 0.00106061, achieved: 0.00106061
dataset 290, input: 0.00123911, achieved: 0.00123911
dataset 291, input: 0.00130963, achieved: 0.00130963
dataset 292, input: 0.00122002, achieved: 0.00122002
dataset 293, input: 0.000671676, achieved: 0.000671676
dataset 294, input: 0.000733752, achieved: 0.000733752
dataset 295, input: 0.00113394, achieved: 0.00113394
dataset 296, input: 0.00123383, achieved: 0.00123383
dataset 297, input: 0.00115412, achieved: 0.00115412
dataset 298, input: 0.000686229, achieved: 0.000686229
dataset 299, input: 0.00125179, achieved: 0.00125179
dataset 300, input: 0.00123965, achieved: 0.00123965
dataset 301, input: 0.00107752, achieved: 0.00107752
dataset 302, input: 0.00115829, achieved: 0.00115829
dataset 303, input: 0.00119977, achieved: 0.00119977
dataset 304, input: 0.00117928, achieved: 0.00117928
dataset 305, input: 0.000645114, achieved: 0.000645114
dataset 306, input: 0.00123741, achieved: 0.00123741
dataset 307, input: 0.0012657, achieved: 0.0012657
dataset 308, input: 0.00114569, achieved: 0.00114569
dataset 309, input: 0.00119626, achieved: 0.00119626
dataset 310, input: 0.0012244, achieved: 0.0012244
dataset 311, input: 0.000677064, achieved: 0.000677064
dataset 312, input: 0.000732152, achieved: 0.000732152
dataset 313, input: 0.00120647, achieved: 0.00120647
dataset 314, input: 0.0012265, achieved: 0.0012265
dataset 315, input: 0.0011615, achieved: 0.0011615
dataset 316, input: 0.00121459, achieved: 0.00121459
dataset 317, input: 0.00119835, achieved: 0.00119835
dataset 318, input: 0.00127203, achieved: 0.00127203
dataset 319, input: 0.00110161, achieved: 0.00110161
dataset 320, input: 0.00109044, achieved: 0.00109044
dataset 321, input: 0.00119994, achieved: 0.00119994
dataset 322, input: 0.00109323, achieved: 0.00109323
dataset 323, input: 0.00118551, achieved: 0.00118551
dataset 324, input: 0.00115721, achieved: 0.00115721
dataset 325, input: 0.00123548, achieved: 0.00123548
dataset 326, input: 0.00118111, achieved: 0.00118111
dataset 327, input: 0.00118876, achieved: 0.00118876
dataset 328, input: 0.00107531, achieved: 0.00107531
dataset 329, input: 0.00107846, achieved: 0.00107846
dataset 330, input: 0.00124869, achieved: 0.00124869
dataset 331, input: 0.00110692, achieved: 0.00110692
dataset 332, input: 0.00102709, achieved: 0.00102709
dataset 333, input: 0.00117422, achieved: 0.00117422
dataset 334, input: 0.0011315, achieved: 0.0011315
dataset 335, input: 0.00111281, achieved: 0.00111281
dataset 336, input: 0.00110364, achieved: 0.00110364
dataset 337, input: 0.00121196, achieved: 0.00121196
dataset 338, input: 0.00119802, achieved: 0.00119802
dataset 339, input: 0.00115191, achieved: 0.00115191
dataset 340, input: 0.0011559, achieved: 0.0011559
dataset 341, input: 0.00119496, achieved: 0.00119496
dataset 342, input: 0.00104568, achieved: 0.00104568
dataset 343, input: 0.00107559, achieved: 0.00107559
dataset 344, input: 0.00109649, achieved: 0.00109649
dataset 345, input: 0.00113205, achieved: 0.00113205
dataset 346, input: 0.00101803, achieved: 0.00101803
dataset 347, input: 0.00109609, achieved: 0.00109609
dataset 348, input: 0.00106151, achieved: 0.00106151
dataset 349, input: 0.00119758, achieved: 0.00119758
dataset 350, input: 0.00130122, achieved: 0.00130122
dataset 351, input: 0.00127431, achieved: 0.00127431
dataset 352, input: 0.00124074, achieved: 0.00124074
dataset 353, input: 0.00125926, achieved: 0.00125926
dataset 354, input: 0.00121513, achieved: 0.00121513
dataset 355, input: 0.0012617, achieved: 0.0012617
dataset 356, input: 0.00125399, achieved: 0.00125399
dataset 357, input: 0.0012555, achieved: 0.0012555
dataset 358, input: 0.00118395, achieved: 0.00118395
dataset 359, input: 0.00124139, achieved: 0.00124139
dataset 360, input: 0.000609317, achieved: 0.000609317
dataset 361, input: 0.00107773, achieved: 0.00107773
dataset 362, input: 0.000908939, achieved: 0.000908939
dataset 363, input: 0.00089609, achieved: 0.00089609
dataset 364, input: 0.000916273, achieved: 0.000916273
dataset 365, input: 0.00115259, achieved: 0.00115259
dataset 366, input: 0.000930827, achieved: 0.000930827
dataset 367, input: 0.00108648, achieved: 0.00108648
dataset 368, input: 0.00108346, achieved: 0.00108346
dataset 369, input: 0.0010692, achieved: 0.0010692
dataset 370, input: 0.00108187, achieved: 0.00108187
dataset 371, input: 0.00107058, achieved: 0.00107058
dataset 372, input: 0.0010628, achieved: 0.0010628
dataset 373, input: 0.00105714, achieved: 0.00105714
dataset 374, input: 0.000961896, achieved: 0.000961896
dataset 375, input: 0.000869631, achieved: 0.000869631
dataset 376, input: 0.000964861, achieved: 0.000964861
dataset 377, input: 0.000934764, achieved: 0.000934764
dataset 378, input: 0.000975379, achieved: 0.000975379
dataset 379, input: 0.000934948, achieved: 0.000934948
dataset 380, input: 0.000880368, achieved: 0.000880368
dataset 381, input: 0.00091663, achieved: 0.00091663
dataset 382, input: 0.000851975, achieved: 0.000851975
dataset 383, input: 0.000893062, achieved: 0.000893062
dataset 384, input: 0.000926192, achieved: 0.000926192
dataset 385, input: 0.000934505, achieved: 0.000934505
dataset 386, input: 0.000911892, achieved: 0.000911892
dataset 387, input: 0.000905853, achieved: 0.000905853
dataset 388, input: 0.00111213, achieved: 0.00111213
dataset 389, input: 0.000974665, achieved: 0.000974665
dataset 390, input: 0.000943497, achieved: 0.000943497
dataset 391, input: 0.000927827, achieved: 0.000927827
dataset 392, input: 0.000950699, achieved: 0.000950699
dataset 393, input: 0.000920343, achieved: 0.000920343
dataset 394, input: 0.000930562, achieved: 0.000930562
dataset 395, input: 0.000935184, achieved: 0.000935184
dataset 396, input: 0.00091355, achieved: 0.00091355
dataset 397, input: 0.000896176, achieved: 0.000896176
dataset 398, input: 0.0008929, achieved: 0.0008929
dataset 399, input: 0.000873811, achieved: 0.000873811
dataset 400, input: 0.000873828, achieved: 0.000873828
dataset 401, input: 0.000937004, achieved: 0.000937004
dataset 402, input: 0.000879475, achieved: 0.000879475
dataset 403, input: 0.000877075, achieved: 0.000877075
dataset 404, input: 0.000863592, achieved: 0.000863592
dataset 405, input: 0.000869487, achieved: 0.000869487
dataset 406, input: 0.000827825, achieved: 0.000827825
dataset 407, input: 0.000860455, achieved: 0.000860455
dataset 408, input: 0.000857703, achieved: 0.000857703
dataset 409, input: 0.000894017, achieved: 0.000894017
dataset 410, input: 0.000883989, achieved: 0.000883989
dataset 411, input: 0.000877466, achieved: 0.000877466
dataset 412, input: 0.000880897, achieved: 0.000880897
dataset 413, input: 0.000841256, achieved: 0.000841256
dataset 414, input: 0.000850179, achieved: 0.000850179
dataset 415, input: 0.000808251, achieved: 0.000808251
dataset 416, input: 0.000844209, achieved: 0.000844209
dataset 417, input: 0.00080657, achieved: 0.00080657
dataset 418, input: 0.000799593, achieved: 0.000799593
dataset 419, input: 0.000804711, achieved: 0.000804711
dataset 420, input: 0.000806956, achieved: 0.000806956
dataset 421, input: 0.00077527, achieved: 0.00077527
dataset 422, input: 0.000757436, achieved: 0.000757436
dataset 423, input: 0.000966577, achieved: 0.000966577
dataset 424, input: 0.00113167, achieved: 0.00113167
dataset 425, input: 0.00111027, achieved: 0.00111027
dataset 426, input: 0.00109251, achieved: 0.00109251
dataset 427, input: 0.00107578, achieved: 0.00107578
dataset 428, input: 0.00107768, achieved: 0.00107768
dataset 429, input: 0.00107422, achieved: 0.00107422
dataset 430, input: 0.001056, achieved: 0.001056
dataset 431, input: 0.000535653, achieved: 0.000535653
dataset 432, input: 0.00104446, achieved: 0.00104446
dataset 433, input: 0.00103606, achieved: 0.00103606
dataset 434, input: 0.00102965, achieved: 0.00102965
dataset 435, input: 0.0010364, achieved: 0.0010364
dataset 436, input: 0.00101462, achieved: 0.00101462
dataset 437, input: 0.00102401, achieved: 0.00102401
dataset 438, input: 0.000797452, achieved: 0.000797452
dataset 439, input: 0.000865037, achieved: 0.000865037
dataset 440, input: 0.000831262, achieved: 0.000831262
dataset 441, input: 0.000854249, achieved: 0.000854249
dataset 442, input: 0.000833657, achieved: 0.000833657
dataset 443, input: 0.00082121, achieved: 0.00082121
dataset 444, input: 0.000825459, achieved: 0.000825459
dataset 445, input: 0.000801148, achieved: 0.000801148
dataset 446, input: 0.000794734, achieved: 0.000794734
dataset 447, input: 0.000775265, achieved: 0.000775265
dataset 448, input: 0.000776266, achieved: 0.000776266
dataset 449, input: 0.000776203, achieved: 0.000776203
dataset 450, input: 0.000776859, achieved: 0.000776859
dataset 451, input: 0.000766163, achieved: 0.000766163
dataset 452, input: 0.0007391, achieved: 0.0007391
dataset 453, input: 0.000756261, achieved: 0.000756261
dataset 454, input: 0.0010847, achieved: 0.0010847
dataset 455, input: 0.00109135, achieved: 0.00109135
dataset 456, input: 0.0010651, achieved: 0.0010651
dataset 457, input: 0.00102641, achieved: 0.00102641
dataset 458, input: 0.00101347, achieved: 0.00101347
dataset 459, input: 0.000990928, achieved: 0.000990928
dataset 460, input: 0.000954194, achieved: 0.000954194
dataset 461, input: 0.000954268, achieved: 0.000954268
dataset 462, input: 0.000930792, achieved: 0.000930792
dataset 463, input: 0.000939197, achieved: 0.000939197
dataset 464, input: 0.000878036, achieved: 0.000878036
dataset 465, input: 0.000840985, achieved: 0.000840985
dataset 466, input: 0.000837658, achieved: 0.000837658
dataset 467, input: 0.000828821, achieved: 0.000828821
dataset 468, input: 0.00080147, achieved: 0.00080147
dataset 469, input: 0.000810347, achieved: 0.000810347
dataset 470, input: 0.000888525, achieved: 0.000888525
dataset 471, input: 0.00100767, achieved: 0.00100767
dataset 472, input: 0.000980341, achieved: 0.000980341
dataset 473, input: 0.000864531, achieved: 0.000864531
dataset 474, input: 0.000748864, achieved: 0.000748864
dataset 475, input: 0.000746826, achieved: 0.000746826
dataset 476, input: 0.000845677, achieved: 0.000845677
dataset 477, input: 0.000897949, achieved: 0.000897949
dataset 478, input: 0.000767988, achieved: 0.000767988
dataset 479, input: 0.000885641, achieved: 0.000885641
dataset 480, input: 0.000896942, achieved: 0.000896942
dataset 481, input: 0.00107101, achieved: 0.00107101
dataset 482, input: 0.00105722, achieved: 0.00105722
dataset 483, input: 0.00104686, achieved: 0.00104686
dataset 484, input: 0.000883384, achieved: 0.000883384
dataset 485, input: 0.000876839, achieved: 0.000876839
dataset 486, input: 0.00103013, achieved: 0.00103013
dataset 487, input: 0.00100839, achieved: 0.00100839
dataset 488, input: 0.00100041, achieved: 0.00100041
dataset 489, input: 0.000997669, achieved: 0.000997669
dataset 490, input: 0.00100923, achieved: 0.00100923
dataset 491, input: 0.000992995, achieved: 0.000992995
dataset 492, input: 0.000994918, achieved: 0.000994918
dataset 493, input: 0.00097615, achieved: 0.00097615
dataset 494, input: 0.000985948, achieved: 0.000985948
dataset 495, input: 0.000979789, achieved: 0.000979789
dataset 496, input: 0.000979104, achieved: 0.000979104
dataset 497, input: 0.00100037, achieved: 0.00100037
dataset 498, input: 0.000985344, achieved: 0.000985344
dataset 499, input: 0.000982834, achieved: 0.000982834
dataset 500, input: 0.000964498, achieved: 0.000964498
dataset 501, input: 0.00097326, achieved: 0.00097326
dataset 502, input: 0.000739831, achieved: 0.000739831
dataset 503, input: 0.000756791, achieved: 0.000756791
dataset 504, input: 0.000775489, achieved: 0.000775489
dataset 505, input: 0.000751817, achieved: 0.000751817
dataset 506, input: 0.000758144, achieved: 0.000758144
dataset 507, input: 0.00073891, achieved: 0.00073891
dataset 508, input: 0.000737131, achieved: 0.000737131
dataset 509, input: 0.000746981, achieved: 0.000746981
dataset 510, input: 0.0010393, achieved: 0.0010393
dataset 511, input: 0.0010687, achieved: 0.0010687
dataset 512, input: 0.00107037, achieved: 0.00107037
dataset 513, input: 0.00102187, achieved: 0.00102187
dataset 514, input: 0.000717932, achieved: 0.000717932
dataset 515, input: 0.000765, achieved: 0.000765
dataset 516, input: 0.000764068, achieved: 0.000764068
dataset 517, input: 0.000749324, achieved: 0.000749324
dataset 518, input: 0.000758766, achieved: 0.000758766
dataset 519, input: 0.000727022, achieved: 0.000727022
dataset 520, input: 0.0007318, achieved: 0.0007318
dataset 521, input: 0.000715865, achieved: 0.000715865
dataset 522, input: 0.00072765, achieved: 0.00072765
dataset 523, input: 0.000723706, achieved: 0.000723706
dataset 524, input: 0.000725168, achieved: 0.000725168
dataset 525, input: 0.000718652, achieved: 0.000718652
dataset 526, input: 0.000706188, achieved: 0.000706188
dataset 527, input: 0.000834537, achieved: 0.000834537
dataset 528, input: 0.000985672, achieved: 0.000985672
dataset 529, input: 0.000981216, achieved: 0.000981216
dataset 530, input: 0.000931402, achieved: 0.000931402
dataset 531, input: 0.000933538, achieved: 0.000933538
dataset 532, input: 0.000914684, achieved: 0.000914684
dataset 533, input: 0.000895554, achieved: 0.000895554
dataset 534, input: 0.000864565, achieved: 0.000864565
dataset 535, input: 0.000850898, achieved: 0.000850898
dataset 536, input: 0.000842159, achieved: 0.000842159
dataset 537, input: 0.000825511, achieved: 0.000825511
dataset 538, input: 0.000985655, achieved: 0.000985655
dataset 539, input: 0.000962852, achieved: 0.000962852
dataset 540, input: 0.00096352, achieved: 0.00096352
dataset 541, input: 0.000948794, achieved: 0.000948794
dataset 542, input: 0.000944591, achieved: 0.000944591
dataset 543, input: 0.000926906, achieved: 0.000926906
dataset 544, input: 0.000911397, achieved: 0.000911397
dataset 545, input: 0.00089347, achieved: 0.00089347
dataset 546, input: 0.00089225, achieved: 0.00089225
dataset 547, input: 0.000896596, achieved: 0.000896596
dataset 548, input: 0.000883045, achieved: 0.000883045
dataset 549, input: 0.000850363, achieved: 0.000850363
dataset 550, input: 0.000866114, achieved: 0.000866114
dataset 551, input: 0.000871318, achieved: 0.000871318
dataset 552, input: 0.000874812, achieved: 0.000874812
dataset 553, input: 0.000835568, achieved: 0.000835568
dataset 554, input: 0.000844802, achieved: 0.000844802
dataset 555, input: 0.0008349, achieved: 0.0008349
dataset 556, input: 0.000810583, achieved: 0.000810583
dataset 557, input: 0.000825891, achieved: 0.000825891
dataset 558, input: 0.000817474, achieved: 0.000817474
dataset 559, input: 0.000806801, achieved: 0.000806801
dataset 560, input: 0.000790112, achieved: 0.000790112
dataset 561, input: 0.000794055, achieved: 0.000794055
dataset 562, input: 0.00078519, achieved: 0.00078519
dataset 563, input: 0.000982667, achieved: 0.000982667
dataset 564, input: 0.000985471, achieved: 0.000985471
dataset 565, input: 0.000982846, achieved: 0.000982846
dataset 566, input: 0.000977215, achieved: 0.000977215
dataset 567, input: 0.000954913, achieved: 0.000954913
dataset 568, input: 0.0009296, achieved: 0.0009296
dataset 569, input: 0.00096196, achieved: 0.00096196
dataset 570, input: 0.000917315, achieved: 0.000917315
dataset 571, input: 0.000894956, achieved: 0.000894956
dataset 572, input: 0.000898772, achieved: 0.000898772
dataset 573, input: 0.000891709, achieved: 0.000891709
dataset 574, input: 0.000847087, achieved: 0.000847087
dataset 575, input: 0.000820721, achieved: 0.000820721
dataset 576, input: 0.00118844, achieved: 0.00118844
dataset 577, input: 0.000583665, achieved: 0.000583665
dataset 578, input: 0.000977946, achieved: 0.000977946
dataset 579, input: 0.000581754, achieved: 0.000581754
dataset 580, input: 0.000965414, achieved: 0.000965414
dataset 581, input: 0.00114483, achieved: 0.00114483
dataset 582, input: 0.00115279, achieved: 0.00115279
dataset 583, input: 0.00112374, achieved: 0.00112374
dataset 584, input: 0.00110307, achieved: 0.00110307
dataset 585, input: 0.00110299, achieved: 0.00110299
dataset 586, input: 0.00109774, achieved: 0.00109774
dataset 587, input: 0.000557592, achieved: 0.000557592
dataset 588, input: 0.00106251, achieved: 0.00106251
dataset 589, input: 0.00105417, achieved: 0.00105417
dataset 590, input: 0.00105698, achieved: 0.00105698
dataset 591, input: 0.00103661, achieved: 0.00103661
dataset 592, input: 0.000535267, achieved: 0.000535267
dataset 593, input: 0.00104656, achieved: 0.00104656
dataset 594, input: 0.00101346, achieved: 0.00101346
dataset 595, input: 0.000978136, achieved: 0.000978136
dataset 596, input: 0.000989408, achieved: 0.000989408
dataset 597, input: 0.000980071, achieved: 0.000980071
dataset 598, input: 0.000958649, achieved: 0.000958649
dataset 599, input: 0.000949352, achieved: 0.000949352
dataset 600, input: 0.000947706, achieved: 0.000947706
dataset 601, input: 0.000930498, achieved: 0.000930498
dataset 602, input: 0.000927868, achieved: 0.000927868
dataset 603, input: 0.000799121, achieved: 0.000799121
dataset 604, input: 0.000789657, achieved: 0.000789657
dataset 605, input: 0.000792391, achieved: 0.000792391
dataset 606, input: 0.000782346, achieved: 0.000782346
dataset 607, input: 0.000770412, achieved: 0.000770412
dataset 608, input: 0.000770268, achieved: 0.000770268
dataset 609, input: 0.000753256, achieved: 0.000753256
dataset 610, input: 0.000736429, achieved: 0.000736429
dataset 611, input: 0.000745162, achieved: 0.000745162
dataset 612, input: 0.000732543, achieved: 0.000732543
dataset 613, input: 0.000734477, achieved: 0.000734477
dataset 614, input: 0.000687916, achieved: 0.000687916
dataset 615, input: 0.00069228, achieved: 0.00069228
dataset 616, input: 0.0006846, achieved: 0.0006846
dataset 617, input: 0.000829074, achieved: 0.000829074
dataset 618, input: 0.000939076, achieved: 0.000939076
dataset 619, input: 0.000930521, achieved: 0.000930521
dataset 620, input: 0.000926008, achieved: 0.000926008
dataset 621, input: 0.000909221, achieved: 0.000909221
dataset 622, input: 0.000900448, achieved: 0.000900448
dataset 623, input: 0.000894213, achieved: 0.000894213
dataset 624, input: 0.000884432, achieved: 0.000884432
dataset 625, input: 0.000879136, achieved: 0.000879136
dataset 626, input: 0.000876395, achieved: 0.000876395
dataset 627, input: 0.000872642, achieved: 0.000872642
dataset 628, input: 0.000833783, achieved: 0.000833783
dataset 629, input: 0.000841089, achieved: 0.000841089
dataset 630, input: 0.000863345, achieved: 0.000863345
dataset 631, input: 0.000903764, achieved: 0.000903764
dataset 632, input: 0.000872659, achieved: 0.000872659
dataset 633, input: 0.00087841, achieved: 0.00087841
dataset 634, input: 0.00087677, achieved: 0.00087677
dataset 635, input: 0.000868462, achieved: 0.000868462
dataset 636, input: 0.000844658, achieved: 0.000844658
dataset 637, input: 0.000835349, achieved: 0.000835349
dataset 638, input: 0.000840162, achieved: 0.000840162
dataset 639, input: 0.000828855, achieved: 0.000828855
dataset 640, input: 0.00081185, achieved: 0.00081185
dataset 641, input: 0.000798148, achieved: 0.000798148
dataset 642, input: 0.000792564, achieved: 0.000792564
dataset 643, input: 0.00105035, achieved: 0.00105035
dataset 644, input: 0.000988677, achieved: 0.000988677
dataset 645, input: 0.000968522, achieved: 0.000968522
dataset 646, input: 0.000888888, achieved: 0.000888888
dataset 647, input: 0.000937597, achieved: 0.000937597
dataset 648, input: 0.000921195, achieved: 0.000921195
dataset 649, input: 0.0010635, achieved: 0.0010635
dataset 650, input: 0.00101841, achieved: 0.00101841
dataset 651, input: 0.00103273, achieved: 0.00103273
dataset 652, input: 0.00101029, achieved: 0.00101029
dataset 653, input: 0.00108704, achieved: 0.00108704
dataset 654, input: 0.000783543, achieved: 0.000783543
dataset 655, input: 0.000794947, achieved: 0.000794947
dataset 656, input: 0.000558715, achieved: 0.000558715
dataset 657, input: 0.000806749, achieved: 0.000806749
dataset 658, input: 0.000806363, achieved: 0.000806363
dataset 659, input: 0.000549026, achieved: 0.000549026
dataset 660, input: 0.000786911, achieved: 0.000786911
dataset 661, input: 0.000804302, achieved: 0.000804302
dataset 662, input: 0.000810002, achieved: 0.000810002
dataset 663, input: 0.000784867, achieved: 0.000784867
dataset 664, input: 0.000803306, achieved: 0.000803306
dataset 665, input: 0.000817773, achieved: 0.000817773
dataset 666, input: 0.000810462, achieved: 0.000810462
dataset 667, input: 0.000796237, achieved: 0.000796237
dataset 668, input: 0.00081022, achieved: 0.00081022
dataset 669, input: 0.000544075, achieved: 0.000544075
dataset 670, input: 0.000812494, achieved: 0.000812494
dataset 671, input: 0.000796784, achieved: 0.000796784
dataset 672, input: 0.000799766, achieved: 0.000799766
dataset 673, input: 0.00081931, achieved: 0.00081931
dataset 674, input: 0.000784775, achieved: 0.000784775
dataset 675, input: 0.000563591, achieved: 0.000563591
dataset 676, input: 0.000821325, achieved: 0.000821325
dataset 677, input: 0.000785777, achieved: 0.000785777
dataset 678, input: 0.000550114, achieved: 0.000550114
dataset 679, input: 0.000578357, achieved: 0.000578357
dataset 680, input: 0.000810796, achieved: 0.000810796
dataset 681, input: 0.000559705, achieved: 0.000559705
dataset 682, input: 0.000795966, achieved: 0.000795966
dataset 683, input: 0.000786197, achieved: 0.000786197
dataset 684, input: 0.000799184, achieved: 0.000799184
dataset 685, input: 0.000789743, achieved: 0.000789743
dataset 686, input: 0.000782046, achieved: 0.000782046
dataset 687, input: 0.000817969, achieved: 0.000817969
dataset 688, input: 0.000797555, achieved: 0.000797555
dataset 689, input: 0.00079584, achieved: 0.00079584
dataset 690, input: 0.000785581, achieved: 0.000785581
dataset 691, input: 0.000807636, achieved: 0.000807636
dataset 692, input: 0.000794182, achieved: 0.000794182
dataset 693, input: 0.00058298, achieved: 0.00058298
dataset 694, input: 0.000549394, achieved: 0.000549394
dataset 695, input: 0.000805367, achieved: 0.000805367
dataset 696, input: 0.000786951, achieved: 0.000786951
dataset 697, input: 0.000796905, achieved: 0.000796905
dataset 698, input: 0.000817675, achieved: 0.000817675
dataset 699, input: 0.000756083, achieved: 0.000756083
dataset 700, input: 0.000672539, achieved: 0.000672539
dataset 701, input: 0.000692878, achieved: 0.000692878
dataset 702, input: 0.000674531, achieved: 0.000674531
dataset 703, input: 0.000660484, achieved: 0.000660484
dataset 704, input: 0.000682953, achieved: 0.000682953
dataset 705, input: 0.00066068, achieved: 0.00066068
dataset 706, input: 0.000671561, achieved: 0.000671561
dataset 707, input: 0.000639236, achieved: 0.000639236
dataset 708, input: 0.000680311, achieved: 0.000680311
dataset 709, input: 0.00066102, achieved: 0.00066102
dataset 710, input: 0.00064946, achieved: 0.00064946
dataset 711, input: 0.000676552, achieved: 0.000676552
dataset 712, input: 0.000658683, achieved: 0.000658683
dataset 713, input: 0.000660358, achieved: 0.000660358
dataset 714, input: 0.000675844, achieved: 0.000675844
dataset 715, input: 0.000676673, achieved: 0.000676673
dataset 716, input: 0.000657514, achieved: 0.000657514
dataset 717, input: 0.000652097, achieved: 0.000652097
dataset 718, input: 0.000632949, achieved: 0.000632949
dataset 719, input: 0.000644077, achieved: 0.000644077
dataset 720, input: 0.000624821, achieved: 0.000624821
dataset 721, input: 0.000614925, achieved: 0.000614925
dataset 722, input: 0.000621021, achieved: 0.000621021
dataset 723, input: 0.000620705, achieved: 0.000620705
dataset 724, input: 0.000637515, achieved: 0.000637515
dataset 725, input: 0.000616416, achieved: 0.000616416
dataset 726, input: 0.000616744, achieved: 0.000616744
dataset 727, input: 0.000610244, achieved: 0.000610244
dataset 728, input: 0.0006293, achieved: 0.0006293
dataset 729, input: 0.000617204, achieved: 0.000617204
dataset 730, input: 0.000633824, achieved: 0.000633824
dataset 731, input: 0.000612714, achieved: 0.000612714
dataset 732, input: 0.000616692, achieved: 0.000616692
dataset 733, input: 0.000578432, achieved: 0.000578432
dataset 734, input: 0.000607326, achieved: 0.000607326
dataset 735, input: 0.000609617, achieved: 0.000609617
dataset 736, input: 0.000615345, achieved: 0.000615345
dataset 737, input: 0.000613687, achieved: 0.000613687
dataset 738, input: 0.000614948, achieved: 0.000614948
dataset 739, input: 0.000984855, achieved: 0.000984855
dataset 740, input: 0.000985448, achieved: 0.000985448
dataset 741, input: 0.000887086, achieved: 0.000887086
dataset 742, input: 0.000855233, achieved: 0.000855233
dataset 743, input: 0.000844566, achieved: 0.000844566
dataset 744, input: 0.000841031, achieved: 0.000841031
dataset 745, input: 0.000844733, achieved: 0.000844733
dataset 746, input: 0.000837727, achieved: 0.000837727
dataset 747, input: 0.000836903, achieved: 0.000836903
dataset 748, input: 0.000836932, achieved: 0.000836932
dataset 749, input: 0.000839937, achieved: 0.000839937
dataset 750, input: 0.000819592, achieved: 0.000819592
dataset 751, input: 0.000815016, achieved: 0.000815016
dataset 752, input: 0.000610244, achieved: 0.000610244
dataset 753, input: 0.000602288, achieved: 0.000602288
dataset 754, input: 0.000608322, achieved: 0.000608322
dataset 755, input: 0.000524525, achieved: 0.000524525
dataset 756, input: 0.000585824, achieved: 0.000585824
dataset 757, input: 0.000589232, achieved: 0.000589232
dataset 758, input: 0.000566659, achieved: 0.000566659
dataset 759, input: 0.000586077, achieved: 0.000586077
dataset 760, input: 0.000583429, achieved: 0.000583429
dataset 761, input: 0.000583193, achieved: 0.000583193
dataset 762, input: 0.000566153, achieved: 0.000566153
dataset 763, input: 0.000581903, achieved: 0.000581903
dataset 764, input: 0.000556303, achieved: 0.000556303
dataset 765, input: 0.00055731, achieved: 0.00055731
dataset 766, input: 0.000575899, achieved: 0.000575899
dataset 767, input: 0.000571932, achieved: 0.000571932
dataset 768, input: 0.000563602, achieved: 0.000563602
dataset 769, input: 0.000588656, achieved: 0.000588656
dataset 770, input: 0.000575513, achieved: 0.000575513
dataset 771, input: 0.000551945, achieved: 0.000551945
dataset 772, input: 0.000562606, achieved: 0.000562606
dataset 773, input: 0.000580199, achieved: 0.000580199
dataset 774, input: 0.000539078, achieved: 0.000539078
dataset 775, input: 0.000576227, achieved: 0.000576227
dataset 776, input: 0.000571069, achieved: 0.000571069
dataset 777, input: 0.000553971, achieved: 0.000553971
dataset 778, input: 0.000556343, achieved: 0.000556343
dataset 779, input: 0.000565208, achieved: 0.000565208
dataset 780, input: 0.000578202, achieved: 0.000578202
dataset 781, input: 0.00055328, achieved: 0.00055328
dataset 782, input: 0.000540569, achieved: 0.000540569
dataset 783, input: 0.000950008, achieved: 0.000950008
dataset 784, input: 0.000960895, achieved: 0.000960895
dataset 785, input: 0.000938356, achieved: 0.000938356
dataset 786, input: 0.000923251, achieved: 0.000923251
dataset 787, input: 0.000889642, achieved: 0.000889642
dataset 788, input: 0.000852568, achieved: 0.000852568
dataset 789, input: 0.000825833, achieved: 0.000825833
dataset 790, input: 0.00083254, achieved: 0.00083254
dataset 791, input: 0.000845838, achieved: 0.000845838
dataset 792, input: 0.000820208, achieved: 0.000820208
dataset 793, input: 0.0008303, achieved: 0.0008303
dataset 794, input: 0.000823841, achieved: 0.000823841
dataset 795, input: 0.000831118, achieved: 0.000831118
dataset 796, input: 0.000809506, achieved: 0.000809506
dataset 797, input: 0.000598662, achieved: 0.000598662
dataset 798, input: 0.000603699, achieved: 0.000603699
dataset 799, input: 0.000610561, achieved: 0.000610561
dataset 800, input: 0.00062835, achieved: 0.00062835
dataset 801, input: 0.0004126, achieved: 0.0004126
dataset 802, input: 0.000623502, achieved: 0.000623502
dataset 803, input: 0.00061428, achieved: 0.00061428
dataset 804, input: 0.000602461, achieved: 0.000602461
dataset 805, input: 0.00060013, achieved: 0.00060013
dataset 806, input: 0.000587954, achieved: 0.000587954
dataset 807, input: 0.00059572, achieved: 0.00059572
dataset 808, input: 0.000584413, achieved: 0.000584413
dataset 809, input: 0.000590314, achieved: 0.000590314
dataset 810, input: 0.000585968, achieved: 0.000585968
dataset 811, input: 0.000584528, achieved: 0.000584528
dataset 812, input: 0.000583216, achieved: 0.000583216
dataset 813, input: 0.000574414, achieved: 0.000574414
dataset 814, input: 0.00100597, achieved: 0.00100597
dataset 815, input: 0.000929882, achieved: 0.000929882
dataset 816, input: 0.000955161, achieved: 0.000955161
dataset 817, input: 0.00095287, achieved: 0.00095287
dataset 818, input: 0.000947809, achieved: 0.000947809
dataset 819, input: 0.000965408, achieved: 0.000965408
dataset 820, input: 0.000936514, achieved: 0.000936514
dataset 821, input: 0.000923797, achieved: 0.000923797
dataset 822, input: 0.000920545, achieved: 0.000920545
dataset 823, input: 0.000888013, achieved: 0.000888013
dataset 824, input: 0.00089427, achieved: 0.00089427
dataset 825, input: 0.000868382, achieved: 0.000868382
dataset 826, input: 0.000871191, achieved: 0.000871191
dataset 827, input: 0.000862648, achieved: 0.000862648
dataset 828, input: 0.000899383, achieved: 0.000899383
dataset 829, input: 0.000890776, achieved: 0.000890776
dataset 830, input: 0.000872118, achieved: 0.000872118
dataset 831, input: 0.000858716, achieved: 0.000858716
dataset 832, input: 0.000942375, achieved: 0.000942375
dataset 833, input: 0.00097664, achieved: 0.00097664
dataset 834, input: 0.000976415, achieved: 0.000976415
dataset 835, input: 0.000943877, achieved: 0.000943877
dataset 836, input: 0.000972915, achieved: 0.000972915
dataset 837, input: 0.000950475, achieved: 0.000950475
dataset 838, input: 0.000932116, achieved: 0.000932116
dataset 839, input: 0.000936877, achieved: 0.000936877
dataset 840, input: 0.000922053, achieved: 0.000922053
dataset 841, input: 0.000921414, achieved: 0.000921414
dataset 842, input: 0.000905502, achieved: 0.000905502
dataset 843, input: 0.000902296, achieved: 0.000902296
dataset 844, input: 0.000892515, achieved: 0.000892515
dataset 845, input: 0.000902002, achieved: 0.000902002
dataset 846, input: 0.000838193, achieved: 0.000838193
dataset 847, input: 0.000579629, achieved: 0.000579629
dataset 848, input: 0.000568634, achieved: 0.000568634
dataset 849, input: 0.000555479, achieved: 0.000555479
dataset 850, input: 0.000559682, achieved: 0.000559682
dataset 851, input: 0.000556631, achieved: 0.000556631
dataset 852, input: 0.000539009, achieved: 0.000539009
dataset 853, input: 0.0005394, achieved: 0.0005394
dataset 854, input: 0.000531801, achieved: 0.000531801
dataset 855, input: 0.000561173, achieved: 0.000561173
dataset 856, input: 0.000531629, achieved: 0.000531629
dataset 857, input: 0.000533246, achieved: 0.000533246
dataset 858, input: 0.000515049, achieved: 0.000515049
dataset 859, input: 0.000518376, achieved: 0.000518376
dataset 860, input: 0.000511019, achieved: 0.000511019
dataset 861, input: 0.000516874, achieved: 0.000516874
dataset 862, input: 0.000529982, achieved: 0.000529982
dataset 863, input: 0.000929583, achieved: 0.000929583
dataset 864, input: 0.000910326, achieved: 0.000910326
dataset 865, input: 0.000883459, achieved: 0.000883459
dataset 866, input: 0.000893321, achieved: 0.000893321
dataset 867, input: 0.000880621, achieved: 0.000880621
dataset 868, input: 0.000853667, achieved: 0.000853667
dataset 869, input: 0.000866125, achieved: 0.000866125
dataset 870, input: 0.000823737, achieved: 0.000823737
dataset 871, input: 0.00083064, achieved: 0.00083064
dataset 872, input: 0.000821176, achieved: 0.000821176
dataset 873, input: 0.00080596, achieved: 0.00080596
dataset 874, input: 0.000800526, achieved: 0.000800526
dataset 875, input: 0.000789501, achieved: 0.000789501
dataset 876, input: 0.000612507, achieved: 0.000612507
dataset 877, input: 0.000419422, achieved: 0.000419422
dataset 878, input: 0.000620112, achieved: 0.000620112
dataset 879, input: 0.000616053, achieved: 0.000616053
dataset 880, input: 0.00061641, achieved: 0.00061641
dataset 881, input: 0.00061097, achieved: 0.00061097
dataset 882, input: 0.00041374, achieved: 0.00041374
dataset 883, input: 0.000605483, achieved: 0.000605483
dataset 884, input: 0.000603071, achieved: 0.000603071
dataset 885, input: 0.000604476, achieved: 0.000604476
dataset 886, input: 0.000586445, achieved: 0.000586445
dataset 887, input: 0.000584419, achieved: 0.000584419
dataset 888, input: 0.000582807, achieved: 0.000582807
dataset 889, input: 0.000590763, achieved: 0.000590763
dataset 890, input: 0.000586969, achieved: 0.000586969
dataset 891, input: 0.000585047, achieved: 0.000585047
dataset 892, input: 0.000585812, achieved: 0.000585812
dataset 893, input: 0.000566763, achieved: 0.000566763
dataset 894, input: 0.000584592, achieved: 0.000584592
dataset 895, input: 0.000597522, achieved: 0.000597522
dataset 896, input: 0.000563038, achieved: 0.000563038
dataset 897, input: 0.000609916, achieved: 0.000609916
dataset 898, input: 0.000603687, achieved: 0.000603687
dataset 899, input: 0.00060656, achieved: 0.00060656
dataset 900, input: 0.000595449, achieved: 0.000595449
dataset 901, input: 0.0005934, achieved: 0.0005934
dataset 902, input: 0.000598495, achieved: 0.000598495
dataset 903, input: 0.000585409, achieved: 0.000585409
dataset 904, input: 0.000573487, achieved: 0.000573487
dataset 905, input: 0.000561046, achieved: 0.000561046
dataset 906, input: 0.000564195, achieved: 0.000564195
dataset 907, input: 0.00055229, achieved: 0.00055229
dataset 908, input: 0.000555117, achieved: 0.000555117
dataset 909, input: 0.000554253, achieved: 0.000554253
dataset 910, input: 0.000550684, achieved: 0.000550684
dataset 911, input: 0.000550862, achieved: 0.000550862
dataset 912, input: 0.000543902, achieved: 0.000543902
dataset 913, input: 0.00054921, achieved: 0.00054921
dataset 914, input: 0.000529614, achieved: 0.000529614
dataset 915, input: 0.000530984, achieved: 0.000530984
dataset 916, input: 0.000537909, achieved: 0.000537909
dataset 917, input: 0.000528825, achieved: 0.000528825
dataset 918, input: 0.000568196, achieved: 0.000568196
dataset 919, input: 0.000579606, achieved: 0.000579606
dataset 920, input: 0.000583786, achieved: 0.000583786
dataset 921, input: 0.000594424, achieved: 0.000594424
dataset 922, input: 0.00057625, achieved: 0.00057625
dataset 923, input: 0.000576302, achieved: 0.000576302
dataset 924, input: 0.000581754, achieved: 0.000581754
dataset 925, input: 0.000562393, achieved: 0.000562393
dataset 926, input: 0.000557368, achieved: 0.000557368
dataset 927, input: 0.000563827, achieved: 0.000563827
dataset 928, input: 0.000560781, achieved: 0.000560781
dataset 929, input: 0.000570775, achieved: 0.000570775
dataset 930, input: 0.000565093, achieved: 0.000565093
dataset 931, input: 0.000560223, achieved: 0.000560223
dataset 932, input: 0.000555491, achieved: 0.000555491
dataset 933, input: 0.000551185, achieved: 0.000551185
dataset 934, input: 0.000529211, achieved: 0.000529211
dataset 935, input: 0.00054556, achieved: 0.00054556
dataset 936, input: 0.000978931, achieved: 0.000978931
dataset 937, input: 0.000794878, achieved: 0.000794878
dataset 938, input: 0.000782467, achieved: 0.000782467
dataset 939, input: 0.000778414, achieved: 0.000778414
dataset 940, input: 0.000757557, achieved: 0.000757557
dataset 941, input: 0.000761224, achieved: 0.000761224
dataset 942, input: 0.000739296, achieved: 0.000739296
dataset 943, input: 0.000746855, achieved: 0.000746855
dataset 944, input: 0.000745041, achieved: 0.000745041
dataset 945, input: 0.000748403, achieved: 0.000748403
dataset 946, input: 0.000720511, achieved: 0.000720511
dataset 947, input: 0.00071898, achieved: 0.00071898
dataset 948, input: 0.000719446, achieved: 0.000719446
dataset 949, input: 0.000721668, achieved: 0.000721668
dataset 950, input: 0.000742105, achieved: 0.000742105
dataset 951, input: 0.000708859, achieved: 0.000708859
dataset 952, input: 0.000731443, achieved: 0.000731443
dataset 953, input: 0.000681008, achieved: 0.000681008
dataset 954, input: 0.000701537, achieved: 0.000701537
dataset 955, input: 0.000674882, achieved: 0.000674882
dataset 956, input: 0.000678187, achieved: 0.000678187
dataset 957, input: 0.000671376, achieved: 0.000671376
dataset 958, input: 0.000651492, achieved: 0.000651492
dataset 959, input: 0.000704962, achieved: 0.000704962
dataset 960, input: 0.000504583, achieved: 0.000504583
dataset 961, input: 0.000503161, achieved: 0.000503161
dataset 962, input: 0.000497197, achieved: 0.000497197
dataset 963, input: 0.000493328, achieved: 0.000493328
dataset 964, input: 0.000821043, achieved: 0.000821043
dataset 965, input: 0.000953791, achieved: 0.000953791
dataset 966, input: 0.000485079, achieved: 0.000485079
dataset 967, input: 0.000487566, achieved: 0.000487566
dataset 968, input: 0.000795356, achieved: 0.000795356
dataset 969, input: 0.000846615, achieved: 0.000846615
dataset 970, input: 0.00077272, achieved: 0.00077272
dataset 971, input: 0.000839523, achieved: 0.000839523
dataset 972, input: 0.000831319, achieved: 0.000831319
dataset 973, input: 0.000938535, achieved: 0.000938535
dataset 974, input: 0.000954499, achieved: 0.000954499
dataset 975, input: 0.000517185, achieved: 0.000517185
dataset 976, input: 0.000959271, achieved: 0.000959271
dataset 977, input: 0.000892843, achieved: 0.000892843
dataset 978, input: 0.000813767, achieved: 0.000813767
dataset 979, input: 0.000743331, achieved: 0.000743331
dataset 980, input: 0.000970877, achieved: 0.000970877
dataset 981, input: 0.000809443, achieved: 0.000809443
dataset 982, input: 0.00086548, achieved: 0.00086548
dataset 983, input: 0.000802667, achieved: 0.000802667
dataset 984, input: 0.000792725, achieved: 0.000792725
dataset 985, input: 0.000919405, achieved: 0.000919405
dataset 986, input: 0.000747833, achieved: 0.000747833
dataset 987, input: 0.000890626, achieved: 0.000890626
dataset 988, input: 0.000862942, achieved: 0.000862942
dataset 989, input: 0.000880938, achieved: 0.000880938
dataset 990, input: 0.000513448, achieved: 0.000513448
dataset 991, input: 0.000500029, achieved: 0.000500029
dataset 992, input: 0.00051479, achieved: 0.00051479
dataset 993, input: 0.000518336, achieved: 0.000518336
dataset 994, input: 0.000511537, achieved: 0.000511537
dataset 995, input: 0.000871421, achieved: 0.000871421
dataset 996, input: 0.00083969, achieved: 0.00083969
dataset 997, input: 0.000492499, achieved: 0.000492499
dataset 998, input: 0.000926595, achieved: 0.000926595
dataset 999, input: 0.000917632, achieved: 0.000917632
dataset 1000, input: 0.000902411, achieved: 0.000902411
dataset 1001, input: 0.00083665, achieved: 0.00083665
dataset 1002, input: 0.000817319, achieved: 0.000817319
dataset 1003, input: 0.00080493, achieved: 0.00080493
dataset 1004, input: 0.000801856, achieved: 0.000801856
dataset 1005, input: 0.000793059, achieved: 0.000793059
dataset 1006, input: 0.000802984, achieved: 0.000802984
dataset 1007, input: 0.000785046, achieved: 0.000785046
dataset 1008, input: 0.000782996, achieved: 0.000782996
dataset 1009, input: 0.000772167, achieved: 0.000772167
dataset 1010, input: 0.000744748, achieved: 0.000744748
dataset 1011, input: 0.000744771, achieved: 0.000744771
dataset 1012, input: 0.000755432, achieved: 0.000755432
dataset 1013, input: 0.000747724, achieved: 0.000747724
dataset 1014, input: 0.000742577, achieved: 0.000742577
dataset 1015, input: 0.000747154, achieved: 0.000747154
dataset 1016, input: 0.000741242, achieved: 0.000741242
dataset 1017, input: 0.000737016, achieved: 0.000737016
dataset 1018, input: 0.000725203, achieved: 0.000725203
dataset 1019, input: 0.000532728, achieved: 0.000532728
dataset 1020, input: 0.000537788, achieved: 0.000537788
dataset 1021, input: 0.000518359, achieved: 0.000518359
dataset 1022, input: 0.0005224, achieved: 0.0005224
dataset 1023, input: 0.00053407, achieved: 0.00053407
dataset 1024, input: 0.000529867, achieved: 0.000529867
dataset 1025, input: 0.000519608, achieved: 0.000519608
dataset 1026, input: 0.000520368, achieved: 0.000520368
dataset 1027, input: 0.000528278, achieved: 0.000528278
dataset 1028, input: 0.00051631, achieved: 0.00051631
dataset 1029, input: 0.000499977, achieved: 0.000499977
dataset 1030, input: 0.000511462, achieved: 0.000511462
dataset 1031, input: 0.000501676, achieved: 0.000501676
dataset 1032, input: 0.000483968, achieved: 0.000483968
dataset 1033, input: 0.000497047, achieved: 0.000497047
dataset 1034, input: 0.000965241, achieved: 0.000965241
dataset 1035, input: 0.000893631, achieved: 0.000893631
dataset 1036, input: 0.000792276, achieved: 0.000792276
dataset 1037, input: 0.000812149, achieved: 0.000812149
dataset 1038, input: 0.000779801, achieved: 0.000779801
dataset 1039, input: 0.000767389, achieved: 0.000767389
dataset 1040, input: 0.000780768, achieved: 0.000780768
dataset 1041, input: 0.000748691, achieved: 0.000748691
dataset 1042, input: 0.000746446, achieved: 0.000746446
dataset 1043, input: 0.000758898, achieved: 0.000758898
dataset 1044, input: 0.000737569, achieved: 0.000737569
dataset 1045, input: 0.000737258, achieved: 0.000737258
dataset 1046, input: 0.000734339, achieved: 0.000734339
dataset 1047, input: 0.000829788, achieved: 0.000829788
dataset 1048, input: 0.000944067, achieved: 0.000944067
dataset 1049, input: 0.000740344, achieved: 0.000740344
dataset 1050, input: 0.0008554, achieved: 0.0008554
dataset 1051, input: 0.000902428, achieved: 0.000902428
dataset 1052, input: 0.000537483, achieved: 0.000537483
dataset 1053, input: 0.000527743, achieved: 0.000527743
dataset 1054, input: 0.000966605, achieved: 0.000966605
dataset 1055, input: 0.00052597, achieved: 0.00052597
dataset 1056, input: 0.000525624, achieved: 0.000525624
dataset 1057, input: 0.000511917, achieved: 0.000511917
dataset 1058, input: 0.000517409, achieved: 0.000517409
dataset 1059, input: 0.000518751, achieved: 0.000518751
dataset 1060, input: 0.000502862, achieved: 0.000502862
dataset 1061, input: 0.000940636, achieved: 0.000940636
dataset 1062, input: 0.000509183, achieved: 0.000509183
dataset 1063, input: 0.000489868, achieved: 0.000489868
dataset 1064, input: 0.000501607, achieved: 0.000501607
dataset 1065, input: 0.000520357, achieved: 0.000520357
dataset 1066, input: 0.000504819, achieved: 0.000504819
dataset 1067, input: 0.000495124, achieved: 0.000495124
dataset 1068, input: 0.00050433, achieved: 0.00050433
dataset 1069, input: 0.000496259, achieved: 0.000496259
dataset 1070, input: 0.000496921, achieved: 0.000496921
dataset 1071, input: 0.000503748, achieved: 0.000503748
dataset 1072, input: 0.000503909, achieved: 0.000503909
dataset 1073, input: 0.00051418, achieved: 0.00051418
dataset 1074, input: 0.000507133, achieved: 0.000507133
dataset 1075, input: 0.00096447, achieved: 0.00096447
dataset 1076, input: 0.000485263, achieved: 0.000485263
dataset 1077, input: 0.000972345, achieved: 0.000972345
dataset 1078, input: 0.000958966, achieved: 0.000958966
dataset 1079, input: 0.000914431, achieved: 0.000914431
dataset 1080, input: 0.00046318, achieved: 0.00046318
dataset 1081, input: 0.000540638, achieved: 0.000540638
dataset 1082, input: 0.000524755, achieved: 0.000524755
dataset 1083, input: 0.000541979, achieved: 0.000541979
dataset 1084, input: 0.000535797, achieved: 0.000535797
dataset 1085, input: 0.000507628, achieved: 0.000507628
dataset 1086, input: 0.000522061, achieved: 0.000522061
dataset 1087, input: 0.00051236, achieved: 0.00051236
dataset 1088, input: 0.000476426, achieved: 0.000476426
dataset 1089, input: 0.000523085, achieved: 0.000523085
dataset 1090, input: 0.000883776, achieved: 0.000883776
dataset 1091, input: 0.000917741, achieved: 0.000917741
dataset 1092, input: 0.000892238, achieved: 0.000892238
dataset 1093, input: 0.000882492, achieved: 0.000882492
dataset 1094, input: 0.000867708, achieved: 0.000867708
dataset 1095, input: 0.000831797, achieved: 0.000831797
dataset 1096, input: 0.000831331, achieved: 0.000831331
dataset 1097, input: 0.000664307, achieved: 0.000664307
dataset 1098, input: 0.00586935, achieved: 0.00586935
dataset 1099, input: 0.00614957, achieved: 0.00614957
dataset 1100, input: 0.00595225, achieved: 0.00595225
dataset 1101, input: 0.00596338, achieved: 0.00596338
dataset 1102, input: 0.00481169, achieved: 0.00481169
dataset 1103, input: 0.00538109, achieved: 0.00538109
dataset 1104, input: 0.00548131, achieved: 0.00548131
dataset 1105, input: 0.00176982, achieved: 0.00176982
dataset 1106, input: 0.00505266, achieved: 0.00505266
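
For reference, in each `dataset N, input: X, achieved: X` line above (and in the second `> sample ratios:` dump that follows), "input" is the requested blending weight for that component dataset (its target share of the total samples) and "achieved" is the fraction of samples the blending index actually assigns to it. The real index build happens in Megatron's C++ helper (`helpers.build_blending_indices`, called from `BlendableDataset`); the NumPy sketch below is illustrative only, but it mirrors that greedy error-minimizing loop and reproduces the log format:

```python
# Minimal, illustrative NumPy sketch of how the "input" vs. "achieved"
# ratios above are produced. The production version is Megatron's C++
# helper (megatron/data/helpers.cpp::build_blending_indices); this just
# mirrors its greedy loop.
import numpy as np

def build_blending_indices(weights: np.ndarray, size: int):
    """For each global sample, pick the dataset whose achieved share lags
    furthest behind its target weight, so achieved ratios track inputs."""
    num_datasets = len(weights)
    dataset_index = np.empty(size, dtype=np.int64)
    dataset_sample_index = np.empty(size, dtype=np.int64)
    current = np.zeros(num_datasets, dtype=np.int64)
    for sample_idx in range(size):
        denom = max(sample_idx, 1)
        # error = how many samples each dataset is "owed" so far
        error = weights * denom - current
        chosen = int(np.argmax(error))
        dataset_index[sample_idx] = chosen
        dataset_sample_index[sample_idx] = current[chosen]
        current[chosen] += 1
    return dataset_index, dataset_sample_index

# Toy example with three datasets; weights are assumed already normalized.
weights = np.array([0.2, 0.3, 0.5])
idx, _ = build_blending_indices(weights, size=10_000)
for d, w in enumerate(weights):
    achieved = np.mean(idx == d)
    print(f"   dataset {d}, input: {w}, achieved: {achieved}")
```

Because the greedy pick always tops up whichever dataset is furthest below its target share, the achieved fractions match the input weights to the printed precision, which is exactly what every line above shows. The `BuildConcatDataset` line just below, if I'm reading it right, reports that with `args.shuffle_sample_in_corpus=True` samples are shuffled across the full concatenated corpus (~173.7M samples) rather than only within each component file, before the next blendable index is built.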
[2025-03-12 09:18:56][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 173705836 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.00201113, achieved: 0.00201113
dataset 1, input: 0.00203688, achieved: 0.00203688
dataset 2, input: 0.00202405, achieved: 0.00202405
dataset 3, input: 0.00200025, achieved: 0.00200025
dataset 4, input: 0.00203933, achieved: 0.00203933
dataset 5, input: 0.00203023, achieved: 0.00203023
dataset 6, input: 0.00201149, achieved: 0.00201149
dataset 7, input: 0.00206305, achieved: 0.00206305
dataset 8, input: 0.00200524, achieved: 0.00200524
dataset 9, input: 0.00212208, achieved: 0.00212208
dataset 10, input: 0.00200116, achieved: 0.00200116
dataset 11, input: 0.00201295, achieved: 0.00201295
dataset 12, input: 0.00205124, achieved: 0.00205124
dataset 13, input: 0.00199443, achieved: 0.00199443
dataset 14, input: 0.00200746, achieved: 0.00200746
dataset 15, input: 0.00205857, achieved: 0.00205857
dataset 16, input: 0.00202633, achieved: 0.00202633
dataset 17, input: 0.0020443, achieved: 0.0020443
dataset 18, input: 0.00205557, achieved: 0.00205557
dataset 19, input: 0.00200902, achieved: 0.00200902
dataset 20, input: 0.00203939, achieved: 0.00203939
dataset 21, input: 0.00203243, achieved: 0.00203243
dataset 22, input: 0.00204173, achieved: 0.00204173
dataset 23, input: 0.00198551, achieved: 0.00198551
dataset 24, input: 0.00204998, achieved: 0.00204998
dataset 25, input: 0.00205231, achieved: 0.00205231
dataset 26, input: 0.00201839, achieved: 0.00201839
dataset 27, input: 0.00201409, achieved: 0.00201409
dataset 28, input: 0.00203525, achieved: 0.00203525
dataset 29, input: 0.00196943, achieved: 0.00196943
dataset 30, input: 0.00201016, achieved: 0.00201016
dataset 31, input: 0.0019708, achieved: 0.0019708
dataset 32, input: 0.00202278, achieved: 0.00202278
dataset 33, input: 0.00201717, achieved: 0.00201717
dataset 34, input: 0.00199786, achieved: 0.00199786
dataset 35, input: 0.00199099, achieved: 0.00199099
dataset 36, input: 0.00203597, achieved: 0.00203597
dataset 37, input: 0.00199189, achieved: 0.00199189
dataset 38, input: 0.00203205, achieved: 0.00203205
dataset 39, input: 0.00199151, achieved: 0.00199151
dataset 40, input: 0.00204107, achieved: 0.00204107
dataset 41, input: 0.00202455, achieved: 0.00202455
dataset 42, input: 0.00202748, achieved: 0.00202748
dataset 43, input: 0.00203717, achieved: 0.00203717
dataset 44, input: 0.00199452, achieved: 0.00199452
dataset 45, input: 0.00200821, achieved: 0.00200821
dataset 46, input: 0.00203652, achieved: 0.00203652
dataset 47, input: 0.00201588, achieved: 0.00201588
dataset 48, input: 0.00200612, achieved: 0.00200612
dataset 49, input: 0.00201153, achieved: 0.00201153
dataset 50, input: 0.00197638, achieved: 0.00197638
dataset 51, input: 0.00199369, achieved: 0.00199369
dataset 52, input: 0.00197681, achieved: 0.00197681
dataset 53, input: 0.00201665, achieved: 0.00201665
dataset 54, input: 0.00201861, achieved: 0.00201861
dataset 55, input: 0.00200126, achieved: 0.00200126
dataset 56, input: 0.00202826, achieved: 0.00202826
dataset 57, input: 0.00201311, achieved: 0.00201311
dataset 58, input: 0.00197356, achieved: 0.00197356
dataset 59, input: 0.00199114, achieved: 0.00199114
dataset 60, input: 0.00197452, achieved: 0.00197452
dataset 61, input: 0.00202484, achieved: 0.00202484
dataset 62, input: 0.00199182, achieved: 0.00199182
dataset 63, input: 0.00201434, achieved: 0.00201434
dataset 64, input: 0.00199537, achieved: 0.00199537
dataset 65, input: 0.00199465, achieved: 0.00199465
dataset 66, input: 0.00203359, achieved: 0.00203359
dataset 67, input: 0.0020232, achieved: 0.0020232
dataset 68, input: 0.00207336, achieved: 0.00207336
dataset 69, input: 0.00200639, achieved: 0.00200639
dataset 70, input: 0.00205115, achieved: 0.00205115
dataset 71, input: 0.00199221, achieved: 0.00199221
dataset 72, input: 0.00204669, achieved: 0.00204669
dataset 73, input: 0.00198504, achieved: 0.00198504
dataset 74, input: 0.00201612, achieved: 0.00201612
dataset 75, input: 0.00198663, achieved: 0.00198663
dataset 76, input: 0.00203672, achieved: 0.00203672
dataset 77, input: 0.00198562, achieved: 0.00198562
dataset 78, input: 0.00200258, achieved: 0.00200258
dataset 79, input: 0.00203129, achieved: 0.00203129
dataset 80, input: 0.00202148, achieved: 0.00202148
dataset 81, input: 0.00196622, achieved: 0.00196622
dataset 82, input: 0.0020394, achieved: 0.0020394
dataset 83, input: 0.00199123, achieved: 0.00199123
dataset 84, input: 0.00203435, achieved: 0.00203435
dataset 85, input: 0.00199754, achieved: 0.00199754
dataset 86, input: 0.00199452, achieved: 0.00199452
dataset 87, input: 0.00203307, achieved: 0.00203307
dataset 88, input: 0.00195825, achieved: 0.00195825
dataset 89, input: 0.00200366, achieved: 0.00200366
dataset 90, input: 0.00200647, achieved: 0.00200647
dataset 91, input: 0.00199224, achieved: 0.00199224
dataset 92, input: 0.00200645, achieved: 0.00200645
dataset 93, input: 0.00199574, achieved: 0.00199574
dataset 94, input: 0.00197874, achieved: 0.00197874
dataset 95, input: 0.00200092, achieved: 0.00200092
dataset 96, input: 0.00197647, achieved: 0.00197647
dataset 97, input: 0.0020373, achieved: 0.0020373
dataset 98, input: 0.00200963, achieved: 0.00200963
dataset 99, input: 0.00200034, achieved: 0.00200034
dataset 100, input: 0.00201418, achieved: 0.00201418
dataset 101, input: 0.00206569, achieved: 0.00206569
dataset 102, input: 0.00199721, achieved: 0.00199721
dataset 103, input: 0.00203849, achieved: 0.00203849
dataset 104, input: 0.00200207, achieved: 0.00200207
dataset 105, input: 0.00201067, achieved: 0.00201067
dataset 106, input: 0.00202678, achieved: 0.00202678
dataset 107, input: 0.00199882, achieved: 0.00199882
dataset 108, input: 0.00205433, achieved: 0.00205433
dataset 109, input: 0.00197736, achieved: 0.00197736
dataset 110, input: 0.00203965, achieved: 0.00203965
dataset 111, input: 0.00201698, achieved: 0.00201698
dataset 112, input: 0.00200679, achieved: 0.00200679
dataset 113, input: 0.00204826, achieved: 0.00204826
dataset 114, input: 0.0020257, achieved: 0.0020257
dataset 115, input: 0.00202766, achieved: 0.00202766
dataset 116, input: 0.00199577, achieved: 0.00199577
dataset 117, input: 0.00204043, achieved: 0.00204043
dataset 118, input: 0.00200747, achieved: 0.00200747
dataset 119, input: 0.00206065, achieved: 0.00206065
dataset 120, input: 0.00200509, achieved: 0.00200509
dataset 121, input: 0.00204367, achieved: 0.00204367
dataset 122, input: 0.00199742, achieved: 0.00199742
dataset 123, input: 0.00204939, achieved: 0.00204939
dataset 124, input: 0.00203831, achieved: 0.00203831
dataset 125, input: 0.00203946, achieved: 0.00203946
dataset 126, input: 0.0020197, achieved: 0.0020197
dataset 127, input: 0.00203092, achieved: 0.00203092
dataset 128, input: 0.00198265, achieved: 0.00198265
dataset 129, input: 0.00201218, achieved: 0.00201218
dataset 130, input: 0.00200084, achieved: 0.00200084
dataset 131, input: 0.00196281, achieved: 0.00196281
dataset 132, input: 0.00201619, achieved: 0.00201619
dataset 133, input: 0.00200941, achieved: 0.00200941
dataset 134, input: 0.00202944, achieved: 0.00202944
dataset 135, input: 0.00205041, achieved: 0.00205041
dataset 136, input: 0.00198587, achieved: 0.00198587
dataset 137, input: 0.00199012, achieved: 0.00199012
dataset 138, input: 0.00199975, achieved: 0.00199975
dataset 139, input: 0.00198554, achieved: 0.00198554
dataset 140, input: 0.0020279, achieved: 0.0020279
dataset 141, input: 0.00195539, achieved: 0.00195539
dataset 142, input: 0.00203125, achieved: 0.00203125
dataset 143, input: 0.00197089, achieved: 0.00197089
dataset 144, input: 0.00200413, achieved: 0.00200413
dataset 145, input: 0.00202022, achieved: 0.00202022
dataset 146, input: 0.00205043, achieved: 0.00205043
dataset 147, input: 0.00200325, achieved: 0.00200325
dataset 148, input: 0.00201247, achieved: 0.00201247
dataset 149, input: 0.00200451, achieved: 0.00200451
dataset 150, input: 0.00200506, achieved: 0.00200506
dataset 151, input: 0.00198113, achieved: 0.00198113
dataset 152, input: 0.00198442, achieved: 0.00198442
dataset 153, input: 0.00201764, achieved: 0.00201764
dataset 154, input: 0.00199903, achieved: 0.00199903
dataset 155, input: 0.00199982, achieved: 0.00199982
dataset 156, input: 0.00202104, achieved: 0.00202104
dataset 157, input: 0.0020341, achieved: 0.0020341
dataset 158, input: 0.00201057, achieved: 0.00201057
dataset 159, input: 0.00196761, achieved: 0.00196761
dataset 160, input: 0.00197657, achieved: 0.00197657
dataset 161, input: 0.00198405, achieved: 0.00198405
dataset 162, input: 0.00199077, achieved: 0.00199077
dataset 163, input: 0.00200681, achieved: 0.00200681
dataset 164, input: 0.00203225, achieved: 0.00203225
dataset 165, input: 0.00200364, achieved: 0.00200364
dataset 166, input: 0.00201747, achieved: 0.00201747
dataset 167, input: 0.00197556, achieved: 0.00197556
dataset 168, input: 0.00200294, achieved: 0.00200294
dataset 169, input: 0.00201973, achieved: 0.00201973
dataset 170, input: 0.00197594, achieved: 0.00197594
dataset 171, input: 0.00203594, achieved: 0.00203594
dataset 172, input: 0.00197428, achieved: 0.00197428
dataset 173, input: 0.00201685, achieved: 0.00201685
dataset 174, input: 0.00197956, achieved: 0.00197956
dataset 175, input: 0.00198333, achieved: 0.00198333
dataset 176, input: 0.00200983, achieved: 0.00200983
dataset 177, input: 0.00196253, achieved: 0.00196253
dataset 178, input: 0.00204462, achieved: 0.00204462
dataset 179, input: 0.00201332, achieved: 0.00201332
dataset 180, input: 0.00199941, achieved: 0.00199941
dataset 181, input: 0.00201077, achieved: 0.00201077
dataset 182, input: 0.00198192, achieved: 0.00198192
dataset 183, input: 0.00200514, achieved: 0.00200514
dataset 184, input: 0.00197811, achieved: 0.00197811
dataset 185, input: 0.00198718, achieved: 0.00198718
dataset 186, input: 0.00198823, achieved: 0.00198823
dataset 187, input: 0.00201967, achieved: 0.00201967
dataset 188, input: 0.00201973, achieved: 0.00201973
dataset 189, input: 0.00197839, achieved: 0.00197839
dataset 190, input: 0.00202711, achieved: 0.00202711
dataset 191, input: 0.00198607, achieved: 0.00198607
dataset 192, input: 0.00200322, achieved: 0.00200322
dataset 193, input: 0.00195696, achieved: 0.00195696
dataset 194, input: 0.00201389, achieved: 0.00201389
dataset 195, input: 0.00197174, achieved: 0.00197174
dataset 196, input: 0.00197988, achieved: 0.00197988
dataset 197, input: 0.00198332, achieved: 0.00198332
dataset 198, input: 0.00193868, achieved: 0.00193868
dataset 199, input: 0.00200076, achieved: 0.00200076
dataset 200, input: 0.00196373, achieved: 0.00196373
dataset 201, input: 0.00199055, achieved: 0.00199055
dataset 202, input: 0.00197423, achieved: 0.00197423
dataset 203, input: 0.00198089, achieved: 0.00198089
dataset 204, input: 0.00196067, achieved: 0.00196067
dataset 205, input: 0.00200831, achieved: 0.00200831
dataset 206, input: 0.00197001, achieved: 0.00197001
dataset 207, input: 0.00203996, achieved: 0.00203996
dataset 208, input: 0.00198582, achieved: 0.00198582
dataset 209, input: 0.00203606, achieved: 0.00203606
dataset 210, input: 0.00202745, achieved: 0.00202745
dataset 211, input: 0.00199511, achieved: 0.00199511
dataset 212, input: 0.00201206, achieved: 0.00201206
dataset 213, input: 0.00202258, achieved: 0.00202258
dataset 214, input: 0.0019911, achieved: 0.0019911
dataset 215, input: 0.00203567, achieved: 0.00203567
dataset 216, input: 0.00197059, achieved: 0.00197059
dataset 217, input: 0.00199777, achieved: 0.00199777
dataset 218, input: 0.0020007, achieved: 0.0020007
dataset 219, input: 0.00199421, achieved: 0.00199421
dataset 220, input: 0.00201738, achieved: 0.00201738
dataset 221, input: 0.00197962, achieved: 0.00197962
dataset 222, input: 0.00196012, achieved: 0.00196012
dataset 223, input: 0.00201847, achieved: 0.00201847
dataset 224, input: 0.00200071, achieved: 0.00200071
dataset 225, input: 0.00199779, achieved: 0.00199779
dataset 226, input: 0.00194927, achieved: 0.00194927
dataset 227, input: 0.00203959, achieved: 0.00203959
dataset 228, input: 0.00195352, achieved: 0.00195352
dataset 229, input: 0.00201395, achieved: 0.00201395
dataset 230, input: 0.00197575, achieved: 0.00197575
dataset 231, input: 0.00198012, achieved: 0.00198012
dataset 232, input: 0.00202959, achieved: 0.00202959
dataset 233, input: 0.00198276, achieved: 0.00198276
dataset 234, input: 0.00202782, achieved: 0.00202782
dataset 235, input: 0.00201818, achieved: 0.00201818
dataset 236, input: 0.00198894, achieved: 0.00198894
dataset 237, input: 0.00202542, achieved: 0.00202542
dataset 238, input: 0.00201675, achieved: 0.00201675
dataset 239, input: 0.00198354, achieved: 0.00198354
dataset 240, input: 0.00204488, achieved: 0.00204488
dataset 241, input: 0.00195691, achieved: 0.00195691
dataset 242, input: 0.00203593, achieved: 0.00203593
dataset 243, input: 0.0019985, achieved: 0.0019985
dataset 244, input: 0.00200537, achieved: 0.00200537
dataset 245, input: 0.00198656, achieved: 0.00198656
dataset 246, input: 0.00198817, achieved: 0.00198817
dataset 247, input: 0.0019854, achieved: 0.0019854
dataset 248, input: 0.00200875, achieved: 0.00200875
dataset 249, input: 0.00199226, achieved: 0.00199226
dataset 250, input: 0.00200942, achieved: 0.00200942
dataset 251, input: 0.00194812, achieved: 0.00194812
dataset 252, input: 0.00199182, achieved: 0.00199182
dataset 253, input: 0.00198928, achieved: 0.00198928
dataset 254, input: 0.00194932, achieved: 0.00194932
dataset 255, input: 0.00198438, achieved: 0.00198438
dataset 256, input: 0.00193166, achieved: 0.00193166
dataset 257, input: 0.00203037, achieved: 0.00203037
dataset 258, input: 0.0019644, achieved: 0.0019644
dataset 259, input: 0.00196128, achieved: 0.00196128
dataset 260, input: 0.00195087, achieved: 0.00195087
dataset 261, input: 0.00199522, achieved: 0.00199522
dataset 262, input: 0.00194635, achieved: 0.00194635
dataset 263, input: 0.00200943, achieved: 0.00200943
dataset 264, input: 0.00198645, achieved: 0.00198645
dataset 265, input: 0.00197595, achieved: 0.00197595
dataset 266, input: 0.00200411, achieved: 0.00200411
dataset 267, input: 0.001968, achieved: 0.001968
dataset 268, input: 0.00201966, achieved: 0.00201966
dataset 269, input: 0.00197707, achieved: 0.00197707
dataset 270, input: 0.0019676, achieved: 0.0019676
dataset 271, input: 0.00200136, achieved: 0.00200136
dataset 272, input: 0.00199096, achieved: 0.00199096
dataset 273, input: 0.00199364, achieved: 0.00199364
dataset 274, input: 0.00199713, achieved: 0.00199713
dataset 275, input: 0.00199779, achieved: 0.00199779
dataset 276, input: 0.00199867, achieved: 0.00199867
dataset 277, input: 0.0019876, achieved: 0.0019876
dataset 278, input: 0.00200159, achieved: 0.00200159
dataset 279, input: 0.00198123, achieved: 0.00198123
dataset 280, input: 0.00200744, achieved: 0.00200744
dataset 281, input: 0.00200768, achieved: 0.00200768
dataset 282, input: 0.00200034, achieved: 0.00200034
dataset 283, input: 0.00200793, achieved: 0.00200793
dataset 284, input: 0.00198041, achieved: 0.00198041
dataset 285, input: 0.00199651, achieved: 0.00199651
dataset 286, input: 0.00198473, achieved: 0.00198473
dataset 287, input: 0.00198241, achieved: 0.00198241
dataset 288, input: 0.00197559, achieved: 0.00197559
dataset 289, input: 0.0019839, achieved: 0.0019839
dataset 290, input: 0.00202364, achieved: 0.00202364
dataset 291, input: 0.00195941, achieved: 0.00195941
dataset 292, input: 0.00201392, achieved: 0.00201392
dataset 293, input: 0.00198147, achieved: 0.00198147
dataset 294, input: 0.00198221, achieved: 0.00198221
dataset 295, input: 0.00196622, achieved: 0.00196622
dataset 296, input: 0.00198548, achieved: 0.00198548
dataset 297, input: 0.00201581, achieved: 0.00201581
dataset 298, input: 0.0019925, achieved: 0.0019925
dataset 299, input: 0.00201974, achieved: 0.00201974
dataset 300, input: 0.00198622, achieved: 0.00198622
dataset 301, input: 0.0019734, achieved: 0.0019734
dataset 302, input: 0.00205455, achieved: 0.00205455
dataset 303, input: 0.00199679, achieved: 0.00199679
dataset 304, input: 0.00200021, achieved: 0.00200021
dataset 305, input: 0.00198209, achieved: 0.00198209
dataset 306, input: 0.00199429, achieved: 0.00199429
dataset 307, input: 0.00199805, achieved: 0.00199805
dataset 308, input: 0.00198826, achieved: 0.00198826
dataset 309, input: 0.00205141, achieved: 0.00205141
dataset 310, input: 0.00198975, achieved: 0.00198975
dataset 311, input: 0.00199753, achieved: 0.00199753
dataset 312, input: 0.00200083, achieved: 0.00200083
dataset 313, input: 0.00197186, achieved: 0.00197186
dataset 314, input: 0.0019857, achieved: 0.0019857
dataset 315, input: 0.00199205, achieved: 0.00199205
dataset 316, input: 0.00197297, achieved: 0.00197297
dataset 317, input: 0.00202313, achieved: 0.00202313
dataset 318, input: 0.00197722, achieved: 0.00197722
dataset 319, input: 0.00199539, achieved: 0.00199539
dataset 320, input: 0.00197916, achieved: 0.00197916
dataset 321, input: 0.00199966, achieved: 0.00199966
dataset 322, input: 0.00199809, achieved: 0.00199809
dataset 323, input: 0.00198669, achieved: 0.00198669
dataset 324, input: 0.00196921, achieved: 0.00196921
dataset 325, input: 0.00198212, achieved: 0.00198212
dataset 326, input: 0.00198042, achieved: 0.00198042
dataset 327, input: 0.00197724, achieved: 0.00197724
dataset 328, input: 0.00199837, achieved: 0.00199837
dataset 329, input: 0.00197848, achieved: 0.00197848
dataset 330, input: 0.00201437, achieved: 0.00201437
dataset 331, input: 0.00197589, achieved: 0.00197589
dataset 332, input: 0.00198677, achieved: 0.00198677
dataset 333, input: 0.00200253, achieved: 0.00200253
dataset 334, input: 0.0019858, achieved: 0.0019858
dataset 335, input: 0.00203128, achieved: 0.00203128
dataset 336, input: 0.00198225, achieved: 0.00198225
dataset 337, input: 0.00202889, achieved: 0.00202889
dataset 338, input: 0.0019937, achieved: 0.0019937
dataset 339, input: 0.00204612, achieved: 0.00204612
dataset 340, input: 0.00198548, achieved: 0.00198548
dataset 341, input: 0.00202937, achieved: 0.00202937
dataset 342, input: 0.00202249, achieved: 0.00202249
dataset 343, input: 0.00204788, achieved: 0.00204788
dataset 344, input: 0.00201989, achieved: 0.00201989
dataset 345, input: 0.00201566, achieved: 0.00201566
dataset 346, input: 0.00198901, achieved: 0.00198901
dataset 347, input: 0.00203753, achieved: 0.00203753
dataset 348, input: 0.00201961, achieved: 0.00201961
dataset 349, input: 0.0020532, achieved: 0.0020532
dataset 350, input: 0.00200715, achieved: 0.00200715
dataset 351, input: 0.00203764, achieved: 0.00203764
dataset 352, input: 0.00202134, achieved: 0.00202134
dataset 353, input: 0.00201916, achieved: 0.00201916
dataset 354, input: 0.00202358, achieved: 0.00202358
dataset 355, input: 0.00199338, achieved: 0.00199338
dataset 356, input: 0.00198666, achieved: 0.00198666
dataset 357, input: 0.00201884, achieved: 0.00201884
dataset 358, input: 0.00201062, achieved: 0.00201062
dataset 359, input: 0.00193955, achieved: 0.00193955
dataset 360, input: 0.00201229, achieved: 0.00201229
dataset 361, input: 0.00197716, achieved: 0.00197716
dataset 362, input: 0.0019915, achieved: 0.0019915
dataset 363, input: 0.00195477, achieved: 0.00195477
dataset 364, input: 0.00196181, achieved: 0.00196181
dataset 365, input: 0.00197722, achieved: 0.00197722
dataset 366, input: 0.00195339, achieved: 0.00195339
dataset 367, input: 0.00199259, achieved: 0.00199259
dataset 368, input: 0.00202141, achieved: 0.00202141
dataset 369, input: 0.00201304, achieved: 0.00201304
dataset 370, input: 0.00197201, achieved: 0.00197201
dataset 371, input: 0.00196272, achieved: 0.00196272
dataset 372, input: 0.00199954, achieved: 0.00199954
dataset 373, input: 0.00197394, achieved: 0.00197394
dataset 374, input: 0.00197392, achieved: 0.00197392
dataset 375, input: 0.00199316, achieved: 0.00199316
dataset 376, input: 0.00197107, achieved: 0.00197107
dataset 377, input: 0.00195359, achieved: 0.00195359
dataset 378, input: 0.00197163, achieved: 0.00197163
dataset 379, input: 0.00199371, achieved: 0.00199371
dataset 380, input: 0.00195553, achieved: 0.00195553
dataset 381, input: 0.00197494, achieved: 0.00197494
dataset 382, input: 0.00196467, achieved: 0.00196467
dataset 383, input: 0.00197625, achieved: 0.00197625
dataset 384, input: 0.00197162, achieved: 0.00197162
dataset 385, input: 0.00198161, achieved: 0.00198161
dataset 386, input: 0.00197809, achieved: 0.00197809
dataset 387, input: 0.00197793, achieved: 0.00197793
dataset 388, input: 0.00196932, achieved: 0.00196932
dataset 389, input: 0.00196275, achieved: 0.00196275
dataset 390, input: 0.00207994, achieved: 0.00207994
dataset 391, input: 0.00196643, achieved: 0.00196643
dataset 392, input: 0.00199967, achieved: 0.00199967
dataset 393, input: 0.00196605, achieved: 0.00196605
dataset 394, input: 0.001958, achieved: 0.001958
dataset 395, input: 0.00200748, achieved: 0.00200748
dataset 396, input: 0.00195977, achieved: 0.00195977
dataset 397, input: 0.00199792, achieved: 0.00199792
dataset 398, input: 0.00195095, achieved: 0.00195095
dataset 399, input: 0.00199475, achieved: 0.00199475
dataset 400, input: 0.00198847, achieved: 0.00198847
dataset 401, input: 0.0020001, achieved: 0.0020001
dataset 402, input: 0.00194736, achieved: 0.00194736
dataset 403, input: 0.00197202, achieved: 0.00197202
dataset 404, input: 0.00194803, achieved: 0.00194803
dataset 405, input: 0.00197877, achieved: 0.00197877
dataset 406, input: 0.00194879, achieved: 0.00194879
dataset 407, input: 0.00197684, achieved: 0.00197684
dataset 408, input: 0.00195959, achieved: 0.00195959
dataset 409, input: 0.00196377, achieved: 0.00196377
dataset 410, input: 0.0019844, achieved: 0.0019844
dataset 411, input: 0.00196261, achieved: 0.00196261
dataset 412, input: 0.00205223, achieved: 0.00205223
dataset 413, input: 0.00197666, achieved: 0.00197666
dataset 414, input: 0.00194823, achieved: 0.00194823
dataset 415, input: 0.0020134, achieved: 0.0020134
dataset 416, input: 0.001984, achieved: 0.001984
dataset 417, input: 0.00197547, achieved: 0.00197547
dataset 418, input: 0.00198856, achieved: 0.00198856
dataset 419, input: 0.00200158, achieved: 0.00200158
dataset 420, input: 0.00198087, achieved: 0.00198087
dataset 421, input: 0.00196932, achieved: 0.00196932
dataset 422, input: 0.00200459, achieved: 0.00200459
dataset 423, input: 0.00201814, achieved: 0.00201814
dataset 424, input: 0.00198034, achieved: 0.00198034
dataset 425, input: 0.00200004, achieved: 0.00200004
dataset 426, input: 0.00199841, achieved: 0.00199841
dataset 427, input: 0.00197246, achieved: 0.00197246
dataset 428, input: 0.00200993, achieved: 0.00200993
dataset 429, input: 0.00196069, achieved: 0.00196069
dataset 430, input: 0.00199288, achieved: 0.00199288
dataset 431, input: 0.00196902, achieved: 0.00196902
dataset 432, input: 0.00200046, achieved: 0.00200046
dataset 433, input: 0.00196048, achieved: 0.00196048
dataset 434, input: 0.00202402, achieved: 0.00202402
dataset 435, input: 0.00198936, achieved: 0.00198936
dataset 436, input: 0.00199778, achieved: 0.00199778
dataset 437, input: 0.00196553, achieved: 0.00196553
dataset 438, input: 0.00197938, achieved: 0.00197938
dataset 439, input: 0.00198927, achieved: 0.00198927
dataset 440, input: 0.00197504, achieved: 0.00197504
dataset 441, input: 0.00197566, achieved: 0.00197566
dataset 442, input: 0.0019837, achieved: 0.0019837
dataset 443, input: 0.00197988, achieved: 0.00197988
dataset 444, input: 0.00200077, achieved: 0.00200077
dataset 445, input: 0.00202142, achieved: 0.00202142
dataset 446, input: 0.00206867, achieved: 0.00206867
dataset 447, input: 0.00201977, achieved: 0.00201977
dataset 448, input: 0.00201878, achieved: 0.00201878
dataset 449, input: 0.00200204, achieved: 0.00200204
dataset 450, input: 0.00200432, achieved: 0.00200432
dataset 451, input: 0.00200904, achieved: 0.00200904
dataset 452, input: 0.00199522, achieved: 0.00199522
dataset 453, input: 0.00202661, achieved: 0.00202661
dataset 454, input: 0.00199262, achieved: 0.00199262
dataset 455, input: 0.0019765, achieved: 0.0019765
dataset 456, input: 0.00195362, achieved: 0.00195362
dataset 457, input: 0.00198928, achieved: 0.00198928
dataset 458, input: 0.00200292, achieved: 0.00200292
dataset 459, input: 0.00197836, achieved: 0.00197836
dataset 460, input: 0.00199555, achieved: 0.00199555
dataset 461, input: 0.00201214, achieved: 0.00201214
dataset 462, input: 0.00198803, achieved: 0.00198803
dataset 463, input: 0.002008, achieved: 0.002008
dataset 464, input: 0.00199764, achieved: 0.00199764
dataset 465, input: 0.0020095, achieved: 0.0020095
dataset 466, input: 0.00198134, achieved: 0.00198134
dataset 467, input: 0.00200822, achieved: 0.00200822
dataset 468, input: 0.00199996, achieved: 0.00199996
dataset 469, input: 0.00200902, achieved: 0.00200902
dataset 470, input: 0.00198222, achieved: 0.00198222
dataset 471, input: 0.00201403, achieved: 0.00201403
dataset 472, input: 0.00198319, achieved: 0.00198319
dataset 473, input: 0.00199601, achieved: 0.00199601
dataset 474, input: 0.00199925, achieved: 0.00199925
dataset 475, input: 0.00197859, achieved: 0.00197859
dataset 476, input: 0.00203123, achieved: 0.00203123
dataset 477, input: 0.00195045, achieved: 0.00195045
dataset 478, input: 0.00197523, achieved: 0.00197523
dataset 479, input: 0.00201481, achieved: 0.00201481
dataset 480, input: 0.00200153, achieved: 0.00200153
dataset 481, input: 0.0019841, achieved: 0.0019841
dataset 482, input: 0.00198675, achieved: 0.00198675
dataset 483, input: 0.00199473, achieved: 0.00199473
dataset 484, input: 0.00199132, achieved: 0.00199132
dataset 485, input: 0.00205589, achieved: 0.00205589
dataset 486, input: 0.00198399, achieved: 0.00198399
dataset 487, input: 0.002051, achieved: 0.002051
dataset 488, input: 0.00200399, achieved: 0.00200399
dataset 489, input: 0.00197661, achieved: 0.00197661
dataset 490, input: 0.00205123, achieved: 0.00205123
dataset 491, input: 0.00198621, achieved: 0.00198621
dataset 492, input: 0.00198749, achieved: 0.00198749
dataset 493, input: 0.00198305, achieved: 0.00198305
dataset 494, input: 0.0020149, achieved: 0.0020149
dataset 495, input: 0.0019891, achieved: 0.0019891
dataset 496, input: 0.00198587, achieved: 0.00198587
dataset 497, input: 0.00200655, achieved: 0.00200655
dataset 498, input: 0.00201202, achieved: 0.00201202
dataset 499, input: 0.00204447, achieved: 0.00204447
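
(Note on the `input` / `achieved` lines above: these are printed while the blendable-dataset index is built, where `input` is the normalized weight requested for each component dataset and `achieved` is the sampling ratio actually realized by the constructed index; `input == achieved` on every line means the requested blend was hit exactly. Below is a minimal sketch of that bookkeeping — not Megatron's implementation, which, if I'm reading the code right, builds the index in `megatron/data/blendable_dataset.py` via a compiled helper — just to show what the two numbers are reporting. The weights in the example are hypothetical.)

```python
# Toy sketch of blendable-dataset bookkeeping: normalize the requested
# per-dataset weights ("input"), allocate a total sample budget accordingly,
# and report the ratios the allocation actually realizes ("achieved").
def blend_counts(weights, num_samples):
    total = sum(weights)
    norm = [w / total for w in weights]              # requested ("input") ratios
    counts = [round(r * num_samples) for r in norm]  # samples drawn per dataset
    achieved = [c / sum(counts) for c in counts]     # realized ("achieved") ratios
    return norm, counts, achieved

# Hypothetical weights, just to reproduce the shape of the printed lines:
norm, counts, achieved = blend_counts([0.002, 0.0085, 0.004], 1_000_000)
for i, (r, a) in enumerate(zip(norm, achieved)):
    print(f"dataset {i}, input: {r:g}, achieved: {a:g}")
```
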
[2025-03-12 09:19:59][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 86572112 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.000965894, achieved: 0.000965894
dataset 1, input: 0.00373222, achieved: 0.00373222
dataset 2, input: 0.000860371, achieved: 0.000860371
dataset 3, input: 0.00369798, achieved: 0.00369798
dataset 4, input: 0.00355292, achieved: 0.00355292
dataset 5, input: 0.00366778, achieved: 0.00366778
dataset 6, input: 0.000820216, achieved: 0.000820216
dataset 7, input: 0.0020784, achieved: 0.0020784
dataset 8, input: 0.00199747, achieved: 0.00199747
dataset 9, input: 0.00204509, achieved: 0.00204509
dataset 10, input: 0.00190938, achieved: 0.00190938
dataset 11, input: 0.00214439, achieved: 0.00214439
dataset 12, input: 0.00208307, achieved: 0.00208307
dataset 13, input: 0.00187015, achieved: 0.00187015
dataset 14, input: 0.00189412, achieved: 0.00189412
dataset 15, input: 0.00198159, achieved: 0.00198159
dataset 16, input: 0.00213069, achieved: 0.00213069
dataset 17, input: 0.00200805, achieved: 0.00200805
dataset 18, input: 0.00178642, achieved: 0.00178642
dataset 19, input: 0.00208929, achieved: 0.00208929
dataset 20, input: 0.00212976, achieved: 0.00212976
dataset 21, input: 0.00217272, achieved: 0.00217272
dataset 22, input: 0.00201677, achieved: 0.00201677
dataset 23, input: 0.00174471, achieved: 0.00174471
dataset 24, input: 0.00196914, achieved: 0.00196914
dataset 25, input: 0.00208525, achieved: 0.00208525
dataset 26, input: 0.00189443, achieved: 0.00189443
dataset 27, input: 0.00176557, achieved: 0.00176557
dataset 28, input: 0.00187825, achieved: 0.00187825
dataset 29, input: 0.00185428, achieved: 0.00185428
dataset 30, input: 0.00203171, achieved: 0.00203171
dataset 31, input: 0.00190377, achieved: 0.00190377
dataset 32, input: 0.00224058, achieved: 0.00224058
dataset 33, input: 0.00181444, achieved: 0.00181444
dataset 34, input: 0.00197661, achieved: 0.00197661
dataset 35, input: 0.00211731, achieved: 0.00211731
dataset 36, input: 0.00201334, achieved: 0.00201334
dataset 37, input: 0.00173164, achieved: 0.00173164
dataset 38, input: 0.00196914, achieved: 0.00196914
dataset 39, input: 0.00205786, achieved: 0.00205786
dataset 40, input: 0.00186704, achieved: 0.00186704
dataset 41, input: 0.00192992, achieved: 0.00192992
dataset 42, input: 0.00195949, achieved: 0.00195949
dataset 43, input: 0.00208089, achieved: 0.00208089
dataset 44, input: 0.00189661, achieved: 0.00189661
dataset 45, input: 0.00201241, achieved: 0.00201241
dataset 46, input: 0.0019791, achieved: 0.0019791
dataset 47, input: 0.00189848, achieved: 0.00189848
dataset 48, input: 0.00189692, achieved: 0.00189692
dataset 49, input: 0.00185366, achieved: 0.00185366
dataset 50, input: 0.00199965, achieved: 0.00199965
dataset 51, input: 0.00194206, achieved: 0.00194206
dataset 52, input: 0.00851593, achieved: 0.00851593
dataset 53, input: 0.00882534, achieved: 0.00882534
dataset 54, input: 0.00817197, achieved: 0.00817197
dataset 55, input: 0.00866908, achieved: 0.00866908
dataset 56, input: 0.00837492, achieved: 0.00837492
dataset 57, input: 0.00773774, achieved: 0.00773774
dataset 58, input: 0.00822551, achieved: 0.00822551
dataset 59, input: 0.00744233, achieved: 0.00744233
dataset 60, input: 0.00692406, achieved: 0.00692406
dataset 61, input: 0.00874627, achieved: 0.00874627
dataset 62, input: 0.00748716, achieved: 0.00748716
dataset 63, input: 0.00874192, achieved: 0.00874192
dataset 64, input: 0.00939404, achieved: 0.00939404
dataset 65, input: 0.00797026, achieved: 0.00797026
dataset 66, input: 0.00762287, achieved: 0.00762287
dataset 67, input: 0.00896074, achieved: 0.00896074
dataset 68, input: 0.00764404, achieved: 0.00764404
dataset 69, input: 0.00732311, achieved: 0.00732311
dataset 70, input: 0.00830239, achieved: 0.00830239
dataset 71, input: 0.0076926, achieved: 0.0076926
dataset 72, input: 0.00858846, achieved: 0.00858846
dataset 73, input: 0.00768171, achieved: 0.00768171
dataset 74, input: 0.00854861, achieved: 0.00854861
dataset 75, input: 0.00920759, achieved: 0.00920759
dataset 76, input: 0.00845616, achieved: 0.00845616
dataset 77, input: 0.00914066, achieved: 0.00914066
dataset 78, input: 0.00782427, achieved: 0.00782427
dataset 79, input: 0.00842784, achieved: 0.00842784
dataset 80, input: 0.00789555, achieved: 0.00789555
dataset 81, input: 0.00868277, achieved: 0.00868277
dataset 82, input: 0.00796248, achieved: 0.00796248
dataset 83, input: 0.0074165, achieved: 0.0074165
dataset 84, input: 0.00838457, achieved: 0.00838457
dataset 85, input: 0.00816481, achieved: 0.00816481
dataset 86, input: 0.00740965, achieved: 0.00740965
dataset 87, input: 0.00783952, achieved: 0.00783952
dataset 88, input: 0.00873662, achieved: 0.00873662
dataset 89, input: 0.00877553, achieved: 0.00877553
dataset 90, input: 0.00749525, achieved: 0.00749525
dataset 91, input: 0.00822271, achieved: 0.00822271
dataset 92, input: 0.00737634, achieved: 0.00737634
dataset 93, input: 0.00892339, achieved: 0.00892339
dataset 94, input: 0.00788061, achieved: 0.00788061
dataset 95, input: 0.00921288, achieved: 0.00921288
dataset 96, input: 0.00833383, achieved: 0.00833383
dataset 97, input: 0.00815983, achieved: 0.00815983
dataset 98, input: 0.00785197, achieved: 0.00785197
dataset 99, input: 0.0087609, achieved: 0.0087609
dataset 100, input: 0.00836122, achieved: 0.00836122
dataset 101, input: 0.00835687, achieved: 0.00835687
dataset 102, input: 0.0075326, achieved: 0.0075326
dataset 103, input: 0.00779688, achieved: 0.00779688
dataset 104, input: 0.00841694, achieved: 0.00841694
dataset 105, input: 0.00863453, achieved: 0.00863453
dataset 106, input: 0.00874876, achieved: 0.00874876
dataset 107, input: 0.00861274, achieved: 0.00861274
dataset 108, input: 0.00857383, achieved: 0.00857383
dataset 109, input: 0.00764248, achieved: 0.00764248
dataset 110, input: 0.0085483, achieved: 0.0085483
dataset 111, input: 0.00844433, achieved: 0.00844433
dataset 112, input: 0.00797898, achieved: 0.00797898
dataset 113, input: 0.00808792, achieved: 0.00808792
dataset 114, input: 0.00336927, achieved: 0.00336927
dataset 115, input: 0.00219482, achieved: 0.00219482
dataset 116, input: 0.0054237, achieved: 0.0054237
dataset 117, input: 0.00465049, achieved: 0.00465049
dataset 118, input: 0.00344304, achieved: 0.00344304
dataset 119, input: 0.00410606, achieved: 0.00410606
dataset 120, input: 0.00387167, achieved: 0.00387167
dataset 121, input: 0.0040575, achieved: 0.0040575
dataset 122, input: 0.00389159, achieved: 0.00389159
dataset 123, input: 0.00385704, achieved: 0.00385704
dataset 124, input: 0.00394544, achieved: 0.00394544
dataset 125, input: 0.00367027, achieved: 0.00367027
dataset 126, input: 0.00365564, achieved: 0.00365564
dataset 127, input: 0.00382467, achieved: 0.00382467
dataset 128, input: 0.00407805, achieved: 0.00407805
dataset 129, input: 0.00326966, achieved: 0.00326966
dataset 130, input: 0.00399805, achieved: 0.00399805
dataset 131, input: 0.00235014, achieved: 0.00235014
dataset 132, input: 0.00196634, achieved: 0.00196634
dataset 133, input: 0.00344491, achieved: 0.00344491
dataset 134, input: 0.00379323, achieved: 0.00379323
dataset 135, input: 0.00142129, achieved: 0.00142129
dataset 136, input: 0.00332942, achieved: 0.00332942
dataset 137, input: 0.00301721, achieved: 0.00301721
dataset 138, input: 0.00423742, achieved: 0.00423742
dataset 139, input: 0.00424863, achieved: 0.00424863
dataset 140, input: 0.0063538, achieved: 0.0063538
dataset 141, input: 0.00418762, achieved: 0.00418762
dataset 142, input: 0.00323448, achieved: 0.00323448
dataset 143, input: 0.00261722, achieved: 0.00261722
dataset 144, input: 0.00267201, achieved: 0.00267201
dataset 145, input: 0.00267325, achieved: 0.00267325
dataset 146, input: 0.00254065, achieved: 0.00254065
dataset 147, input: 0.00254781, achieved: 0.00254781
dataset 148, input: 0.00237069, achieved: 0.00237069
dataset 149, input: 0.00252415, achieved: 0.00252415
dataset 150, input: 0.00246252, achieved: 0.00246252
dataset 151, input: 0.00254189, achieved: 0.00254189
dataset 152, input: 0.00262718, achieved: 0.00262718
dataset 153, input: 0.00240275, achieved: 0.00240275
dataset 154, input: 0.0023212, achieved: 0.0023212
dataset 155, input: 0.00254781, achieved: 0.00254781
dataset 156, input: 0.00243761, achieved: 0.00243761
dataset 157, input: 0.00256991, achieved: 0.00256991
dataset 158, input: 0.00254189, achieved: 0.00254189
dataset 159, input: 0.00261255, achieved: 0.00261255
dataset 160, input: 0.00240337, achieved: 0.00240337
dataset 161, input: 0.00259263, achieved: 0.00259263
dataset 162, input: 0.00253722, achieved: 0.00253722
dataset 163, input: 0.00246096, achieved: 0.00246096
dataset 164, input: 0.00225925, achieved: 0.00225925
dataset 165, input: 0.00250423, achieved: 0.00250423
dataset 166, input: 0.00202268, achieved: 0.00202268
dataset 167, input: 0.0013111, achieved: 0.0013111
dataset 168, input: 0.00125476, achieved: 0.00125476
dataset 169, input: 0.00145896, achieved: 0.00145896
dataset 170, input: 0.00140137, achieved: 0.00140137
dataset 171, input: 0.00116605, achieved: 0.00116605
dataset 172, input: 0.00132511, achieved: 0.00132511
dataset 173, input: 0.0013939, achieved: 0.0013939
dataset 174, input: 0.00127468, achieved: 0.00127468
dataset 175, input: 0.00137367, achieved: 0.00137367
dataset 176, input: 0.00133943, achieved: 0.00133943
dataset 177, input: 0.00121803, achieved: 0.00121803
dataset 178, input: 0.00137585, achieved: 0.00137585
dataset 179, input: 0.00133289, achieved: 0.00133289
dataset 180, input: 0.00185677, achieved: 0.00185677
dataset 181, input: 0.000504892, achieved: 0.000504892
dataset 182, input: 0.00314577, achieved: 0.00314577
dataset 183, input: 0.0028927, achieved: 0.0028927
dataset 184, input: 0.00265364, achieved: 0.00265364
dataset 185, input: 0.00227855, achieved: 0.00227855
dataset 186, input: 0.0034972, achieved: 0.0034972
dataset 187, input: 0.00437127, achieved: 0.00437127
dataset 188, input: 0.00135157, achieved: 0.00135157
dataset 189, input: 0.00277659, achieved: 0.00277659
dataset 190, input: 0.00277286, achieved: 0.00277286
dataset 191, input: 0.00273302, achieved: 0.00273302
dataset 192, input: 0.00270251, achieved: 0.00270251
dataset 193, input: 0.00280056, achieved: 0.00280056
dataset 194, input: 0.00287091, achieved: 0.00287091
dataset 195, input: 0.00263029, achieved: 0.00263029
dataset 196, input: 0.00314857, achieved: 0.00314857
dataset 197, input: 0.0029478, achieved: 0.0029478
dataset 198, input: 0.00308383, achieved: 0.00308383
dataset 199, input: 0.000790645, achieved: 0.000790645
dataset 200, input: 0.00204976, achieved: 0.00204976
dataset 201, input: 0.00168214, achieved: 0.00168214
dataset 202, input: 0.00171887, achieved: 0.00171887
dataset 203, input: 0.00195576, achieved: 0.00195576
dataset 204, input: 0.00199218, achieved: 0.00199218
dataset 205, input: 0.00205132, achieved: 0.00205132
dataset 206, input: 0.0020177, achieved: 0.0020177
dataset 207, input: 0.00205474, achieved: 0.00205474
dataset 208, input: 0.00187638, achieved: 0.00187638
dataset 209, input: 0.00189599, achieved: 0.00189599
dataset 210, input: 0.00211046, achieved: 0.00211046
dataset 211, input: 0.0021226, achieved: 0.0021226
dataset 212, input: 0.00188012, achieved: 0.00188012
dataset 213, input: 0.0020314, achieved: 0.0020314
dataset 214, input: 0.00168245, achieved: 0.00168245
dataset 215, input: 0.00231155, achieved: 0.00231155
dataset 216, input: 0.00169179, achieved: 0.00169179
dataset 217, input: 0.00196758, achieved: 0.00196758
dataset 218, input: 0.00146954, achieved: 0.00146954
dataset 219, input: 0.00189194, achieved: 0.00189194
dataset 220, input: 0.00179763, achieved: 0.00179763
dataset 221, input: 0.00196634, achieved: 0.00196634
dataset 222, input: 0.00183467, achieved: 0.00183467
dataset 223, input: 0.00213443, achieved: 0.00213443
dataset 224, input: 0.00187638, achieved: 0.00187638
dataset 225, input: 0.00193397, achieved: 0.00193397
dataset 226, input: 0.00237038, achieved: 0.00237038
dataset 227, input: 0.00170642, achieved: 0.00170642
dataset 228, input: 0.00174658, achieved: 0.00174658
dataset 229, input: 0.000758583, achieved: 0.000758583
dataset 230, input: 0.00167965, achieved: 0.00167965
dataset 231, input: 0.00197941, achieved: 0.00197941
dataset 232, input: 0.00421439, achieved: 0.00421439
dataset 233, input: 0.00431835, achieved: 0.00431835
dataset 234, input: 0.0035579, achieved: 0.0035579
dataset 235, input: 0.00370171, achieved: 0.00370171
dataset 236, input: 0.00189942, achieved: 0.00189942
dataset 237, input: 0.00250329, achieved: 0.00250329
dataset 238, input: 0.00499974, achieved: 0.00499974
dataset 239, input: 0.00092605, achieved: 0.00092605
dataset 240, input: 0.0019044, achieved: 0.0019044
dataset 241, input: 0.0019458, achieved: 0.0019458
dataset 242, input: 0.00158627, achieved: 0.00158627
dataset 243, input: 0.00399711, achieved: 0.00399711
dataset 244, input: 0.00557654, achieved: 0.00557654
dataset 245, input: 0.00230159, achieved: 0.00230159
dataset 246, input: 0.0017058, achieved: 0.0017058
dataset 247, input: 0.00237505, achieved: 0.00237505
dataset 248, input: 0.00192214, achieved: 0.00192214
dataset 249, input: 0.00204696, achieved: 0.00204696
dataset 250, input: 0.00197755, achieved: 0.00197755
dataset 251, input: 0.00167592, achieved: 0.00167592
dataset 252, input: 0.00195358, achieved: 0.00195358
dataset 253, input: 0.00214626, achieved: 0.00214626
dataset 254, input: 0.00203856, achieved: 0.00203856
dataset 255, input: 0.000746443, achieved: 0.000746443
dataset 256, input: 0.00500628, achieved: 0.00500628
dataset 257, input: 0.00535522, achieved: 0.00535522
dataset 258, input: 0.00502215, achieved: 0.00502215
dataset 259, input: 0.00479181, achieved: 0.00479181
dataset 260, input: 0.00486682, achieved: 0.00486682
dataset 261, input: 0.00111095, achieved: 0.00111095
[2025-03-12 09:20:01][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 3212568 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0835781, achieved: 0.0835781
dataset 1, input: 0.0834322, achieved: 0.0834322
dataset 2, input: 0.0510322, achieved: 0.0510322
dataset 3, input: 0.104354, achieved: 0.104354
dataset 4, input: 0.0513543, achieved: 0.0513543
dataset 5, input: 0.00400847, achieved: 0.00400847
dataset 6, input: 0.115667, achieved: 0.115667
dataset 7, input: 0.0827875, achieved: 0.0827875
dataset 8, input: 0.103788, achieved: 0.103788
dataset 9, input: 0.11266, achieved: 0.11266
dataset 10, input: 0.0508509, achieved: 0.0508509
dataset 11, input: 0.0513192, achieved: 0.0513192
dataset 12, input: 0.105168, achieved: 0.105168
[2025-03-12 09:20:01][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 8520716 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0182376, achieved: 0.0182376
dataset 1, input: 0.0182962, achieved: 0.0182962
dataset 2, input: 0.018299, achieved: 0.018299
dataset 3, input: 0.0182779, achieved: 0.0182779
dataset 4, input: 0.0182861, achieved: 0.0182861
dataset 5, input: 0.0181745, achieved: 0.0181745
dataset 6, input: 0.0183693, achieved: 0.0183693
dataset 7, input: 0.0220028, achieved: 0.0220028
dataset 8, input: 0.0486005, achieved: 0.0486005
dataset 9, input: 0.0484891, achieved: 0.0484891
dataset 10, input: 0.0512474, achieved: 0.0512474
dataset 11, input: 0.0512, achieved: 0.0512
dataset 12, input: 0.0512732, achieved: 0.0512732
dataset 13, input: 0.0485441, achieved: 0.0485441
dataset 14, input: 0.0485733, achieved: 0.0485733
dataset 15, input: 0.0511485, achieved: 0.0511485
dataset 16, input: 0.0485108, achieved: 0.0485108
dataset 17, input: 0.0485108, achieved: 0.0485108
dataset 18, input: 0.0487117, achieved: 0.0487117
dataset 19, input: 0.0511297, achieved: 0.0511297
dataset 20, input: 0.0487391, achieved: 0.0487391
dataset 21, input: 0.0512226, achieved: 0.0512226
dataset 22, input: 0.0486001, achieved: 0.0486001
dataset 23, input: 0.0487371, achieved: 0.0487371
dataset 24, input: 0.0511531, achieved: 0.0511531
dataset 25, input: 0.00566538, achieved: 0.00566538
[2025-03-12 09:20:04][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 33633053 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0130268, achieved: 0.0130268
dataset 1, input: 0.0134792, achieved: 0.0134792
dataset 2, input: 0.0136289, achieved: 0.0136289
dataset 3, input: 0.013172, achieved: 0.013172
dataset 4, input: 0.0132409, achieved: 0.0132409
dataset 5, input: 0.0133521, achieved: 0.0133521
dataset 6, input: 0.0134121, achieved: 0.0134121
dataset 7, input: 0.0128782, achieved: 0.0128782
dataset 8, input: 0.0128348, achieved: 0.0128348
dataset 9, input: 0.013055, achieved: 0.013055
dataset 10, input: 0.0128415, achieved: 0.0128415
dataset 11, input: 0.0128921, achieved: 0.0128921
dataset 12, input: 0.0128222, achieved: 0.0128222
dataset 13, input: 0.0128935, achieved: 0.0128935
dataset 14, input: 0.0131081, achieved: 0.0131081
dataset 15, input: 0.0129717, achieved: 0.0129717
dataset 16, input: 0.013004, achieved: 0.013004
dataset 17, input: 0.0128759, achieved: 0.0128759
dataset 18, input: 0.0129287, achieved: 0.0129287
dataset 19, input: 0.0130294, achieved: 0.0130294
dataset 20, input: 0.0128648, achieved: 0.0128648
dataset 21, input: 0.0131353, achieved: 0.0131353
dataset 22, input: 0.0129143, achieved: 0.0129143
dataset 23, input: 0.0129005, achieved: 0.0129005
dataset 24, input: 0.0132579, achieved: 0.0132579
dataset 25, input: 0.0129311, achieved: 0.0129311
dataset 26, input: 0.0132489, achieved: 0.0132489
dataset 27, input: 0.0131445, achieved: 0.0131445
dataset 28, input: 0.0131264, achieved: 0.0131264
dataset 29, input: 0.0128914, achieved: 0.0128914
dataset 30, input: 0.0129347, achieved: 0.0129347
dataset 31, input: 0.0132695, achieved: 0.0132695
dataset 32, input: 0.0129615, achieved: 0.0129615
dataset 33, input: 0.0129188, achieved: 0.0129188
dataset 34, input: 0.0128966, achieved: 0.0128966
dataset 35, input: 0.0128921, achieved: 0.0128921
dataset 36, input: 0.0131809, achieved: 0.0131809
dataset 37, input: 0.0130498, achieved: 0.0130498
dataset 38, input: 0.0129444, achieved: 0.0129444
dataset 39, input: 0.0130167, achieved: 0.0130167
dataset 40, input: 0.0127474, achieved: 0.0127474
dataset 41, input: 0.0127562, achieved: 0.0127562
dataset 42, input: 0.01274, achieved: 0.01274
dataset 43, input: 0.012751, achieved: 0.012751
dataset 44, input: 0.012733, achieved: 0.012733
dataset 45, input: 0.012737, achieved: 0.012737
dataset 46, input: 0.0127356, achieved: 0.0127356
dataset 47, input: 0.0127288, achieved: 0.0127288
dataset 48, input: 0.0127203, achieved: 0.0127203
dataset 49, input: 0.0127124, achieved: 0.0127124
dataset 50, input: 0.0126972, achieved: 0.0126972
dataset 51, input: 0.0127032, achieved: 0.0127032
dataset 52, input: 0.0126907, achieved: 0.0126907
dataset 53, input: 0.0126746, achieved: 0.0126746
dataset 54, input: 0.0126725, achieved: 0.0126725
dataset 55, input: 0.0126815, achieved: 0.0126815
dataset 56, input: 0.0126769, achieved: 0.0126769
dataset 57, input: 0.0126962, achieved: 0.0126962
dataset 58, input: 0.0126882, achieved: 0.0126882
dataset 59, input: 0.012665, achieved: 0.012665
dataset 60, input: 0.0126732, achieved: 0.0126732
dataset 61, input: 0.0126632, achieved: 0.0126632
dataset 62, input: 0.0126542, achieved: 0.0126542
dataset 63, input: 0.0126673, achieved: 0.0126673
dataset 64, input: 0.0126969, achieved: 0.0126969
dataset 65, input: 0.0138538, achieved: 0.0138538
dataset 66, input: 0.0124739, achieved: 0.0124739
dataset 67, input: 0.0124994, achieved: 0.0124994
dataset 68, input: 0.0123766, achieved: 0.0123766
dataset 69, input: 0.0124441, achieved: 0.0124441
dataset 70, input: 0.0122456, achieved: 0.0122456
dataset 71, input: 0.0124694, achieved: 0.0124694
dataset 72, input: 0.0121932, achieved: 0.0121932
dataset 73, input: 0.0122485, achieved: 0.0122485
dataset 74, input: 0.0117788, achieved: 0.0117788
dataset 75, input: 0.0133205, achieved: 0.0133205
dataset 76, input: 0.0131683, achieved: 0.0131683
dataset 77, input: 0.00943809, achieved: 0.00943809
[2025-03-12 09:20:07][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 21697919 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0380461, achieved: 0.0380461
dataset 1, input: 0.0413854, achieved: 0.0413854
dataset 2, input: 0.0406095, achieved: 0.0406095
dataset 3, input: 0.0365558, achieved: 0.0365558
dataset 4, input: 0.0341426, achieved: 0.0341426
dataset 5, input: 0.0350147, achieved: 0.0350147
dataset 6, input: 0.0358745, achieved: 0.0358745
dataset 7, input: 0.036827, achieved: 0.036827
dataset 8, input: 0.0375283, achieved: 0.0375283
dataset 9, input: 0.0379557, achieved: 0.0379557
dataset 10, input: 0.0381706, achieved: 0.0381706
dataset 11, input: 0.0385559, achieved: 0.0385559
dataset 12, input: 0.0388884, achieved: 0.0388884
dataset 13, input: 0.0391665, achieved: 0.0391665
dataset 14, input: 0.0393856, achieved: 0.0393856
dataset 15, input: 0.0397974, achieved: 0.0397974
dataset 16, input: 0.0400668, achieved: 0.0400668
dataset 17, input: 0.0403879, achieved: 0.0403879
dataset 18, input: 0.0408309, achieved: 0.0408309
dataset 19, input: 0.0411837, achieved: 0.0411837
dataset 20, input: 0.0418468, achieved: 0.0418468
dataset 21, input: 0.0425558, achieved: 0.0425558
dataset 22, input: 0.0428142, achieved: 0.0428142
dataset 23, input: 0.0425711, achieved: 0.0425711
dataset 24, input: 0.0388549, achieved: 0.0388549
dataset 25, input: 0.0209839, achieved: 0.0209839
[2025-03-12 09:20:08][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 12890828 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0235833, achieved: 0.0235833
dataset 1, input: 0.0216057, achieved: 0.0216057
dataset 2, input: 0.027075, achieved: 0.027075
dataset 3, input: 0.0271066, achieved: 0.0271066
dataset 4, input: 0.0274384, achieved: 0.0274384
dataset 5, input: 0.0257867, achieved: 0.0257867
dataset 6, input: 0.0255337, achieved: 0.0255337
dataset 7, input: 0.027977, achieved: 0.027977
dataset 8, input: 0.0270093, achieved: 0.0270093
dataset 9, input: 0.0285904, achieved: 0.0285904
dataset 10, input: 0.0283677, achieved: 0.0283677
dataset 11, input: 0.0153407, achieved: 0.0153407
dataset 12, input: 0.014138, achieved: 0.014138
dataset 13, input: 0.0141798, achieved: 0.0141798
dataset 14, input: 0.0141663, achieved: 0.0141663
dataset 15, input: 0.0150215, achieved: 0.0150215
dataset 16, input: 0.0280528, achieved: 0.0280528
dataset 17, input: 0.0234551, achieved: 0.0234551
dataset 18, input: 0.0247761, achieved: 0.0247761
dataset 19, input: 0.0205734, achieved: 0.0205734
dataset 20, input: 0.0205842, achieved: 0.0205842
dataset 21, input: 0.020579, achieved: 0.020579
dataset 22, input: 0.0205939, achieved: 0.0205939
dataset 23, input: 0.0203349, achieved: 0.0203349
dataset 24, input: 0.0199823, achieved: 0.0199823
dataset 25, input: 0.0199573, achieved: 0.0199573
dataset 26, input: 0.0199854, achieved: 0.0199854
dataset 27, input: 0.0168266, achieved: 0.0168266
dataset 28, input: 0.0172125, achieved: 0.0172125
dataset 29, input: 0.018342, achieved: 0.018342
dataset 30, input: 0.0149189, achieved: 0.0149189
dataset 31, input: 0.0149787, achieved: 0.0149787
dataset 32, input: 0.0149735, achieved: 0.0149735
dataset 33, input: 0.0149414, achieved: 0.0149414
dataset 34, input: 0.0149689, achieved: 0.0149689
dataset 35, input: 0.0149673, achieved: 0.0149673
dataset 36, input: 0.0230039, achieved: 0.0230039
dataset 37, input: 0.0215731, achieved: 0.0215731
dataset 38, input: 0.0215682, achieved: 0.0215682
dataset 39, input: 0.0211097, achieved: 0.0211097
dataset 40, input: 0.0190818, achieved: 0.0190818
dataset 41, input: 0.0191069, achieved: 0.0191069
dataset 42, input: 0.0189985, achieved: 0.0189985
dataset 43, input: 0.0186603, achieved: 0.0186603
dataset 44, input: 0.0219256, achieved: 0.0219256
dataset 45, input: 0.0310232, achieved: 0.0310232
dataset 46, input: 0.0189782, achieved: 0.0189782
dataset 47, input: 0.0184155, achieved: 0.0184155
dataset 48, input: 0.00263107, achieved: 0.00263107
[2025-03-12 09:20:18][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 93109107 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0180889, achieved: 0.0180889
dataset 1, input: 0.0157124, achieved: 0.0157124
dataset 2, input: 0.0156303, achieved: 0.0156303
dataset 3, input: 0.0150717, achieved: 0.0150717
dataset 4, input: 0.0142971, achieved: 0.0142971
dataset 5, input: 0.0129064, achieved: 0.0129064
dataset 6, input: 0.0162971, achieved: 0.0162971
dataset 7, input: 0.0174739, achieved: 0.0174739
dataset 8, input: 0.0174769, achieved: 0.0174769
dataset 9, input: 0.0148979, achieved: 0.0148979
dataset 10, input: 0.0158722, achieved: 0.0158722
dataset 11, input: 0.0164827, achieved: 0.0164827
dataset 12, input: 0.0147446, achieved: 0.0147446
dataset 13, input: 0.0160169, achieved: 0.0160169
dataset 14, input: 0.016473, achieved: 0.016473
dataset 15, input: 0.0169843, achieved: 0.0169843
dataset 16, input: 0.0147628, achieved: 0.0147628
dataset 17, input: 0.0152291, achieved: 0.0152291
dataset 18, input: 0.0156107, achieved: 0.0156107
dataset 19, input: 0.0155986, achieved: 0.0155986
dataset 20, input: 0.0157209, achieved: 0.0157209
dataset 21, input: 0.0135195, achieved: 0.0135195
dataset 22, input: 0.0106685, achieved: 0.0106685
dataset 23, input: 0.012068, achieved: 0.012068
dataset 24, input: 0.0143946, achieved: 0.0143946
dataset 25, input: 0.0133679, achieved: 0.0133679
dataset 26, input: 0.0116764, achieved: 0.0116764
dataset 27, input: 0.0121379, achieved: 0.0121379
dataset 28, input: 0.0188352, achieved: 0.0188352
dataset 29, input: 0.0185594, achieved: 0.0185594
dataset 30, input: 0.0184982, achieved: 0.0184982
dataset 31, input: 0.0163945, achieved: 0.0163945
dataset 32, input: 0.0160773, achieved: 0.0160773
dataset 33, input: 0.0161072, achieved: 0.0161072
dataset 34, input: 0.0160858, achieved: 0.0160858
dataset 35, input: 0.0160155, achieved: 0.0160155
dataset 36, input: 0.0157348, achieved: 0.0157348
dataset 37, input: 0.0157353, achieved: 0.0157353
dataset 38, input: 0.0155213, achieved: 0.0155213
dataset 39, input: 0.0154415, achieved: 0.0154415
dataset 40, input: 0.0172908, achieved: 0.0172908
dataset 41, input: 0.0117513, achieved: 0.0117513
dataset 42, input: 0.0169456, achieved: 0.0169456
dataset 43, input: 0.0181807, achieved: 0.0181807
dataset 44, input: 0.0184588, achieved: 0.0184588
dataset 45, input: 0.0184332, achieved: 0.0184332
dataset 46, input: 0.020941, achieved: 0.020941
dataset 47, input: 0.0209187, achieved: 0.0209187
dataset 48, input: 0.0186022, achieved: 0.0186022
dataset 49, input: 0.01186, achieved: 0.01186
dataset 50, input: 0.0118779, achieved: 0.0118779
dataset 51, input: 0.0118403, achieved: 0.0118403
dataset 52, input: 0.0118559, achieved: 0.0118559
dataset 53, input: 0.0118533, achieved: 0.0118533
dataset 54, input: 0.0118465, achieved: 0.0118465
dataset 55, input: 0.011829, achieved: 0.011829
dataset 56, input: 0.0118282, achieved: 0.0118282
dataset 57, input: 0.0155565, achieved: 0.0155565
dataset 58, input: 0.0138154, achieved: 0.0138154
dataset 59, input: 0.0173808, achieved: 0.0173808
dataset 60, input: 0.0151028, achieved: 0.0151028
dataset 61, input: 0.0143709, achieved: 0.0143709
dataset 62, input: 0.0144483, achieved: 0.0144483
dataset 63, input: 0.0170886, achieved: 0.0170886
dataset 64, input: 0.0156728, achieved: 0.0156728
dataset 65, input: 0.00206319, achieved: 0.00206319
[2025-03-12 09:20:19][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 8932775 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.658845, achieved: 0.658845
dataset 1, input: 0.341155, achieved: 0.341155
[2025-03-12 09:20:19][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 2642687 samples
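
(Note: the `WARNING: could not find index map files, building on rank 0` messages that follow are, as far as I can tell, expected on a fresh run: the per-corpus doc/sample/shuffle index maps haven't been cached yet, so rank 0 builds them once and saves them alongside the data, and subsequent runs should load them instead of rebuilding.)
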
[2025-03-12 09:20:19][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 775
number of epochs: 59
sequence length: 4096
total number of samples: 183996
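
(The reported totals are internally consistent: 183996 samples over 59 epochs is roughly 183996 / 59 ≈ 3119 sequences per pass over those 775 documents, i.e. about 3119 × 4096 ≈ 12.8M tokens per epoch; the epoch count is presumably the smallest number of passes that covers the requested number of training samples.)
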
[2025-03-12 09:20:19][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 163
number of epochs: 43
sequence length: 4096
total number of samples: 31441
[2025-03-12 09:20:19][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 140
number of epochs: 37
sequence length: 4096
total number of samples: 28003
[2025-03-12 09:20:19][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 124
number of epochs: 36
sequence length: 4096
total number of samples: 25067
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 137
number of epochs: 42
sequence length: 4096
total number of samples: 26230
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 15512
number of epochs: 10
sequence length: 4096
total number of samples: 23777
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 15517
number of epochs: 11
sequence length: 4096
total number of samples: 25781
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 15002
number of epochs: 11
sequence length: 4096
total number of samples: 25131
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 14676
number of epochs: 10
sequence length: 4096
total number of samples: 22539
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 14422
number of epochs: 11
sequence length: 4096
total number of samples: 24161
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 14328
number of epochs: 10
sequence length: 4096
total number of samples: 22480
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 14088
number of epochs: 11
sequence length: 4096
total number of samples: 23577
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13912
number of epochs: 11
sequence length: 4096
total number of samples: 23443
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 19705
number of epochs: 11
sequence length: 4096
total number of samples: 44530
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 14001
number of epochs: 10
sequence length: 4096
total number of samples: 33130
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 14672
number of epochs: 11
sequence length: 4096
total number of samples: 31661
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 12151
number of epochs: 10
sequence length: 4096
total number of samples: 24866
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 23064
number of epochs: 11
sequence length: 4096
total number of samples: 52512
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 15395
number of epochs: 11
sequence length: 4096
total number of samples: 32910
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 15829
number of epochs: 10
sequence length: 4096
total number of samples: 33963
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 12971
number of epochs: 11
sequence length: 4096
total number of samples: 27905
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 19431
number of epochs: 11
sequence length: 4096
total number of samples: 44037
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 16901
number of epochs: 11
sequence length: 4096
total number of samples: 37245
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 21247
number of epochs: 11
sequence length: 4096
total number of samples: 47851
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 20078
number of epochs: 11
sequence length: 4096
total number of samples: 46771
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 19598
number of epochs: 11
sequence length: 4096
total number of samples: 44827
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 18285
number of epochs: 10
sequence length: 4096
total number of samples: 39746
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 18529
number of epochs: 10
sequence length: 4096
total number of samples: 42204
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 15445
number of epochs: 10
sequence length: 4096
total number of samples: 30535
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 14543
number of epochs: 11
sequence length: 4096
total number of samples: 31296
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 19724
number of epochs: 11
sequence length: 4096
total number of samples: 38889
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 19584
number of epochs: 11
sequence length: 4096
total number of samples: 39778
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 12380
number of epochs: 11
sequence length: 4096
total number of samples: 29061
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 12015
number of epochs: 10
sequence length: 4096
total number of samples: 25767
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 12283
number of epochs: 10
sequence length: 4096
total number of samples: 24851
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 15157
number of epochs: 11
sequence length: 4096
total number of samples: 32441
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 15009
number of epochs: 11
sequence length: 4096
total number of samples: 31241
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 17556
number of epochs: 10
sequence length: 4096
total number of samples: 33780
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 16009
number of epochs: 11
sequence length: 4096
total number of samples: 33213
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 15193
number of epochs: 12
sequence length: 4096
total number of samples: 33500
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 19515
number of epochs: 11
sequence length: 4096
total number of samples: 30353
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13401
number of epochs: 11
sequence length: 4096
total number of samples: 20941
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 14934
number of epochs: 11
sequence length: 4096
total number of samples: 23461
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 18897
number of epochs: 11
sequence length: 4096
total number of samples: 31956
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13603
number of epochs: 11
sequence length: 4096
total number of samples: 20963
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 18864
number of epochs: 11
sequence length: 4096
total number of samples: 30855
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 22022
number of epochs: 9
sequence length: 4096
total number of samples: 35167
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 18971
number of epochs: 10
sequence length: 4096
total number of samples: 32559
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 10941
number of epochs: 10
sequence length: 4096
total number of samples: 19655
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 10578
number of epochs: 10
sequence length: 4096
total number of samples: 20791
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 9162
number of epochs: 11
sequence length: 4096
total number of samples: 19610
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 10026
number of epochs: 10
sequence length: 4096
total number of samples: 19075
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13651
number of epochs: 11
sequence length: 4096
total number of samples: 23980
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 16560
number of epochs: 11
sequence length: 4096
total number of samples: 32183
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 17185
number of epochs: 10
sequence length: 4096
total number of samples: 28294
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 10909
number of epochs: 10
sequence length: 4096
total number of samples: 17972
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 19120
number of epochs: 10
sequence length: 4096
total number of samples: 34020
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 19179
number of epochs: 11
sequence length: 4096
total number of samples: 36582
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 23466
number of epochs: 38
sequence length: 4096
total number of samples: 171575
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13525
number of epochs: 11
sequence length: 4096
total number of samples: 36214
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13580
number of epochs: 15
sequence length: 4096
total number of samples: 36283
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13538
number of epochs: 15
sequence length: 4096
total number of samples: 35909
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13489
number of epochs: 14
sequence length: 4096
total number of samples: 36945
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13552
number of epochs: 14
sequence length: 4096
total number of samples: 36041
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13314
number of epochs: 14
sequence length: 4096
total number of samples: 34932
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13390
number of epochs: 15
sequence length: 4096
total number of samples: 36179
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13324
number of epochs: 14
sequence length: 4096
total number of samples: 35342
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13277
number of epochs: 14
sequence length: 4096
total number of samples: 36406
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13372
number of epochs: 14
sequence length: 4096
total number of samples: 35805
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13129
number of epochs: 14
sequence length: 4096
total number of samples: 35447
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13155
number of epochs: 15
sequence length: 4096
total number of samples: 36897
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13198
number of epochs: 15
sequence length: 4096
total number of samples: 34777
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13166
number of epochs: 16
sequence length: 4096
total number of samples: 36330
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13118
number of epochs: 14
sequence length: 4096
total number of samples: 35401
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13224
number of epochs: 15
sequence length: 4096
total number of samples: 35667
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 12979
number of epochs: 14
sequence length: 4096
total number of samples: 35048
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13034
number of epochs: 14
sequence length: 4096
total number of samples: 34449
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13113
number of epochs: 13
sequence length: 4096
total number of samples: 35932
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13049
number of epochs: 14
sequence length: 4096
total number of samples: 35677
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 13130
number of epochs: 14
sequence length: 4096
total number of samples: 36438
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 62
number of epochs: 57
sequence length: 4096
total number of samples: 1134
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 64
number of epochs: 55
sequence length: 4096
total number of samples: 1235
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 235
number of epochs: 55
sequence length: 4096
total number of samples: 4953
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 271
number of epochs: 63
sequence length: 4096
total number of samples: 5949
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 124
number of epochs: 53
sequence length: 4096
total number of samples: 2662
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 87
number of epochs: 55
sequence length: 4096
total number of samples: 1692
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 45
number of epochs: 100
sequence length: 4096
total number of samples: 851
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 100
number of epochs: 32
sequence length: 4096
total number of samples: 1811
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 77
number of epochs: 44
sequence length: 4096
total number of samples: 1519
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 34
number of epochs: 35
sequence length: 4096
total number of samples: 604
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 4987
number of epochs: 52
sequence length: 4096
total number of samples: 142860
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 4936
number of epochs: 41
sequence length: 4096
total number of samples: 350411
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 50251
number of epochs: 21
sequence length: 4096
total number of samples: 58293
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 50433
number of epochs: 21
sequence length: 4096
total number of samples: 58240
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 49875
number of epochs: 21
sequence length: 4096
total number of samples: 57788
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 9771
number of epochs: 58
sequence length: 4096
total number of samples: 89340
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 32972
number of epochs: 22
sequence length: 4096
total number of samples: 518930
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 51115
number of epochs: 28
sequence length: 4096
total number of samples: 379054
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 4998
number of epochs: 41
sequence length: 4096
total number of samples: 28231
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 14044
number of epochs: 79
sequence length: 4096
total number of samples: 24062
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset] > WARNING: could not find index map files, building on rank 0
using:
number of documents: 2000
number of epochs: 40
sequence length: 4096
total number of samples: 21279
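(Each `could not find index map files` warning above is expected on a fresh run: Megatron's `gpt_dataset` caches its document/sample/shuffle index maps as `.npy` files alongside each dataset, and rank 0 rebuilds them whenever they are missing. For every corpus it then reports the document count, the number of epochs needed to cover the requested sample budget, and the resulting sample count; note how the small corpora, e.g. 62 documents, are upsampled for far more epochs, 57 here, than the large ones. As a rough sanity check, the sample count should track (epochs × tokens in the corpus) / sequence length. A minimal sketch of that relationship, using a hypothetical token count since the log prints documents/epochs/samples but not tokens:)

```python
# Hedged sketch of the sample-count arithmetic (assumed relationship, not
# Megatron source; the 16.7M token count below is hypothetical):
def approx_num_samples(tokens_per_epoch: int, num_epochs: int, seq_length: int = 4096) -> int:
    """Contiguous seq_length-token samples available across num_epochs passes."""
    return (num_epochs * tokens_per_epoch - 1) // seq_length

# e.g. a corpus with ~16.7M tokens per epoch over 11 epochs:
print(approx_num_samples(16_700_000, 11))  # -> 44848, close to the 44827 in the first block
```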
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.108499, achieved: 0.108499
dataset 1, input: 0.103053, achieved: 0.103053
dataset 2, input: 0.0854747, achieved: 0.0854747
dataset 3, input: 0.0433844, achieved: 0.0433844
dataset 4, input: 0.0113772, achieved: 0.0113772
dataset 5, input: 0.0527751, achieved: 0.0527751
dataset 6, input: 0.00885534, achieved: 0.00885534
dataset 7, input: 0.0852544, achieved: 0.0852544
dataset 8, input: 0.0730518, achieved: 0.0730518
dataset 9, input: 0.0799135, achieved: 0.0799135
dataset 10, input: 0.0413842, achieved: 0.0413842
dataset 11, input: 0.0496325, achieved: 0.0496325
dataset 12, input: 0.0116255, achieved: 0.0116255
dataset 13, input: 0.0320609, achieved: 0.0320609
dataset 14, input: 0.106373, achieved: 0.106373
dataset 15, input: 0.107285, achieved: 0.107285
[2025-03-12 09:20:27][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 1675373 samples
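(The matching `input` vs `achieved` columns above confirm the blendable-dataset weights are honored exactly: each component corpus contributes samples in proportion to its normalized weight. A minimal sketch of greedy blending, assumed to mirror what Megatron's index builder does rather than copied from it: each global sample is drawn from whichever component lags its target ratio the most.)

```python
# Minimal sketch of greedy blending (assumed behavior, not the library's code):
import numpy as np

def build_blending_indices(weights: np.ndarray, num_samples: int) -> np.ndarray:
    weights = weights / weights.sum()          # normalize target ratios
    counts = np.zeros_like(weights)            # samples drawn from each dataset so far
    choice = np.empty(num_samples, dtype=np.int64)
    for i in range(num_samples):
        errors = weights * (i + 1) - counts    # target count minus achieved count
        d = int(np.argmax(errors))             # most-starved dataset wins this slot
        choice[i] = d
        counts[d] += 1.0
    return choice

# First three weights from the blend above, renormalized for a small demo:
idx = build_blending_indices(np.array([0.108499, 0.103053, 0.0854747]), 100_000)
print(np.bincount(idx) / len(idx))  # achieved ratios match the normalized inputs
```

(With enough samples the achieved fractions converge to the requested ones, which is why the two columns agree to the printed precision.)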
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.00859641, achieved: 0.00859641
dataset 1, input: 0.00880487, achieved: 0.00880487
dataset 2, input: 0.0105312, achieved: 0.0105312
dataset 3, input: 0.00971668, achieved: 0.00971668
dataset 4, input: 0.0094472, achieved: 0.0094472
dataset 5, input: 0.0101289, achieved: 0.0101289
dataset 6, input: 0.0105224, achieved: 0.0105224
dataset 7, input: 0.0102594, achieved: 0.0102594
dataset 8, input: 0.0105675, achieved: 0.0105675
dataset 9, input: 0.0084371, achieved: 0.0084371
dataset 10, input: 0.0102051, achieved: 0.0102051
dataset 11, input: 0.00864047, achieved: 0.00864047
dataset 12, input: 0.012604, achieved: 0.012604
dataset 13, input: 0.00930382, achieved: 0.00930382
dataset 14, input: 0.0111695, achieved: 0.0111695
dataset 15, input: 0.010155, achieved: 0.010155
dataset 16, input: 0.0107163, achieved: 0.0107163
dataset 17, input: 0.0110763, achieved: 0.0110763
dataset 18, input: 0.01032, achieved: 0.01032
dataset 19, input: 0.0107664, achieved: 0.0107664
dataset 20, input: 0.0116291, achieved: 0.0116291
dataset 21, input: 0.0093872, achieved: 0.0093872
dataset 22, input: 0.0101136, achieved: 0.0101136
dataset 23, input: 0.00983294, achieved: 0.00983294
dataset 24, input: 0.00962855, achieved: 0.00962855
dataset 25, input: 0.00957126, achieved: 0.00957126
dataset 26, input: 0.00977464, achieved: 0.00977464
dataset 27, input: 0.00901977, achieved: 0.00901977
dataset 28, input: 0.0103566, achieved: 0.0103566
dataset 29, input: 0.00999056, achieved: 0.00999056
dataset 30, input: 0.0124182, achieved: 0.0124182
dataset 31, input: 0.00891062, achieved: 0.00891062
dataset 32, input: 0.00931399, achieved: 0.00931399
dataset 33, input: 0.0114227, achieved: 0.0114227
dataset 34, input: 0.0119182, achieved: 0.0119182
dataset 35, input: 0.0103448, achieved: 0.0103448
dataset 36, input: 0.00920281, achieved: 0.00920281
dataset 37, input: 0.0100794, achieved: 0.0100794
dataset 38, input: 0.00899367, achieved: 0.00899367
dataset 39, input: 0.0100109, achieved: 0.0100109
dataset 40, input: 0.00949635, achieved: 0.00949635
dataset 41, input: 0.00916891, achieved: 0.00916891
dataset 42, input: 0.0105868, achieved: 0.0105868
dataset 43, input: 0.0110166, achieved: 0.0110166
dataset 44, input: 0.00956516, achieved: 0.00956516
dataset 45, input: 0.010096, achieved: 0.010096
dataset 46, input: 0.0111118, achieved: 0.0111118
dataset 47, input: 0.00861403, achieved: 0.00861403
dataset 48, input: 0.00969295, achieved: 0.00969295
dataset 49, input: 0.00888452, achieved: 0.00888452
dataset 50, input: 0.0106549, achieved: 0.0106549
dataset 51, input: 0.0107085, achieved: 0.0107085
dataset 52, input: 0.0105183, achieved: 0.0105183
dataset 53, input: 0.0105936, achieved: 0.0105936
dataset 54, input: 0.0101075, achieved: 0.0101075
dataset 55, input: 0.0106142, achieved: 0.0106142
dataset 56, input: 0.00844354, achieved: 0.00844354
dataset 57, input: 0.01004, achieved: 0.01004
dataset 58, input: 0.00954313, achieved: 0.00954313
dataset 59, input: 0.0104014, achieved: 0.0104014
dataset 60, input: 0.0115471, achieved: 0.0115471
dataset 61, input: 0.00886656, achieved: 0.00886656
dataset 62, input: 0.0115071, achieved: 0.0115071
dataset 63, input: 0.00804085, achieved: 0.00804085
dataset 64, input: 0.0102777, achieved: 0.0102777
dataset 65, input: 0.00969363, achieved: 0.00969363
dataset 66, input: 0.00880419, achieved: 0.00880419
dataset 67, input: 0.0101621, achieved: 0.0101621
dataset 68, input: 0.0106685, achieved: 0.0106685
dataset 69, input: 0.0103031, achieved: 0.0103031
dataset 70, input: 0.00776019, achieved: 0.00776019
dataset 71, input: 0.010156, achieved: 0.010156
dataset 72, input: 0.0117694, achieved: 0.0117694
dataset 73, input: 0.00965532, achieved: 0.00965532
dataset 74, input: 0.00980277, achieved: 0.00980277
dataset 75, input: 0.00957092, achieved: 0.00957092
dataset 76, input: 0.0102658, achieved: 0.0102658
dataset 77, input: 0.0101736, achieved: 0.0101736
dataset 78, input: 0.00952516, achieved: 0.00952516
dataset 79, input: 0.00954821, achieved: 0.00954821
dataset 80, input: 0.00878894, achieved: 0.00878894
dataset 81, input: 0.00989429, achieved: 0.00989429
dataset 82, input: 0.0107254, achieved: 0.0107254
dataset 83, input: 0.0105407, achieved: 0.0105407
dataset 84, input: 0.0103658, achieved: 0.0103658
dataset 85, input: 0.0113505, achieved: 0.0113505
dataset 86, input: 0.00890893, achieved: 0.00890893
dataset 87, input: 0.0101038, achieved: 0.0101038
dataset 88, input: 0.00923094, achieved: 0.00923094
dataset 89, input: 0.00903197, achieved: 0.00903197
dataset 90, input: 0.00932958, achieved: 0.00932958
dataset 91, input: 0.0107644, achieved: 0.0107644
dataset 92, input: 0.0102692, achieved: 0.0102692
dataset 93, input: 0.0113081, achieved: 0.0113081
dataset 94, input: 0.0102949, achieved: 0.0102949
dataset 95, input: 0.00908553, achieved: 0.00908553
dataset 96, input: 0.00956041, achieved: 0.00956041
dataset 97, input: 0.0103095, achieved: 0.0103095
dataset 98, input: 0.00869708, achieved: 0.00869708
dataset 99, input: 0.00959601, achieved: 0.00959601
[2025-03-12 09:20:28][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 2950186 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.430653, achieved: 0.430653
dataset 1, input: 0.430584, achieved: 0.430584
dataset 2, input: 0.138763, achieved: 0.138763
[2025-03-12 09:20:28][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 707075 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.00616803, achieved: 0.00616803
dataset 1, input: 0.00616438, achieved: 0.00616438
dataset 2, input: 0.00616803, achieved: 0.00616803
dataset 3, input: 0.00616986, achieved: 0.00616986
dataset 4, input: 0.00616647, achieved: 0.00616647
dataset 5, input: 0.00616803, achieved: 0.00616803
dataset 6, input: 0.00617952, achieved: 0.00617952
dataset 7, input: 0.00615837, achieved: 0.00615837
dataset 8, input: 0.00617012, achieved: 0.00617012
dataset 9, input: 0.00617064, achieved: 0.00617064
dataset 10, input: 0.00615915, achieved: 0.00615915
dataset 11, input: 0.00617534, achieved: 0.00617534
dataset 12, input: 0.00617299, achieved: 0.00617299
dataset 13, input: 0.00617404, achieved: 0.00617404
dataset 14, input: 0.00617743, achieved: 0.00617743
dataset 15, input: 0.0061696, achieved: 0.0061696
dataset 16, input: 0.00618814, achieved: 0.00618814
dataset 17, input: 0.00618919, achieved: 0.00618919
dataset 18, input: 0.00615941, achieved: 0.00615941
dataset 19, input: 0.0061568, achieved: 0.0061568
dataset 20, input: 0.00614792, achieved: 0.00614792
dataset 21, input: 0.00615367, achieved: 0.00615367
dataset 22, input: 0.00615602, achieved: 0.00615602
dataset 23, input: 0.00617482, achieved: 0.00617482
dataset 24, input: 0.00617273, achieved: 0.00617273
dataset 25, input: 0.00615759, achieved: 0.00615759
dataset 26, input: 0.00617195, achieved: 0.00617195
dataset 27, input: 0.00617456, achieved: 0.00617456
dataset 28, input: 0.00617273, achieved: 0.00617273
dataset 29, input: 0.00616281, achieved: 0.00616281
dataset 30, input: 0.00618161, achieved: 0.00618161
dataset 31, input: 0.00605077, achieved: 0.00605077
dataset 32, input: 0.00601655, achieved: 0.00601655
dataset 33, input: 0.00600637, achieved: 0.00600637
dataset 34, input: 0.0060001, achieved: 0.0060001
dataset 35, input: 0.00602073, achieved: 0.00602073
dataset 36, input: 0.00600297, achieved: 0.00600297
dataset 37, input: 0.00600402, achieved: 0.00600402
dataset 38, input: 0.00600976, achieved: 0.00600976
dataset 39, input: 0.00599227, achieved: 0.00599227
dataset 40, input: 0.00601446, achieved: 0.00601446
dataset 41, input: 0.00599566, achieved: 0.00599566
dataset 42, input: 0.00599932, achieved: 0.00599932
dataset 43, input: 0.00599958, achieved: 0.00599958
dataset 44, input: 0.00598521, achieved: 0.00598521
dataset 45, input: 0.00599331, achieved: 0.00599331
dataset 46, input: 0.0059954, achieved: 0.0059954
dataset 47, input: 0.00598887, achieved: 0.00598887
dataset 48, input: 0.00600376, achieved: 0.00600376
dataset 49, input: 0.0060035, achieved: 0.0060035
dataset 50, input: 0.00598991, achieved: 0.00598991
dataset 51, input: 0.00598417, achieved: 0.00598417
dataset 52, input: 0.00599122, achieved: 0.00599122
dataset 53, input: 0.00599801, achieved: 0.00599801
dataset 54, input: 0.00598756, achieved: 0.00598756
dataset 55, input: 0.00598234, achieved: 0.00598234
dataset 56, input: 0.00599435, achieved: 0.00599435
dataset 57, input: 0.00597294, achieved: 0.00597294
dataset 58, input: 0.00599253, achieved: 0.00599253
dataset 59, input: 0.00597868, achieved: 0.00597868
dataset 60, input: 0.00597424, achieved: 0.00597424
dataset 61, input: 0.00598339, achieved: 0.00598339
dataset 62, input: 0.00594682, achieved: 0.00594682
dataset 63, input: 0.0059019, achieved: 0.0059019
dataset 64, input: 0.00589198, achieved: 0.00589198
dataset 65, input: 0.00587683, achieved: 0.00587683
dataset 66, input: 0.00587683, achieved: 0.00587683
dataset 67, input: 0.00587604, achieved: 0.00587604
dataset 68, input: 0.00586952, achieved: 0.00586952
dataset 69, input: 0.00587631, achieved: 0.00587631
dataset 70, input: 0.00586508, achieved: 0.00586508
dataset 71, input: 0.00586116, achieved: 0.00586116
dataset 72, input: 0.0058797, achieved: 0.0058797
dataset 73, input: 0.00587448, achieved: 0.00587448
dataset 74, input: 0.00587448, achieved: 0.00587448
dataset 75, input: 0.00587213, achieved: 0.00587213
dataset 76, input: 0.00588205, achieved: 0.00588205
dataset 77, input: 0.00587134, achieved: 0.00587134
dataset 78, input: 0.00588127, achieved: 0.00588127
dataset 79, input: 0.00588623, achieved: 0.00588623
dataset 80, input: 0.00585855, achieved: 0.00585855
dataset 81, input: 0.00587604, achieved: 0.00587604
dataset 82, input: 0.00585776, achieved: 0.00585776
dataset 83, input: 0.00585672, achieved: 0.00585672
dataset 84, input: 0.00586899, achieved: 0.00586899
dataset 85, input: 0.00585358, achieved: 0.00585358
dataset 86, input: 0.00586795, achieved: 0.00586795
dataset 87, input: 0.00584627, achieved: 0.00584627
dataset 88, input: 0.00585123, achieved: 0.00585123
dataset 89, input: 0.00587187, achieved: 0.00587187
dataset 90, input: 0.00586403, achieved: 0.00586403
dataset 91, input: 0.00585123, achieved: 0.00585123
dataset 92, input: 0.0058622, achieved: 0.0058622
dataset 93, input: 0.00584836, achieved: 0.00584836
dataset 94, input: 0.00579247, achieved: 0.00579247
dataset 95, input: 0.00578359, achieved: 0.00578359
dataset 96, input: 0.0057909, achieved: 0.0057909
dataset 97, input: 0.0057849, achieved: 0.0057849
dataset 98, input: 0.0057862, achieved: 0.0057862
dataset 99, input: 0.00576975, achieved: 0.00576975
dataset 100, input: 0.00578411, achieved: 0.00578411
dataset 101, input: 0.00578385, achieved: 0.00578385
dataset 102, input: 0.0057687, achieved: 0.0057687
dataset 103, input: 0.00577393, achieved: 0.00577393
dataset 104, input: 0.00577576, achieved: 0.00577576
dataset 105, input: 0.0057533, achieved: 0.0057533
dataset 106, input: 0.00575747, achieved: 0.00575747
dataset 107, input: 0.00575591, achieved: 0.00575591
dataset 108, input: 0.00575408, achieved: 0.00575408
dataset 109, input: 0.00576792, achieved: 0.00576792
dataset 110, input: 0.00575565, achieved: 0.00575565
dataset 111, input: 0.00576348, achieved: 0.00576348
dataset 112, input: 0.00575878, achieved: 0.00575878
dataset 113, input: 0.00575565, achieved: 0.00575565
dataset 114, input: 0.00576061, achieved: 0.00576061
dataset 115, input: 0.00575878, achieved: 0.00575878
dataset 116, input: 0.0057499, achieved: 0.0057499
dataset 117, input: 0.00576139, achieved: 0.00576139
dataset 118, input: 0.0057593, achieved: 0.0057593
dataset 119, input: 0.00573266, achieved: 0.00573266
dataset 120, input: 0.00575695, achieved: 0.00575695
dataset 121, input: 0.0057499, achieved: 0.0057499
dataset 122, input: 0.00576009, achieved: 0.00576009
dataset 123, input: 0.00575538, achieved: 0.00575538
dataset 124, input: 0.00575643, achieved: 0.00575643
dataset 125, input: 0.00571438, achieved: 0.00571438
dataset 126, input: 0.00570002, achieved: 0.00570002
dataset 127, input: 0.00568461, achieved: 0.00568461
dataset 128, input: 0.0057042, achieved: 0.0057042
dataset 129, input: 0.00567416, achieved: 0.00567416
dataset 130, input: 0.00567677, achieved: 0.00567677
dataset 131, input: 0.00568356, achieved: 0.00568356
dataset 132, input: 0.00567625, achieved: 0.00567625
dataset 133, input: 0.00566502, achieved: 0.00566502
dataset 134, input: 0.00567938, achieved: 0.00567938
dataset 135, input: 0.0056739, achieved: 0.0056739
dataset 136, input: 0.00568017, achieved: 0.00568017
dataset 137, input: 0.00567599, achieved: 0.00567599
dataset 138, input: 0.0056705, achieved: 0.0056705
dataset 139, input: 0.00566972, achieved: 0.00566972
dataset 140, input: 0.00565927, achieved: 0.00565927
dataset 141, input: 0.00566319, achieved: 0.00566319
dataset 142, input: 0.00566267, achieved: 0.00566267
dataset 143, input: 0.00565431, achieved: 0.00565431
dataset 144, input: 0.00566659, achieved: 0.00566659
dataset 145, input: 0.00567599, achieved: 0.00567599
dataset 146, input: 0.00566084, achieved: 0.00566084
dataset 147, input: 0.00565562, achieved: 0.00565562
dataset 148, input: 0.00565614, achieved: 0.00565614
dataset 149, input: 0.0056598, achieved: 0.0056598
dataset 150, input: 0.00566293, achieved: 0.00566293
dataset 151, input: 0.00566371, achieved: 0.00566371
dataset 152, input: 0.00566032, achieved: 0.00566032
dataset 153, input: 0.00566136, achieved: 0.00566136
dataset 154, input: 0.00565823, achieved: 0.00565823
dataset 155, input: 0.00565196, achieved: 0.00565196
dataset 156, input: 0.00566084, achieved: 0.00566084
dataset 157, input: 0.00563499, achieved: 0.00563499
dataset 158, input: 0.00561383, achieved: 0.00561383
dataset 159, input: 0.00561122, achieved: 0.00561122
dataset 160, input: 0.00560756, achieved: 0.00560756
dataset 161, input: 0.00560391, achieved: 0.00560391
dataset 162, input: 0.0056026, achieved: 0.0056026
dataset 163, input: 0.00561697, achieved: 0.00561697
dataset 164, input: 0.00561383, achieved: 0.00561383
dataset 165, input: 0.00560652, achieved: 0.00560652
dataset 166, input: 0.00558954, achieved: 0.00558954
dataset 167, input: 0.00560913, achieved: 0.00560913
dataset 168, input: 0.00560129, achieved: 0.00560129
dataset 169, input: 0.00559633, achieved: 0.00559633
dataset 170, input: 0.00186919, achieved: 0.00186919
[2025-03-12 09:20:29][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 3828936 samples
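(The repeated `[BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True` lines mean samples are shuffled globally across each concatenated corpus, 3,828,936 samples for the blend just above, rather than only within each component file. A sketch of the assumed behavior; the real implementation lives in `megatron.data.gpt_dataset` and may differ:)

```python
# Assumed-behavior sketch of shuffle_sample_in_corpus=True: one seeded
# permutation over every sample in the concatenated corpus, so the shuffle
# crosses component-file boundaries.
import numpy as np

def global_shuffle(total_samples: int, seed: int = 1234) -> np.ndarray:
    rng = np.random.RandomState(seed)        # identical seed on all ranks
    return rng.permutation(total_samples)    # indices into the concatenated corpus

order = global_shuffle(3_828_936)            # sample count reported just above
print(order[:5])
```

(Seeding identically on every rank keeps the permutation consistent, so each data-parallel rank slices the same global order.)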
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.00105726, achieved: 0.00105726
dataset 1, input: 0.00107928, achieved: 0.00107928
dataset 2, input: 0.00109672, achieved: 0.00109672
dataset 3, input: 0.00109393, achieved: 0.00109393
dataset 4, input: 0.0010994, achieved: 0.0010994
dataset 5, input: 0.00107491, achieved: 0.00107491
dataset 6, input: 0.00108898, achieved: 0.00108898
dataset 7, input: 0.000683743, achieved: 0.000683743
dataset 8, input: 0.00110611, achieved: 0.00110611
dataset 9, input: 0.0011095, achieved: 0.0011095
dataset 10, input: 0.00108458, achieved: 0.00108458
dataset 11, input: 0.0011302, achieved: 0.0011302
dataset 12, input: 0.00113901, achieved: 0.00113901
dataset 13, input: 0.00108227, achieved: 0.00108227
dataset 14, input: 0.00110645, achieved: 0.00110645
dataset 15, input: 0.00112896, achieved: 0.00112896
dataset 16, input: 0.00111693, achieved: 0.00111693
dataset 17, input: 0.00118774, achieved: 0.00118774
dataset 18, input: 0.00117349, achieved: 0.00117349
dataset 19, input: 0.00110703, achieved: 0.00110703
dataset 20, input: 0.00110024, achieved: 0.00110024
dataset 21, input: 0.0011074, achieved: 0.0011074
dataset 22, input: 0.00125365, achieved: 0.00125365
dataset 23, input: 0.00124519, achieved: 0.00124519
dataset 24, input: 0.000702826, achieved: 0.000702826
dataset 25, input: 0.00111163, achieved: 0.00111163
dataset 26, input: 0.00116457, achieved: 0.00116457
dataset 27, input: 0.00114033, achieved: 0.00114033
dataset 28, input: 0.00108084, achieved: 0.00108084
dataset 29, input: 0.00114093, achieved: 0.00114093
dataset 30, input: 0.000691025, achieved: 0.000691025
dataset 31, input: 0.000760019, achieved: 0.000760019
dataset 32, input: 0.0010096, achieved: 0.0010096
dataset 33, input: 0.00129976, achieved: 0.00129976
dataset 34, input: 0.000736013, achieved: 0.000736013
dataset 35, input: 0.00109681, achieved: 0.00109681
dataset 36, input: 0.000696868, achieved: 0.000696868
dataset 37, input: 0.00113699, achieved: 0.00113699
dataset 38, input: 0.0011532, achieved: 0.0011532
dataset 39, input: 0.000760249, achieved: 0.000760249
dataset 40, input: 0.000760421, achieved: 0.000760421
dataset 41, input: 0.00118782, achieved: 0.00118782
dataset 42, input: 0.00119723, achieved: 0.00119723
dataset 43, input: 0.00127187, achieved: 0.00127187
dataset 44, input: 0.00075193, achieved: 0.00075193
dataset 45, input: 0.00088252, achieved: 0.00088252
dataset 46, input: 0.000878317, achieved: 0.000878317
dataset 47, input: 0.00129593, achieved: 0.00129593
dataset 48, input: 0.000647074, achieved: 0.000647074
dataset 49, input: 0.00107732, achieved: 0.00107732
dataset 50, input: 0.00110709, achieved: 0.00110709
dataset 51, input: 0.00127478, achieved: 0.00127478
dataset 52, input: 0.00135992, achieved: 0.00135992
dataset 53, input: 0.00111425, achieved: 0.00111425
dataset 54, input: 0.00112836, achieved: 0.00112836
dataset 55, input: 0.00107217, achieved: 0.00107217
dataset 56, input: 0.00111443, achieved: 0.00111443
dataset 57, input: 0.00109946, achieved: 0.00109946
dataset 58, input: 0.00109523, achieved: 0.00109523
dataset 59, input: 0.00110734, achieved: 0.00110734
dataset 60, input: 0.00110199, achieved: 0.00110199
dataset 61, input: 0.00112873, achieved: 0.00112873
dataset 62, input: 0.00110829, achieved: 0.00110829
dataset 63, input: 0.00111008, achieved: 0.00111008
dataset 64, input: 0.00110855, achieved: 0.00110855
dataset 65, input: 0.00109658, achieved: 0.00109658
dataset 66, input: 0.00110116, achieved: 0.00110116
dataset 67, input: 0.00105519, achieved: 0.00105519
dataset 68, input: 0.00102554, achieved: 0.00102554
dataset 69, input: 0.00094731, achieved: 0.00094731
dataset 70, input: 0.000868991, achieved: 0.000868991
dataset 71, input: 0.00111486, achieved: 0.00111486
dataset 72, input: 0.000803395, achieved: 0.000803395
dataset 73, input: 0.000821845, achieved: 0.000821845
dataset 74, input: 0.000815512, achieved: 0.000815512
dataset 75, input: 0.000801927, achieved: 0.000801927
dataset 76, input: 0.00082006, achieved: 0.00082006
dataset 77, input: 0.000797034, achieved: 0.000797034
dataset 78, input: 0.000825759, achieved: 0.000825759
dataset 79, input: 0.000809928, achieved: 0.000809928
dataset 80, input: 0.000809439, achieved: 0.000809439
dataset 81, input: 0.000808317, achieved: 0.000808317
dataset 82, input: 0.000798559, achieved: 0.000798559
dataset 83, input: 0.000805064, achieved: 0.000805064
dataset 84, input: 0.000795796, achieved: 0.000795796
dataset 85, input: 0.000741712, achieved: 0.000741712
dataset 86, input: 0.00072119, achieved: 0.00072119
dataset 87, input: 0.000734229, achieved: 0.000734229
dataset 88, input: 0.000713044, achieved: 0.000713044
dataset 89, input: 0.00071716, achieved: 0.00071716
dataset 90, input: 0.000723608, achieved: 0.000723608
dataset 91, input: 0.000740331, achieved: 0.000740331
dataset 92, input: 0.000739266, achieved: 0.000739266
dataset 93, input: 0.000734891, achieved: 0.000734891
dataset 94, input: 0.000711922, achieved: 0.000711922
dataset 95, input: 0.000912483, achieved: 0.000912483
dataset 96, input: 0.00105343, achieved: 0.00105343
dataset 97, input: 0.00107571, achieved: 0.00107571
dataset 98, input: 0.00102163, achieved: 0.00102163
dataset 99, input: 0.00103542, achieved: 0.00103542
dataset 100, input: 0.0010408, achieved: 0.0010408
dataset 101, input: 0.001014, achieved: 0.001014
dataset 102, input: 0.00102785, achieved: 0.00102785
dataset 103, input: 0.00101656, achieved: 0.00101656
dataset 104, input: 0.0010292, achieved: 0.0010292
dataset 105, input: 0.00103639, achieved: 0.00103639
dataset 106, input: 0.00102482, achieved: 0.00102482
dataset 107, input: 0.000989305, achieved: 0.000989305
dataset 108, input: 0.000995551, achieved: 0.000995551
dataset 109, input: 0.00100036, achieved: 0.00100036
dataset 110, input: 0.00102494, achieved: 0.00102494
dataset 111, input: 0.00106011, achieved: 0.00106011
dataset 112, input: 0.00107073, achieved: 0.00107073
dataset 113, input: 0.00106728, achieved: 0.00106728
dataset 114, input: 0.00106239, achieved: 0.00106239
dataset 115, input: 0.000961817, achieved: 0.000961817
dataset 116, input: 0.000943223, achieved: 0.000943223
dataset 117, input: 0.000940575, achieved: 0.000940575
dataset 118, input: 0.00138766, achieved: 0.00138766
dataset 119, input: 0.000905834, achieved: 0.000905834
dataset 120, input: 0.000908367, achieved: 0.000908367
dataset 121, input: 0.000912771, achieved: 0.000912771
dataset 122, input: 0.000880879, achieved: 0.000880879
dataset 123, input: 0.000877051, achieved: 0.000877051
dataset 124, input: 0.000880015, achieved: 0.000880015
dataset 125, input: 0.000859176, achieved: 0.000859176
dataset 126, input: 0.000854514, achieved: 0.000854514
dataset 127, input: 0.000853132, achieved: 0.000853132
dataset 128, input: 0.000825846, achieved: 0.000825846
dataset 129, input: 0.000809957, achieved: 0.000809957
dataset 130, input: 0.000803539, achieved: 0.000803539
dataset 131, input: 0.000804057, achieved: 0.000804057
dataset 132, input: 0.000789176, achieved: 0.000789176
dataset 133, input: 0.000771417, achieved: 0.000771417
dataset 134, input: 0.000765516, achieved: 0.000765516
dataset 135, input: 0.00077772, achieved: 0.00077772
dataset 136, input: 0.00121594, achieved: 0.00121594
dataset 137, input: 0.00134175, achieved: 0.00134175
dataset 138, input: 0.00134909, achieved: 0.00134909
dataset 139, input: 0.00132972, achieved: 0.00132972
dataset 140, input: 0.00131879, achieved: 0.00131879
dataset 141, input: 0.00130425, achieved: 0.00130425
dataset 142, input: 0.00086571, achieved: 0.00086571
dataset 143, input: 0.000821931, achieved: 0.000821931
dataset 144, input: 0.000770438, achieved: 0.000770438
dataset 145, input: 0.00115855, achieved: 0.00115855
dataset 146, input: 0.00105343, achieved: 0.00105343
dataset 147, input: 0.00103245, achieved: 0.00103245
dataset 148, input: 0.00103677, achieved: 0.00103677
dataset 149, input: 0.00104975, achieved: 0.00104975
dataset 150, input: 0.00101242, achieved: 0.00101242
dataset 151, input: 0.00100948, achieved: 0.00100948
dataset 152, input: 0.00100396, achieved: 0.00100396
dataset 153, input: 0.00139008, achieved: 0.00139008
dataset 154, input: 0.00128076, achieved: 0.00128076
dataset 155, input: 0.00127316, achieved: 0.00127316
dataset 156, input: 0.00125422, achieved: 0.00125422
dataset 157, input: 0.00122035, achieved: 0.00122035
dataset 158, input: 0.00121491, achieved: 0.00121491
dataset 159, input: 0.00118218, achieved: 0.00118218
dataset 160, input: 0.00121341, achieved: 0.00121341
dataset 161, input: 0.00122665, achieved: 0.00122665
dataset 162, input: 0.000977763, achieved: 0.000977763
dataset 163, input: 0.00096294, achieved: 0.00096294
dataset 164, input: 0.00093715, achieved: 0.00093715
dataset 165, input: 0.000958421, achieved: 0.000958421
dataset 166, input: 0.000948606, achieved: 0.000948606
dataset 167, input: 0.000971575, achieved: 0.000971575
dataset 168, input: 0.000976295, achieved: 0.000976295
dataset 169, input: 0.000953787, achieved: 0.000953787
dataset 170, input: 0.00093833, achieved: 0.00093833
dataset 171, input: 0.000944231, achieved: 0.000944231
dataset 172, input: 0.00140214, achieved: 0.00140214
dataset 173, input: 0.00141161, achieved: 0.00141161
dataset 174, input: 0.00141268, achieved: 0.00141268
dataset 175, input: 0.00141368, achieved: 0.00141368
dataset 176, input: 0.000784484, achieved: 0.000784484
dataset 177, input: 0.000802186, achieved: 0.000802186
dataset 178, input: 0.000786384, achieved: 0.000786384
dataset 179, input: 0.000774093, achieved: 0.000774093
dataset 180, input: 0.000788715, achieved: 0.000788715
dataset 181, input: 0.000790126, achieved: 0.000790126
dataset 182, input: 0.000754204, achieved: 0.000754204
dataset 183, input: 0.000732876, achieved: 0.000732876
dataset 184, input: 0.00106774, achieved: 0.00106774
dataset 185, input: 0.00118186, achieved: 0.00118186
dataset 186, input: 0.00123704, achieved: 0.00123704
dataset 187, input: 0.000771733, achieved: 0.000771733
dataset 188, input: 0.00078057, achieved: 0.00078057
dataset 189, input: 0.000764163, achieved: 0.000764163
dataset 190, input: 0.000742893, achieved: 0.000742893
dataset 191, input: 0.000734718, achieved: 0.000734718
dataset 192, input: 0.000724961, achieved: 0.000724961
dataset 193, input: 0.000728702, achieved: 0.000728702
dataset 194, input: 0.000714628, achieved: 0.000714628
dataset 195, input: 0.00107016, achieved: 0.00107016
dataset 196, input: 0.00137926, achieved: 0.00137926
dataset 197, input: 0.000925291, achieved: 0.000925291
dataset 198, input: 0.00135698, achieved: 0.00135698
dataset 199, input: 0.00131985, achieved: 0.00131985
dataset 200, input: 0.00122967, achieved: 0.00122967
dataset 201, input: 0.00124032, achieved: 0.00124032
dataset 202, input: 0.00137788, achieved: 0.00137788
dataset 203, input: 0.00136677, achieved: 0.00136677
dataset 204, input: 0.00135007, achieved: 0.00135007
dataset 205, input: 0.00130673, achieved: 0.00130673
dataset 206, input: 0.00127486, achieved: 0.00127486
dataset 207, input: 0.00127236, achieved: 0.00127236
dataset 208, input: 0.00125716, achieved: 0.00125716
dataset 209, input: 0.00126082, achieved: 0.00126082
dataset 210, input: 0.00125218, achieved: 0.00125218
dataset 211, input: 0.00120455, achieved: 0.00120455
dataset 212, input: 0.00119145, achieved: 0.00119145
dataset 213, input: 0.00117271, achieved: 0.00117271
dataset 214, input: 0.00115789, achieved: 0.00115789
dataset 215, input: 0.000945497, achieved: 0.000945497
dataset 216, input: 0.000947253, achieved: 0.000947253
dataset 217, input: 0.00136559, achieved: 0.00136559
dataset 218, input: 0.00133326, achieved: 0.00133326
dataset 219, input: 0.00131314, achieved: 0.00131314
dataset 220, input: 0.00128888, achieved: 0.00128888
dataset 221, input: 0.00139362, achieved: 0.00139362
dataset 222, input: 0.000997278, achieved: 0.000997278
dataset 223, input: 0.000999178, achieved: 0.000999178
dataset 224, input: 0.00137652, achieved: 0.00137652
dataset 225, input: 0.00136432, achieved: 0.00136432
dataset 226, input: 0.00135422, achieved: 0.00135422
dataset 227, input: 0.00135094, achieved: 0.00135094
dataset 228, input: 0.00131663, achieved: 0.00131663
dataset 229, input: 0.001115, achieved: 0.001115
dataset 230, input: 0.00110642, achieved: 0.00110642
dataset 231, input: 0.00110372, achieved: 0.00110372
dataset 232, input: 0.00107563, achieved: 0.00107563
dataset 233, input: 0.00104146, achieved: 0.00104146
dataset 234, input: 0.00101227, achieved: 0.00101227
dataset 235, input: 0.00101501, achieved: 0.00101501
dataset 236, input: 0.000999552, achieved: 0.000999552
dataset 237, input: 0.00101253, achieved: 0.00101253
dataset 238, input: 0.00098326, achieved: 0.00098326
dataset 239, input: 0.000964379, achieved: 0.000964379
dataset 240, input: 0.000960551, achieved: 0.000960551
dataset 241, input: 0.000944432, achieved: 0.000944432
dataset 242, input: 0.00131418, achieved: 0.00131418
dataset 243, input: 0.000830249, achieved: 0.000830249
dataset 244, input: 0.000810101, achieved: 0.000810101
dataset 245, input: 0.000771129, achieved: 0.000771129
dataset 246, input: 0.000749023, achieved: 0.000749023
dataset 247, input: 0.00075714, achieved: 0.00075714
dataset 248, input: 0.000729739, achieved: 0.000729739
dataset 249, input: 0.000752794, achieved: 0.000752794
dataset 250, input: 0.000713534, achieved: 0.000713534
dataset 251, input: 0.000729767, achieved: 0.000729767
dataset 252, input: 0.00120029, achieved: 0.00120029
dataset 253, input: 0.00139872, achieved: 0.00139872
dataset 254, input: 0.00135715, achieved: 0.00135715
dataset 255, input: 0.00131714, achieved: 0.00131714
dataset 256, input: 0.00128543, achieved: 0.00128543
dataset 257, input: 0.00125699, achieved: 0.00125699
dataset 258, input: 0.000818995, achieved: 0.000818995
dataset 259, input: 0.00123534, achieved: 0.00123534
dataset 260, input: 0.00127961, achieved: 0.00127961
dataset 261, input: 0.00127486, achieved: 0.00127486
dataset 262, input: 0.00125333, achieved: 0.00125333
dataset 263, input: 0.00124844, achieved: 0.00124844
dataset 264, input: 0.00122772, achieved: 0.00122772
dataset 265, input: 0.0008236, achieved: 0.0008236
dataset 266, input: 0.00121827, achieved: 0.00121827
dataset 267, input: 0.000811972, achieved: 0.000811972
dataset 268, input: 0.00132362, achieved: 0.00132362
dataset 269, input: 0.00139814, achieved: 0.00139814
dataset 270, input: 0.00125598, achieved: 0.00125598
dataset 271, input: 0.00108371, achieved: 0.00108371
dataset 272, input: 0.00107062, achieved: 0.00107062
dataset 273, input: 0.00105916, achieved: 0.00105916
dataset 274, input: 0.00102508, achieved: 0.00102508
dataset 275, input: 0.001115, achieved: 0.001115
dataset 276, input: 0.00109822, achieved: 0.00109822
dataset 277, input: 0.00117478, achieved: 0.00117478
dataset 278, input: 0.00119669, achieved: 0.00119669
dataset 279, input: 0.00114093, achieved: 0.00114093
dataset 280, input: 0.000779562, achieved: 0.000779562
dataset 281, input: 0.00123281, achieved: 0.00123281
dataset 282, input: 0.000626781, achieved: 0.000626781
dataset 283, input: 0.00125362, achieved: 0.00125362
dataset 284, input: 0.00109894, achieved: 0.00109894
dataset 285, input: 0.0012276, achieved: 0.0012276
dataset 286, input: 0.00127763, achieved: 0.00127763
dataset 287, input: 0.00117288, achieved: 0.00117288
dataset 288, input: 0.000738575, achieved: 0.000738575
dataset 289, input: 0.0010606, achieved: 0.0010606
dataset 290, input: 0.00123911, achieved: 0.00123911
dataset 291, input: 0.00130963, achieved: 0.00130963
dataset 292, input: 0.00122003, achieved: 0.00122003
dataset 293, input: 0.000671683, achieved: 0.000671683
dataset 294, input: 0.000733768, achieved: 0.000733768
dataset 295, input: 0.00113394, achieved: 0.00113394
dataset 296, input: 0.00123382, achieved: 0.00123382
dataset 297, input: 0.00115412, achieved: 0.00115412
dataset 298, input: 0.000686219, achieved: 0.000686219
dataset 299, input: 0.00125178, achieved: 0.00125178
dataset 300, input: 0.00123963, achieved: 0.00123963
dataset 301, input: 0.00107753, achieved: 0.00107753
dataset 302, input: 0.00115829, achieved: 0.00115829
dataset 303, input: 0.00119977, achieved: 0.00119977
dataset 304, input: 0.00117927, achieved: 0.00117927
dataset 305, input: 0.000645116, achieved: 0.000645116
dataset 306, input: 0.00123742, achieved: 0.00123742
dataset 307, input: 0.00126571, achieved: 0.00126571
dataset 308, input: 0.00114568, achieved: 0.00114568
dataset 309, input: 0.00119626, achieved: 0.00119626
dataset 310, input: 0.00122441, achieved: 0.00122441
dataset 311, input: 0.000677066, achieved: 0.000677066
dataset 312, input: 0.000732156, achieved: 0.000732156
dataset 313, input: 0.00120647, achieved: 0.00120647
dataset 314, input: 0.00122651, achieved: 0.00122651
dataset 315, input: 0.00116149, achieved: 0.00116149
dataset 316, input: 0.00121459, achieved: 0.00121459
dataset 317, input: 0.00119836, achieved: 0.00119836
dataset 318, input: 0.00127201, achieved: 0.00127201
dataset 319, input: 0.00110162, achieved: 0.00110162
dataset 320, input: 0.00109042, achieved: 0.00109042
dataset 321, input: 0.00119994, achieved: 0.00119994
dataset 322, input: 0.00109324, achieved: 0.00109324
dataset 323, input: 0.00118549, achieved: 0.00118549
dataset 324, input: 0.0011572, achieved: 0.0011572
dataset 325, input: 0.00123549, achieved: 0.00123549
dataset 326, input: 0.00118112, achieved: 0.00118112
dataset 327, input: 0.00118874, achieved: 0.00118874
dataset 328, input: 0.00107531, achieved: 0.00107531
dataset 329, input: 0.00107845, achieved: 0.00107845
dataset 330, input: 0.00124867, achieved: 0.00124867
dataset 331, input: 0.00110691, achieved: 0.00110691
dataset 332, input: 0.0010271, achieved: 0.0010271
dataset 333, input: 0.00117421, achieved: 0.00117421
dataset 334, input: 0.00113149, achieved: 0.00113149
dataset 335, input: 0.00111281, achieved: 0.00111281
dataset 336, input: 0.00110363, achieved: 0.00110363
dataset 337, input: 0.00121197, achieved: 0.00121197
dataset 338, input: 0.00119801, achieved: 0.00119801
dataset 339, input: 0.0011519, achieved: 0.0011519
dataset 340, input: 0.0011559, achieved: 0.0011559
dataset 341, input: 0.00119496, achieved: 0.00119496
dataset 342, input: 0.00104569, achieved: 0.00104569
dataset 343, input: 0.0010756, achieved: 0.0010756
dataset 344, input: 0.00109649, achieved: 0.00109649
dataset 345, input: 0.00113204, achieved: 0.00113204
dataset 346, input: 0.00101803, achieved: 0.00101803
dataset 347, input: 0.00109609, achieved: 0.00109609
dataset 348, input: 0.00106152, achieved: 0.00106152
dataset 349, input: 0.00119758, achieved: 0.00119758
dataset 350, input: 0.00130123, achieved: 0.00130123
dataset 351, input: 0.00127432, achieved: 0.00127432
dataset 352, input: 0.00124073, achieved: 0.00124073
dataset 353, input: 0.00125926, achieved: 0.00125926
dataset 354, input: 0.00121514, achieved: 0.00121514
dataset 355, input: 0.00126171, achieved: 0.00126171
dataset 356, input: 0.00125399, achieved: 0.00125399
dataset 357, input: 0.00125549, achieved: 0.00125549
dataset 358, input: 0.00118394, achieved: 0.00118394
dataset 359, input: 0.00124139, achieved: 0.00124139
dataset 360, input: 0.00060931, achieved: 0.00060931
dataset 361, input: 0.00107773, achieved: 0.00107773
dataset 362, input: 0.000908942, achieved: 0.000908942
dataset 363, input: 0.000896076, achieved: 0.000896076
dataset 364, input: 0.000916282, achieved: 0.000916282
dataset 365, input: 0.00115259, achieved: 0.00115259
dataset 366, input: 0.000930818, achieved: 0.000930818
dataset 367, input: 0.00108648, achieved: 0.00108648
dataset 368, input: 0.00108345, achieved: 0.00108345
dataset 369, input: 0.00106921, achieved: 0.00106921
dataset 370, input: 0.00108187, achieved: 0.00108187
dataset 371, input: 0.00107059, achieved: 0.00107059
dataset 372, input: 0.00106279, achieved: 0.00106279
dataset 373, input: 0.00105715, achieved: 0.00105715
dataset 374, input: 0.000961903, achieved: 0.000961903
dataset 375, input: 0.000869625, achieved: 0.000869625
dataset 376, input: 0.000964868, achieved: 0.000964868
dataset 377, input: 0.000934761, achieved: 0.000934761
dataset 378, input: 0.000975374, achieved: 0.000975374
dataset 379, input: 0.000934962, achieved: 0.000934962
dataset 380, input: 0.000880361, achieved: 0.000880361
dataset 381, input: 0.000916628, achieved: 0.000916628
dataset 382, input: 0.000851981, achieved: 0.000851981
dataset 383, input: 0.000893054, achieved: 0.000893054
dataset 384, input: 0.000926184, achieved: 0.000926184
dataset 385, input: 0.000934502, achieved: 0.000934502
dataset 386, input: 0.000911878, achieved: 0.000911878
dataset 387, input: 0.000905863, achieved: 0.000905863
dataset 388, input: 0.00111212, achieved: 0.00111212
dataset 389, input: 0.000974654, achieved: 0.000974654
dataset 390, input: 0.000943511, achieved: 0.000943511
dataset 391, input: 0.000927824, achieved: 0.000927824
dataset 392, input: 0.000950707, achieved: 0.000950707
dataset 393, input: 0.000920341, achieved: 0.000920341
dataset 394, input: 0.000930559, achieved: 0.000930559
dataset 395, input: 0.000935193, achieved: 0.000935193
dataset 396, input: 0.000913548, achieved: 0.000913548
dataset 397, input: 0.000896191, achieved: 0.000896191
dataset 398, input: 0.00089291, achieved: 0.00089291
dataset 399, input: 0.000873798, achieved: 0.000873798
dataset 400, input: 0.000873827, achieved: 0.000873827
dataset 401, input: 0.000937006, achieved: 0.000937006
dataset 402, input: 0.000879468, achieved: 0.000879468
dataset 403, input: 0.000877079, achieved: 0.000877079
dataset 404, input: 0.00086358, achieved: 0.00086358
dataset 405, input: 0.000869481, achieved: 0.000869481
dataset 406, input: 0.000827832, achieved: 0.000827832
dataset 407, input: 0.000860443, achieved: 0.000860443
dataset 408, input: 0.000857708, achieved: 0.000857708
dataset 409, input: 0.000894004, achieved: 0.000894004
dataset 410, input: 0.000883987, achieved: 0.000883987
dataset 411, input: 0.000877454, achieved: 0.000877454
dataset 412, input: 0.000880908, achieved: 0.000880908
dataset 413, input: 0.000841244, achieved: 0.000841244
dataset 414, input: 0.000850167, achieved: 0.000850167
dataset 415, input: 0.000808259, achieved: 0.000808259
dataset 416, input: 0.000844209, achieved: 0.000844209
dataset 417, input: 0.000806561, achieved: 0.000806561
dataset 418, input: 0.000799595, achieved: 0.000799595
dataset 419, input: 0.000804719, achieved: 0.000804719
dataset 420, input: 0.000806964, achieved: 0.000806964
dataset 421, input: 0.000775274, achieved: 0.000775274
dataset 422, input: 0.000757428, achieved: 0.000757428
dataset 423, input: 0.000966566, achieved: 0.000966566
dataset 424, input: 0.00113167, achieved: 0.00113167
dataset 425, input: 0.00111028, achieved: 0.00111028
dataset 426, input: 0.00109252, achieved: 0.00109252
dataset 427, input: 0.00107577, achieved: 0.00107577
dataset 428, input: 0.00107767, achieved: 0.00107767
dataset 429, input: 0.00107422, achieved: 0.00107422
dataset 430, input: 0.001056, achieved: 0.001056
dataset 431, input: 0.000535654, achieved: 0.000535654
dataset 432, input: 0.00104445, achieved: 0.00104445
dataset 433, input: 0.00103605, achieved: 0.00103605
dataset 434, input: 0.00102966, achieved: 0.00102966
dataset 435, input: 0.00103639, achieved: 0.00103639
dataset 436, input: 0.00101461, achieved: 0.00101461
dataset 437, input: 0.00102399, achieved: 0.00102399
dataset 438, input: 0.000797465, achieved: 0.000797465
dataset 439, input: 0.000865048, achieved: 0.000865048
dataset 440, input: 0.000831257, achieved: 0.000831257
dataset 441, input: 0.000854254, achieved: 0.000854254
dataset 442, input: 0.000833646, achieved: 0.000833646
dataset 443, input: 0.000821211, achieved: 0.000821211
dataset 444, input: 0.000825471, achieved: 0.000825471
dataset 445, input: 0.00080115, achieved: 0.00080115
dataset 446, input: 0.000794731, achieved: 0.000794731
dataset 447, input: 0.000775274, achieved: 0.000775274
dataset 448, input: 0.000776281, achieved: 0.000776281
dataset 449, input: 0.000776195, achieved: 0.000776195
dataset 450, input: 0.000776857, achieved: 0.000776857
dataset 451, input: 0.000766178, achieved: 0.000766178
dataset 452, input: 0.000739093, achieved: 0.000739093
dataset 453, input: 0.000756248, achieved: 0.000756248
dataset 454, input: 0.00108469, achieved: 0.00108469
dataset 455, input: 0.00109134, achieved: 0.00109134
dataset 456, input: 0.00106509, achieved: 0.00106509
dataset 457, input: 0.00102641, achieved: 0.00102641
dataset 458, input: 0.00101348, achieved: 0.00101348
dataset 459, input: 0.000990917, achieved: 0.000990917
dataset 460, input: 0.000954189, achieved: 0.000954189
dataset 461, input: 0.000954276, achieved: 0.000954276
dataset 462, input: 0.000930789, achieved: 0.000930789
dataset 463, input: 0.000939193, achieved: 0.000939193
dataset 464, input: 0.000878029, achieved: 0.000878029
dataset 465, input: 0.000840985, achieved: 0.000840985
dataset 466, input: 0.000837647, achieved: 0.000837647
dataset 467, input: 0.00082881, achieved: 0.00082881
dataset 468, input: 0.000801466, achieved: 0.000801466
dataset 469, input: 0.00081036, achieved: 0.00081036
dataset 470, input: 0.000888535, achieved: 0.000888535
dataset 471, input: 0.00100767, achieved: 0.00100767
dataset 472, input: 0.000980353, achieved: 0.000980353
dataset 473, input: 0.00086453, achieved: 0.00086453
dataset 474, input: 0.000748879, achieved: 0.000748879
dataset 475, input: 0.000746836, achieved: 0.000746836
dataset 476, input: 0.000845677, achieved: 0.000845677
dataset 477, input: 0.000897947, achieved: 0.000897947
dataset 478, input: 0.000767991, achieved: 0.000767991
dataset 479, input: 0.000885657, achieved: 0.000885657
dataset 480, input: 0.00089694, achieved: 0.00089694
dataset 481, input: 0.00107099, achieved: 0.00107099
dataset 482, input: 0.0010572, achieved: 0.0010572
dataset 483, input: 0.00104684, achieved: 0.00104684
dataset 484, input: 0.000883383, achieved: 0.000883383
dataset 485, input: 0.000876849, achieved: 0.000876849
dataset 486, input: 0.00103012, achieved: 0.00103012
dataset 487, input: 0.00100839, achieved: 0.00100839
dataset 488, input: 0.00100042, achieved: 0.00100042
dataset 489, input: 0.000997652, achieved: 0.000997652
dataset 490, input: 0.00100922, achieved: 0.00100922
dataset 491, input: 0.000992989, achieved: 0.000992989
dataset 492, input: 0.000994918, achieved: 0.000994918
dataset 493, input: 0.000976151, achieved: 0.000976151
dataset 494, input: 0.000985937, achieved: 0.000985937
dataset 495, input: 0.000979778, achieved: 0.000979778
dataset 496, input: 0.000979116, achieved: 0.000979116
dataset 497, input: 0.00100036, achieved: 0.00100036
dataset 498, input: 0.000985333, achieved: 0.000985333
dataset 499, input: 0.000982829, achieved: 0.000982829
dataset 500, input: 0.000964494, achieved: 0.000964494
dataset 501, input: 0.000973244, achieved: 0.000973244
dataset 502, input: 0.000739842, achieved: 0.000739842
dataset 503, input: 0.000756795, achieved: 0.000756795
dataset 504, input: 0.000775504, achieved: 0.000775504
dataset 505, input: 0.000751815, achieved: 0.000751815
dataset 506, input: 0.000758148, achieved: 0.000758148
dataset 507, input: 0.00073892, achieved: 0.00073892
dataset 508, input: 0.000737136, achieved: 0.000737136
dataset 509, input: 0.00074698, achieved: 0.00074698
dataset 510, input: 0.0010393, achieved: 0.0010393
dataset 511, input: 0.00106869, achieved: 0.00106869
dataset 512, input: 0.00107036, achieved: 0.00107036
dataset 513, input: 0.00102186, achieved: 0.00102186
dataset 514, input: 0.000717938, achieved: 0.000717938
dataset 515, input: 0.000764998, achieved: 0.000764998
dataset 516, input: 0.000764077, achieved: 0.000764077
dataset 517, input: 0.000749311, achieved: 0.000749311
dataset 518, input: 0.000758752, achieved: 0.000758752
dataset 519, input: 0.000727033, achieved: 0.000727033
dataset 520, input: 0.000731811, achieved: 0.000731811
dataset 521, input: 0.000715865, achieved: 0.000715865
dataset 522, input: 0.000727637, achieved: 0.000727637
dataset 523, input: 0.000723723, achieved: 0.000723723
dataset 524, input: 0.000725162, achieved: 0.000725162
dataset 525, input: 0.000718657, achieved: 0.000718657
dataset 526, input: 0.000706194, achieved: 0.000706194
dataset 527, input: 0.000834538, achieved: 0.000834538
dataset 528, input: 0.000985678, achieved: 0.000985678
dataset 529, input: 0.000981217, achieved: 0.000981217
dataset 530, input: 0.000931393, achieved: 0.000931393
dataset 531, input: 0.000933523, achieved: 0.000933523
dataset 532, input: 0.000914699, achieved: 0.000914699
dataset 533, input: 0.000895558, achieved: 0.000895558
dataset 534, input: 0.000864559, achieved: 0.000864559
dataset 535, input: 0.000850887, achieved: 0.000850887
dataset 536, input: 0.000842166, achieved: 0.000842166
dataset 537, input: 0.0008255, achieved: 0.0008255
dataset 538, input: 0.000985649, achieved: 0.000985649
dataset 539, input: 0.000962853, achieved: 0.000962853
dataset 540, input: 0.000963515, achieved: 0.000963515
dataset 541, input: 0.000948778, achieved: 0.000948778
dataset 542, input: 0.000944605, achieved: 0.000944605
dataset 543, input: 0.000926903, achieved: 0.000926903
dataset 544, input: 0.000911389, achieved: 0.000911389
dataset 545, input: 0.000893457, achieved: 0.000893457
dataset 546, input: 0.000892248, achieved: 0.000892248
dataset 547, input: 0.000896594, achieved: 0.000896594
dataset 548, input: 0.000883038, achieved: 0.000883038
dataset 549, input: 0.000850369, achieved: 0.000850369
dataset 550, input: 0.000866113, achieved: 0.000866113
dataset 551, input: 0.000871323, achieved: 0.000871323
dataset 552, input: 0.000874806, achieved: 0.000874806
dataset 553, input: 0.000835574, achieved: 0.000835574
dataset 554, input: 0.000844814, achieved: 0.000844814
dataset 555, input: 0.000834912, achieved: 0.000834912
dataset 556, input: 0.00081059, achieved: 0.00081059
dataset 557, input: 0.000825903, achieved: 0.000825903
dataset 558, input: 0.00081747, achieved: 0.00081747
dataset 559, input: 0.000806791, achieved: 0.000806791
dataset 560, input: 0.000790126, achieved: 0.000790126
dataset 561, input: 0.000794069, achieved: 0.000794069
dataset 562, input: 0.000785204, achieved: 0.000785204
dataset 563, input: 0.000982656, achieved: 0.000982656
dataset 564, input: 0.000985477, achieved: 0.000985477
dataset 565, input: 0.000982829, achieved: 0.000982829
dataset 566, input: 0.000977216, achieved: 0.000977216
dataset 567, input: 0.000954909, achieved: 0.000954909
dataset 568, input: 0.000929609, achieved: 0.000929609
dataset 569, input: 0.000961961, achieved: 0.000961961
dataset 570, input: 0.000917318, achieved: 0.000917318
dataset 571, input: 0.000894954, achieved: 0.000894954
dataset 572, input: 0.000898782, achieved: 0.000898782
dataset 573, input: 0.000891701, achieved: 0.000891701
dataset 574, input: 0.000847087, achieved: 0.000847087
dataset 575, input: 0.000820722, achieved: 0.000820722
dataset 576, input: 0.00118843, achieved: 0.00118843
dataset 577, input: 0.000583664, achieved: 0.000583664
dataset 578, input: 0.000977936, achieved: 0.000977936
dataset 579, input: 0.000581765, achieved: 0.000581765
dataset 580, input: 0.000965415, achieved: 0.000965415
dataset 581, input: 0.00114482, achieved: 0.00114482
dataset 582, input: 0.00115279, achieved: 0.00115279
dataset 583, input: 0.00112375, achieved: 0.00112375
dataset 584, input: 0.00110306, achieved: 0.00110306
dataset 585, input: 0.001103, achieved: 0.001103
dataset 586, input: 0.00109773, achieved: 0.00109773
dataset 587, input: 0.000557587, achieved: 0.000557587
dataset 588, input: 0.0010625, achieved: 0.0010625
dataset 589, input: 0.00105418, achieved: 0.00105418
dataset 590, input: 0.00105697, achieved: 0.00105697
dataset 591, input: 0.00103662, achieved: 0.00103662
dataset 592, input: 0.00053528, achieved: 0.00053528
dataset 593, input: 0.00104655, achieved: 0.00104655
dataset 594, input: 0.00101345, achieved: 0.00101345
dataset 595, input: 0.000978137, achieved: 0.000978137
dataset 596, input: 0.000989391, achieved: 0.000989391
dataset 597, input: 0.000980066, achieved: 0.000980066
dataset 598, input: 0.000958651, achieved: 0.000958651
dataset 599, input: 0.000949354, achieved: 0.000949354
dataset 600, input: 0.000947713, achieved: 0.000947713
dataset 601, input: 0.000930501, achieved: 0.000930501
dataset 602, input: 0.000927853, achieved: 0.000927853
dataset 603, input: 0.000799135, achieved: 0.000799135
dataset 604, input: 0.000789665, achieved: 0.000789665
dataset 605, input: 0.0007924, achieved: 0.0007924
dataset 606, input: 0.000782354, achieved: 0.000782354
dataset 607, input: 0.000770409, achieved: 0.000770409
dataset 608, input: 0.000770265, achieved: 0.000770265
dataset 609, input: 0.000753254, achieved: 0.000753254
dataset 610, input: 0.000736445, achieved: 0.000736445
dataset 611, input: 0.000745166, achieved: 0.000745166
dataset 612, input: 0.000732531, achieved: 0.000732531
dataset 613, input: 0.000734488, achieved: 0.000734488
dataset 614, input: 0.000687917, achieved: 0.000687917
dataset 615, input: 0.000692292, achieved: 0.000692292
dataset 616, input: 0.000684607, achieved: 0.000684607
dataset 617, input: 0.000829069, achieved: 0.000829069
dataset 618, input: 0.000939078, achieved: 0.000939078
dataset 619, input: 0.00093053, achieved: 0.00093053
dataset 620, input: 0.000926011, achieved: 0.000926011
dataset 621, input: 0.00090923, achieved: 0.00090923
dataset 622, input: 0.000900451, achieved: 0.000900451
dataset 623, input: 0.000894205, achieved: 0.000894205
dataset 624, input: 0.000884448, achieved: 0.000884448
dataset 625, input: 0.000879123, achieved: 0.000879123
dataset 626, input: 0.000876389, achieved: 0.000876389
dataset 627, input: 0.000872647, achieved: 0.000872647
dataset 628, input: 0.00083379, achieved: 0.00083379
dataset 629, input: 0.000841101, achieved: 0.000841101
dataset 630, input: 0.00086335, achieved: 0.00086335
dataset 631, input: 0.000903761, achieved: 0.000903761
dataset 632, input: 0.000872647, achieved: 0.000872647
dataset 633, input: 0.000878404, achieved: 0.000878404
dataset 634, input: 0.000876763, achieved: 0.000876763
dataset 635, input: 0.000868473, achieved: 0.000868473
dataset 636, input: 0.00084467, achieved: 0.00084467
dataset 637, input: 0.000835344, achieved: 0.000835344
dataset 638, input: 0.000840151, achieved: 0.000840151
dataset 639, input: 0.000828868, achieved: 0.000828868
dataset 640, input: 0.000811857, achieved: 0.000811857
dataset 641, input: 0.000798156, achieved: 0.000798156
dataset 642, input: 0.000792572, achieved: 0.000792572
dataset 643, input: 0.00105035, achieved: 0.00105035
dataset 644, input: 0.000988672, achieved: 0.000988672
dataset 645, input: 0.000968523, achieved: 0.000968523
dataset 646, input: 0.000888881, achieved: 0.000888881
dataset 647, input: 0.00093761, achieved: 0.00093761
dataset 648, input: 0.000921204, achieved: 0.000921204
dataset 649, input: 0.00106351, achieved: 0.00106351
dataset 650, input: 0.0010184, achieved: 0.0010184
dataset 651, input: 0.00103274, achieved: 0.00103274
dataset 652, input: 0.00101029, achieved: 0.00101029
dataset 653, input: 0.00108702, achieved: 0.00108702
dataset 654, input: 0.000783534, achieved: 0.000783534
dataset 655, input: 0.000794961, achieved: 0.000794961
dataset 656, input: 0.000558709, achieved: 0.000558709
dataset 657, input: 0.000806762, achieved: 0.000806762
dataset 658, input: 0.000806359, achieved: 0.000806359
dataset 659, input: 0.000549038, achieved: 0.000549038
dataset 660, input: 0.000786902, achieved: 0.000786902
dataset 661, input: 0.000804316, achieved: 0.000804316
dataset 662, input: 0.000810015, achieved: 0.000810015
dataset 663, input: 0.000784858, achieved: 0.000784858
dataset 664, input: 0.000803308, achieved: 0.000803308
dataset 665, input: 0.000817786, achieved: 0.000817786
dataset 666, input: 0.000810447, achieved: 0.000810447
dataset 667, input: 0.000796228, achieved: 0.000796228
dataset 668, input: 0.000810216, achieved: 0.000810216
dataset 669, input: 0.000544087, achieved: 0.000544087
dataset 670, input: 0.00081249, achieved: 0.00081249
dataset 671, input: 0.000796775, achieved: 0.000796775
dataset 672, input: 0.000799768, achieved: 0.000799768
dataset 673, input: 0.000819312, achieved: 0.000819312
dataset 674, input: 0.000784772, achieved: 0.000784772
dataset 675, input: 0.000563602, achieved: 0.000563602
dataset 676, input: 0.000821327, achieved: 0.000821327
dataset 677, input: 0.000785779, achieved: 0.000785779
dataset 678, input: 0.000550103, achieved: 0.000550103
dataset 679, input: 0.000578368, achieved: 0.000578368
dataset 680, input: 0.000810792, achieved: 0.000810792
dataset 681, input: 0.000559717, achieved: 0.000559717
dataset 682, input: 0.000795969, achieved: 0.000795969
dataset 683, input: 0.000786211, achieved: 0.000786211
dataset 684, input: 0.000799192, achieved: 0.000799192
dataset 685, input: 0.000789751, achieved: 0.000789751
dataset 686, input: 0.000782038, achieved: 0.000782038
dataset 687, input: 0.000817959, achieved: 0.000817959
dataset 688, input: 0.000797552, achieved: 0.000797552
dataset 689, input: 0.000795854, achieved: 0.000795854
dataset 690, input: 0.000785578, achieved: 0.000785578
dataset 691, input: 0.000807626, achieved: 0.000807626
dataset 692, input: 0.000794184, achieved: 0.000794184
dataset 693, input: 0.000582973, achieved: 0.000582973
dataset 694, input: 0.000549384, achieved: 0.000549384
dataset 695, input: 0.000805381, achieved: 0.000805381
dataset 696, input: 0.00078696, achieved: 0.00078696
dataset 697, input: 0.000796918, achieved: 0.000796918
dataset 698, input: 0.000817671, achieved: 0.000817671
dataset 699, input: 0.000756075, achieved: 0.000756075
dataset 700, input: 0.000672547, achieved: 0.000672547
dataset 701, input: 0.000692896, achieved: 0.000692896
dataset 702, input: 0.000674533, achieved: 0.000674533
dataset 703, input: 0.000660486, achieved: 0.000660486
dataset 704, input: 0.000682966, achieved: 0.000682966
dataset 705, input: 0.000660688, achieved: 0.000660688
dataset 706, input: 0.000671568, achieved: 0.000671568
dataset 707, input: 0.000639245, achieved: 0.000639245
dataset 708, input: 0.000680318, achieved: 0.000680318
dataset 709, input: 0.000661033, achieved: 0.000661033
dataset 710, input: 0.000649463, achieved: 0.000649463
dataset 711, input: 0.000676547, achieved: 0.000676547
dataset 712, input: 0.000658673, achieved: 0.000658673
dataset 713, input: 0.000660371, achieved: 0.000660371
dataset 714, input: 0.000675857, achieved: 0.000675857
dataset 715, input: 0.000676663, achieved: 0.000676663
dataset 716, input: 0.000657522, achieved: 0.000657522
dataset 717, input: 0.000652111, achieved: 0.000652111
dataset 718, input: 0.000632941, achieved: 0.000632941
dataset 719, input: 0.00064408, achieved: 0.00064408
dataset 720, input: 0.000624824, achieved: 0.000624824
dataset 721, input: 0.000614923, achieved: 0.000614923
dataset 722, input: 0.000621025, achieved: 0.000621025
dataset 723, input: 0.000620708, achieved: 0.000620708
dataset 724, input: 0.000637518, achieved: 0.000637518
dataset 725, input: 0.000616419, achieved: 0.000616419
dataset 726, input: 0.000616736, achieved: 0.000616736
dataset 727, input: 0.00061026, achieved: 0.00061026
dataset 728, input: 0.000629314, achieved: 0.000629314
dataset 729, input: 0.000617197, achieved: 0.000617197
dataset 730, input: 0.000633833, achieved: 0.000633833
dataset 731, input: 0.000612706, achieved: 0.000612706
dataset 732, input: 0.000616707, achieved: 0.000616707
dataset 733, input: 0.000578426, achieved: 0.000578426
dataset 734, input: 0.000607324, achieved: 0.000607324
dataset 735, input: 0.000609627, achieved: 0.000609627
dataset 736, input: 0.000615355, achieved: 0.000615355
dataset 737, input: 0.000613685, achieved: 0.000613685
dataset 738, input: 0.000614952, achieved: 0.000614952
dataset 739, input: 0.000984844, achieved: 0.000984844
dataset 740, input: 0.000985448, achieved: 0.000985448
dataset 741, input: 0.000887096, achieved: 0.000887096
dataset 742, input: 0.000855233, achieved: 0.000855233
dataset 743, input: 0.000844555, achieved: 0.000844555
dataset 744, input: 0.000841043, achieved: 0.000841043
dataset 745, input: 0.000844727, achieved: 0.000844727
dataset 746, input: 0.000837733, achieved: 0.000837733
dataset 747, input: 0.000836898, achieved: 0.000836898
dataset 748, input: 0.000836927, achieved: 0.000836927
dataset 749, input: 0.000839949, achieved: 0.000839949
dataset 750, input: 0.0008196, achieved: 0.0008196
dataset 751, input: 0.000815023, achieved: 0.000815023
dataset 752, input: 0.00061026, achieved: 0.00061026
dataset 753, input: 0.000602287, achieved: 0.000602287
dataset 754, input: 0.000608331, achieved: 0.000608331
dataset 755, input: 0.000524515, achieved: 0.000524515
dataset 756, input: 0.000585823, achieved: 0.000585823
dataset 757, input: 0.000589219, achieved: 0.000589219
dataset 758, input: 0.000566653, achieved: 0.000566653
dataset 759, input: 0.000586082, achieved: 0.000586082
dataset 760, input: 0.000583434, achieved: 0.000583434
dataset 761, input: 0.000583204, achieved: 0.000583204
dataset 762, input: 0.000566164, achieved: 0.000566164
dataset 763, input: 0.000581909, achieved: 0.000581909
dataset 764, input: 0.000556292, achieved: 0.000556292
dataset 765, input: 0.000557299, achieved: 0.000557299
dataset 766, input: 0.000575893, achieved: 0.000575893
dataset 767, input: 0.00057195, achieved: 0.00057195
dataset 768, input: 0.000563602, achieved: 0.000563602
dataset 769, input: 0.000588644, achieved: 0.000588644
dataset 770, input: 0.000575519, achieved: 0.000575519
dataset 771, input: 0.000551945, achieved: 0.000551945
dataset 772, input: 0.000562595, achieved: 0.000562595
dataset 773, input: 0.00058021, achieved: 0.00058021
dataset 774, input: 0.000539079, achieved: 0.000539079
dataset 775, input: 0.000576238, achieved: 0.000576238
dataset 776, input: 0.000571086, achieved: 0.000571086
dataset 777, input: 0.000553989, achieved: 0.000553989
dataset 778, input: 0.000556349, achieved: 0.000556349
dataset 779, input: 0.000565214, achieved: 0.000565214
dataset 780, input: 0.000578195, achieved: 0.000578195
dataset 781, input: 0.000553298, achieved: 0.000553298
dataset 782, input: 0.000540576, achieved: 0.000540576
dataset 783, input: 0.000950016, achieved: 0.000950016
dataset 784, input: 0.000960896, achieved: 0.000960896
dataset 785, input: 0.000938359, achieved: 0.000938359
dataset 786, input: 0.000923248, achieved: 0.000923248
dataset 787, input: 0.000889629, achieved: 0.000889629
dataset 788, input: 0.000852585, achieved: 0.000852585
dataset 789, input: 0.000825846, achieved: 0.000825846
dataset 790, input: 0.000832552, achieved: 0.000832552
dataset 791, input: 0.00084585, achieved: 0.00084585
dataset 792, input: 0.000820204, achieved: 0.000820204
dataset 793, input: 0.000830307, achieved: 0.000830307
dataset 794, input: 0.000823831, achieved: 0.000823831
dataset 795, input: 0.000831113, achieved: 0.000831113
dataset 796, input: 0.000809497, achieved: 0.000809497
dataset 797, input: 0.00059866, achieved: 0.00059866
dataset 798, input: 0.000603697, achieved: 0.000603697
dataset 799, input: 0.000610577, achieved: 0.000610577
dataset 800, input: 0.000628364, achieved: 0.000628364
dataset 801, input: 0.000412606, achieved: 0.000412606
dataset 802, input: 0.0006235, achieved: 0.0006235
dataset 803, input: 0.00061429, achieved: 0.00061429
dataset 804, input: 0.00060246, achieved: 0.00060246
dataset 805, input: 0.000600128, achieved: 0.000600128
dataset 806, input: 0.000587953, achieved: 0.000587953
dataset 807, input: 0.000595724, achieved: 0.000595724
dataset 808, input: 0.000584413, achieved: 0.000584413
dataset 809, input: 0.000590313, achieved: 0.000590313
dataset 810, input: 0.000585967, achieved: 0.000585967
dataset 811, input: 0.000584528, achieved: 0.000584528
dataset 812, input: 0.000583233, achieved: 0.000583233
dataset 813, input: 0.000574425, achieved: 0.000574425
dataset 814, input: 0.00100597, achieved: 0.00100597
dataset 815, input: 0.000929868, achieved: 0.000929868
dataset 816, input: 0.000955168, achieved: 0.000955168
dataset 817, input: 0.000952865, achieved: 0.000952865
dataset 818, input: 0.0009478, achieved: 0.0009478
dataset 819, input: 0.000965415, achieved: 0.000965415
dataset 820, input: 0.000936517, achieved: 0.000936517
dataset 821, input: 0.000923795, achieved: 0.000923795
dataset 822, input: 0.000920542, achieved: 0.000920542
dataset 823, input: 0.000888017, achieved: 0.000888017
dataset 824, input: 0.000894263, achieved: 0.000894263
dataset 825, input: 0.000868387, achieved: 0.000868387
dataset 826, input: 0.000871208, achieved: 0.000871208
dataset 827, input: 0.000862659, achieved: 0.000862659
dataset 828, input: 0.000899386, achieved: 0.000899386
dataset 829, input: 0.00089078, achieved: 0.00089078
dataset 830, input: 0.000872129, achieved: 0.000872129
dataset 831, input: 0.000858716, achieved: 0.000858716
dataset 832, input: 0.000942388, achieved: 0.000942388
dataset 833, input: 0.00097664, achieved: 0.00097664
dataset 834, input: 0.00097641, achieved: 0.00097641
dataset 835, input: 0.000943885, achieved: 0.000943885
dataset 836, input: 0.000972927, achieved: 0.000972927
dataset 837, input: 0.000950476, achieved: 0.000950476
dataset 838, input: 0.000932113, achieved: 0.000932113
dataset 839, input: 0.000936862, achieved: 0.000936862
dataset 840, input: 0.000922039, achieved: 0.000922039
dataset 841, input: 0.000921406, achieved: 0.000921406
dataset 842, input: 0.000905517, achieved: 0.000905517
dataset 843, input: 0.000902294, achieved: 0.000902294
dataset 844, input: 0.000892507, achieved: 0.000892507
dataset 845, input: 0.000902006, achieved: 0.000902006
dataset 846, input: 0.000838193, achieved: 0.000838193
dataset 847, input: 0.000579635, achieved: 0.000579635
dataset 848, input: 0.000568639, achieved: 0.000568639
dataset 849, input: 0.000555486, achieved: 0.000555486
dataset 850, input: 0.000559688, achieved: 0.000559688
dataset 851, input: 0.000556637, achieved: 0.000556637
dataset 852, input: 0.000539022, achieved: 0.000539022
dataset 853, input: 0.000539396, achieved: 0.000539396
dataset 854, input: 0.000531797, achieved: 0.000531797
dataset 855, input: 0.000561185, achieved: 0.000561185
dataset 856, input: 0.000531624, achieved: 0.000531624
dataset 857, input: 0.000533236, achieved: 0.000533236
dataset 858, input: 0.000515045, achieved: 0.000515045
dataset 859, input: 0.000518384, achieved: 0.000518384
dataset 860, input: 0.000511016, achieved: 0.000511016
dataset 861, input: 0.000516887, achieved: 0.000516887
dataset 862, input: 0.000529984, achieved: 0.000529984
dataset 863, input: 0.00092958, achieved: 0.00092958
dataset 864, input: 0.000910324, achieved: 0.000910324
dataset 865, input: 0.000883469, achieved: 0.000883469
dataset 866, input: 0.000893313, achieved: 0.000893313
dataset 867, input: 0.00088062, achieved: 0.00088062
dataset 868, input: 0.000853679, achieved: 0.000853679
dataset 869, input: 0.000866113, achieved: 0.000866113
dataset 870, input: 0.000823744, achieved: 0.000823744
dataset 871, input: 0.000830652, achieved: 0.000830652
dataset 872, input: 0.000821183, achieved: 0.000821183
dataset 873, input: 0.000805956, achieved: 0.000805956
dataset 874, input: 0.000800516, achieved: 0.000800516
dataset 875, input: 0.000789492, achieved: 0.000789492
dataset 876, input: 0.000612505, achieved: 0.000612505
dataset 877, input: 0.000419428, achieved: 0.000419428
dataset 878, input: 0.000620104, achieved: 0.000620104
dataset 879, input: 0.000616045, achieved: 0.000616045
dataset 880, input: 0.000616419, achieved: 0.000616419
dataset 881, input: 0.000610979, achieved: 0.000610979
dataset 882, input: 0.000413757, achieved: 0.000413757
dataset 883, input: 0.000605482, achieved: 0.000605482
dataset 884, input: 0.000603064, achieved: 0.000603064
dataset 885, input: 0.000604474, achieved: 0.000604474
dataset 886, input: 0.000586456, achieved: 0.000586456
dataset 887, input: 0.000584413, achieved: 0.000584413
dataset 888, input: 0.000582801, achieved: 0.000582801
dataset 889, input: 0.000590774, achieved: 0.000590774
dataset 890, input: 0.000586974, achieved: 0.000586974
dataset 891, input: 0.000585046, achieved: 0.000585046
dataset 892, input: 0.000585823, achieved: 0.000585823
dataset 893, input: 0.000566769, achieved: 0.000566769
dataset 894, input: 0.000584585, achieved: 0.000584585
dataset 895, input: 0.000597538, achieved: 0.000597538
dataset 896, input: 0.000563027, achieved: 0.000563027
dataset 897, input: 0.000609914, achieved: 0.000609914
dataset 898, input: 0.000603697, achieved: 0.000603697
dataset 899, input: 0.000606576, achieved: 0.000606576
dataset 900, input: 0.000595465, achieved: 0.000595465
dataset 901, input: 0.000593393, achieved: 0.000593393
dataset 902, input: 0.000598488, achieved: 0.000598488
dataset 903, input: 0.00058542, achieved: 0.00058542
dataset 904, input: 0.000573475, achieved: 0.000573475
dataset 905, input: 0.000561041, achieved: 0.000561041
dataset 906, input: 0.000564207, achieved: 0.000564207
dataset 907, input: 0.000552291, achieved: 0.000552291
dataset 908, input: 0.000555111, achieved: 0.000555111
dataset 909, input: 0.000554248, achieved: 0.000554248
dataset 910, input: 0.000550679, achieved: 0.000550679
dataset 911, input: 0.00055088, achieved: 0.00055088
dataset 912, input: 0.000543915, achieved: 0.000543915
dataset 913, input: 0.000549211, achieved: 0.000549211
dataset 914, input: 0.00052961, achieved: 0.00052961
dataset 915, input: 0.000530991, achieved: 0.000530991
dataset 916, input: 0.000537899, achieved: 0.000537899
dataset 917, input: 0.000528832, achieved: 0.000528832
dataset 918, input: 0.000568208, achieved: 0.000568208
dataset 919, input: 0.000579606, achieved: 0.000579606
dataset 920, input: 0.000583779, achieved: 0.000583779
dataset 921, input: 0.000594429, achieved: 0.000594429
dataset 922, input: 0.000576267, achieved: 0.000576267
dataset 923, input: 0.000576296, achieved: 0.000576296
dataset 924, input: 0.000581765, achieved: 0.000581765
dataset 925, input: 0.000562394, achieved: 0.000562394
dataset 926, input: 0.000557385, achieved: 0.000557385
dataset 927, input: 0.000563833, achieved: 0.000563833
dataset 928, input: 0.000560782, achieved: 0.000560782
dataset 929, input: 0.000570769, achieved: 0.000570769
dataset 930, input: 0.000565099, achieved: 0.000565099
dataset 931, input: 0.000560235, achieved: 0.000560235
dataset 932, input: 0.000555486, achieved: 0.000555486
dataset 933, input: 0.000551197, achieved: 0.000551197
dataset 934, input: 0.000529207, achieved: 0.000529207
dataset 935, input: 0.000545555, achieved: 0.000545555
dataset 936, input: 0.000978943, achieved: 0.000978943
dataset 937, input: 0.000794875, achieved: 0.000794875
dataset 938, input: 0.000782469, achieved: 0.000782469
dataset 939, input: 0.000778411, achieved: 0.000778411
dataset 940, input: 0.000757543, achieved: 0.000757543
dataset 941, input: 0.000761227, achieved: 0.000761227
dataset 942, input: 0.000739295, achieved: 0.000739295
dataset 943, input: 0.000746865, achieved: 0.000746865
dataset 944, input: 0.000745051, achieved: 0.000745051
dataset 945, input: 0.000748419, achieved: 0.000748419
dataset 946, input: 0.000720499, achieved: 0.000720499
dataset 947, input: 0.000718974, achieved: 0.000718974
dataset 948, input: 0.000719434, achieved: 0.000719434
dataset 949, input: 0.000721679, achieved: 0.000721679
dataset 950, input: 0.000742115, achieved: 0.000742115
dataset 951, input: 0.000708871, achieved: 0.000708871
dataset 952, input: 0.000731437, achieved: 0.000731437
dataset 953, input: 0.000681009, achieved: 0.000681009
dataset 954, input: 0.000701531, achieved: 0.000701531
dataset 955, input: 0.000674878, achieved: 0.000674878
dataset 956, input: 0.000678188, achieved: 0.000678188
dataset 957, input: 0.000671366, achieved: 0.000671366
dataset 958, input: 0.000651506, achieved: 0.000651506
dataset 959, input: 0.000704956, achieved: 0.000704956
dataset 960, input: 0.000504597, achieved: 0.000504597
dataset 961, input: 0.000503158, achieved: 0.000503158
dataset 962, input: 0.0004972, achieved: 0.0004972
dataset 963, input: 0.000493343, achieved: 0.000493343
dataset 964, input: 0.000821039, achieved: 0.000821039
dataset 965, input: 0.000953787, achieved: 0.000953787
dataset 966, input: 0.000485082, achieved: 0.000485082
dataset 967, input: 0.000487557, achieved: 0.000487557
dataset 968, input: 0.000795364, achieved: 0.000795364
dataset 969, input: 0.000846627, achieved: 0.000846627
dataset 970, input: 0.000772712, achieved: 0.000772712
dataset 971, input: 0.000839518, achieved: 0.000839518
dataset 972, input: 0.000831314, achieved: 0.000831314
dataset 973, input: 0.000938531, achieved: 0.000938531
dataset 974, input: 0.000954506, achieved: 0.000954506
dataset 975, input: 0.000517204, achieved: 0.000517204
dataset 976, input: 0.000959255, achieved: 0.000959255
dataset 977, input: 0.000892853, achieved: 0.000892853
dataset 978, input: 0.000813757, achieved: 0.000813757
dataset 979, input: 0.000743324, achieved: 0.000743324
dataset 980, input: 0.000970884, achieved: 0.000970884
dataset 981, input: 0.000809439, achieved: 0.000809439
dataset 982, input: 0.00086548, achieved: 0.00086548
dataset 983, input: 0.000802675, achieved: 0.000802675
dataset 984, input: 0.000792716, achieved: 0.000792716
dataset 985, input: 0.000919391, achieved: 0.000919391
dataset 986, input: 0.000747843, achieved: 0.000747843
dataset 987, input: 0.000890636, achieved: 0.000890636
dataset 988, input: 0.000862947, achieved: 0.000862947
dataset 989, input: 0.000880936, achieved: 0.000880936
dataset 990, input: 0.000513462, achieved: 0.000513462
dataset 991, input: 0.000500021, achieved: 0.000500021
dataset 992, input: 0.000514786, achieved: 0.000514786
dataset 993, input: 0.000518355, achieved: 0.000518355
dataset 994, input: 0.000511534, achieved: 0.000511534
dataset 995, input: 0.000871409, achieved: 0.000871409
dataset 996, input: 0.00083969, achieved: 0.00083969
dataset 997, input: 0.000492508, achieved: 0.000492508
dataset 998, input: 0.000926586, achieved: 0.000926586
dataset 999, input: 0.000917635, achieved: 0.000917635
dataset 1000, input: 0.000902409, achieved: 0.000902409
dataset 1001, input: 0.000836639, achieved: 0.000836639
dataset 1002, input: 0.000817326, achieved: 0.000817326
dataset 1003, input: 0.00080492, achieved: 0.00080492
dataset 1004, input: 0.000801869, achieved: 0.000801869
dataset 1005, input: 0.000793062, achieved: 0.000793062
dataset 1006, input: 0.000802992, achieved: 0.000802992
dataset 1007, input: 0.00078506, achieved: 0.00078506
dataset 1008, input: 0.000782987, achieved: 0.000782987
dataset 1009, input: 0.000772165, achieved: 0.000772165
dataset 1010, input: 0.000744763, achieved: 0.000744763
dataset 1011, input: 0.000744763, achieved: 0.000744763
dataset 1012, input: 0.000755442, achieved: 0.000755442
dataset 1013, input: 0.000747728, achieved: 0.000747728
dataset 1014, input: 0.000742576, achieved: 0.000742576
dataset 1015, input: 0.000747152, achieved: 0.000747152
dataset 1016, input: 0.000741252, achieved: 0.000741252
dataset 1017, input: 0.000737021, achieved: 0.000737021
dataset 1018, input: 0.00072522, achieved: 0.00072522
dataset 1019, input: 0.000532718, achieved: 0.000532718
dataset 1020, input: 0.000537784, achieved: 0.000537784
dataset 1021, input: 0.000518355, achieved: 0.000518355
dataset 1022, input: 0.000522414, achieved: 0.000522414
dataset 1023, input: 0.000534071, achieved: 0.000534071
dataset 1024, input: 0.000529869, achieved: 0.000529869
dataset 1025, input: 0.000519622, achieved: 0.000519622
dataset 1026, input: 0.00052037, achieved: 0.00052037
dataset 1027, input: 0.000528286, achieved: 0.000528286
dataset 1028, input: 0.000516312, achieved: 0.000516312
dataset 1029, input: 0.000499992, achieved: 0.000499992
dataset 1030, input: 0.000511476, achieved: 0.000511476
dataset 1031, input: 0.00050169, achieved: 0.00050169
dataset 1032, input: 0.00048396, achieved: 0.00048396
dataset 1033, input: 0.000497056, achieved: 0.000497056
dataset 1034, input: 0.000965242, achieved: 0.000965242
dataset 1035, input: 0.00089363, achieved: 0.00089363
dataset 1036, input: 0.000792284, achieved: 0.000792284
dataset 1037, input: 0.000812145, achieved: 0.000812145
dataset 1038, input: 0.000779793, achieved: 0.000779793
dataset 1039, input: 0.000767387, achieved: 0.000767387
dataset 1040, input: 0.000780771, achieved: 0.000780771
dataset 1041, input: 0.000748678, achieved: 0.000748678
dataset 1042, input: 0.000746433, achieved: 0.000746433
dataset 1043, input: 0.000758896, achieved: 0.000758896
dataset 1044, input: 0.000737568, achieved: 0.000737568
dataset 1045, input: 0.000737251, achieved: 0.000737251
dataset 1046, input: 0.000734344, achieved: 0.000734344
dataset 1047, input: 0.000829789, achieved: 0.000829789
dataset 1048, input: 0.000944058, achieved: 0.000944058
dataset 1049, input: 0.00074036, achieved: 0.00074036
dataset 1050, input: 0.000855406, achieved: 0.000855406
dataset 1051, input: 0.000902437, achieved: 0.000902437
dataset 1052, input: 0.000537496, achieved: 0.000537496
dataset 1053, input: 0.000527739, achieved: 0.000527739
dataset 1054, input: 0.000966595, achieved: 0.000966595
dataset 1055, input: 0.000525983, achieved: 0.000525983
dataset 1056, input: 0.000525637, achieved: 0.000525637
dataset 1057, input: 0.000511908, achieved: 0.000511908
dataset 1058, input: 0.000517406, achieved: 0.000517406
dataset 1059, input: 0.000518758, achieved: 0.000518758
dataset 1060, input: 0.00050287, achieved: 0.00050287
dataset 1061, input: 0.000940633, achieved: 0.000940633
dataset 1062, input: 0.000509202, achieved: 0.000509202
dataset 1063, input: 0.00048986, achieved: 0.00048986
dataset 1064, input: 0.000501604, achieved: 0.000501604
dataset 1065, input: 0.00052037, achieved: 0.00052037
dataset 1066, input: 0.000504827, achieved: 0.000504827
dataset 1067, input: 0.000495127, achieved: 0.000495127
dataset 1068, input: 0.000504338, achieved: 0.000504338
dataset 1069, input: 0.000496279, achieved: 0.000496279
dataset 1070, input: 0.000496912, achieved: 0.000496912
dataset 1071, input: 0.000503762, achieved: 0.000503762
dataset 1072, input: 0.000503906, achieved: 0.000503906
dataset 1073, input: 0.000514182, achieved: 0.000514182
dataset 1074, input: 0.00050713, achieved: 0.00050713
dataset 1075, input: 0.000964465, achieved: 0.000964465
dataset 1076, input: 0.000485284, achieved: 0.000485284
dataset 1077, input: 0.000972352, achieved: 0.000972352
dataset 1078, input: 0.000958967, achieved: 0.000958967
dataset 1079, input: 0.00091444, achieved: 0.00091444
dataset 1080, input: 0.000463178, achieved: 0.000463178
dataset 1081, input: 0.000540634, achieved: 0.000540634
dataset 1082, input: 0.000524745, achieved: 0.000524745
dataset 1083, input: 0.000541986, achieved: 0.000541986
dataset 1084, input: 0.000535798, achieved: 0.000535798
dataset 1085, input: 0.000507619, achieved: 0.000507619
dataset 1086, input: 0.000522068, achieved: 0.000522068
dataset 1087, input: 0.000512368, achieved: 0.000512368
dataset 1088, input: 0.000476447, achieved: 0.000476447
dataset 1089, input: 0.000523105, achieved: 0.000523105
dataset 1090, input: 0.000883786, achieved: 0.000883786
dataset 1091, input: 0.00091775, achieved: 0.00091775
dataset 1092, input: 0.000892248, achieved: 0.000892248
dataset 1093, input: 0.000882491, achieved: 0.000882491
dataset 1094, input: 0.000867696, achieved: 0.000867696
dataset 1095, input: 0.000831804, achieved: 0.000831804
dataset 1096, input: 0.000831343, achieved: 0.000831343
dataset 1097, input: 0.000664315, achieved: 0.000664315
dataset 1098, input: 0.00586928, achieved: 0.00586928
dataset 1099, input: 0.00614952, achieved: 0.00614952
dataset 1100, input: 0.00595218, achieved: 0.00595218
dataset 1101, input: 0.00596332, achieved: 0.00596332
dataset 1102, input: 0.00481165, achieved: 0.00481165
dataset 1103, input: 0.00538103, achieved: 0.00538103
dataset 1104, input: 0.00548126, achieved: 0.00548126
dataset 1105, input: 0.00176982, achieved: 0.00176982
dataset 1106, input: 0.00505262, achieved: 0.00505262
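For reference, the `dataset N, input: ..., achieved: ...` lines above are printed while building the blendable-dataset indices: each `input` value is the normalized sampling weight requested for a component dataset, and `achieved` is the fraction of the blended index actually assigned to it. The assignment is a greedy loop that, at every global sample, serves whichever dataset is currently furthest below its target share. Below is a minimal Python sketch of that loop, paraphrasing the C++ `helpers.build_blending_indices` that Megatron's `BlendableDataset` calls; the names and pure-Python structure are illustrative, not the exact implementation:

```python
import numpy as np

def build_blending_indices(weights, size):
    """Sketch of the greedy blending-index build behind the
    'input: ... achieved: ...' log lines. Illustrative only."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize the input weights
    num_datasets = len(weights)
    dataset_index = np.zeros(size, dtype=np.int64)         # which dataset serves sample i
    dataset_sample_index = np.zeros(size, dtype=np.int64)  # position within that dataset
    current_samples = np.zeros(num_datasets, dtype=np.int64)
    for i in range(size):
        # Error = how far each dataset lags its target count so far;
        # greedily serve sample i from the most under-represented one.
        errors = weights * max(i, 1) - current_samples
        d = int(np.argmax(errors))
        dataset_index[i] = d
        dataset_sample_index[i] = current_samples[d]
        current_samples[d] += 1
    achieved = current_samples / size  # the 'achieved' ratio in the log
    return dataset_index, dataset_sample_index, achieved
```

Because the loop tracks the running target exactly, `achieved` can differ from the normalized input weight by at most a few parts in `size`; over tens of millions of samples that error is far below the six or so significant digits printed, which is why every line above shows identical values for `input` and `achieved`.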
[2025-03-12 09:21:24][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 34742575 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.00201113, achieved: 0.00201113
dataset 1, input: 0.00203689, achieved: 0.00203689
dataset 2, input: 0.00202407, achieved: 0.00202407
dataset 3, input: 0.00200028, achieved: 0.00200028
dataset 4, input: 0.00203932, achieved: 0.00203932
dataset 5, input: 0.00203025, achieved: 0.00203025
dataset 6, input: 0.00201148, achieved: 0.00201148
dataset 7, input: 0.00206305, achieved: 0.00206305
dataset 8, input: 0.00200524, achieved: 0.00200524
dataset 9, input: 0.00212208, achieved: 0.00212208
dataset 10, input: 0.00200114, achieved: 0.00200114
dataset 11, input: 0.00201293, achieved: 0.00201293
dataset 12, input: 0.00205122, achieved: 0.00205122
dataset 13, input: 0.00199444, achieved: 0.00199444
dataset 14, input: 0.00200744, achieved: 0.00200744
dataset 15, input: 0.00205855, achieved: 0.00205855
dataset 16, input: 0.00202632, achieved: 0.00202632
dataset 17, input: 0.00204429, achieved: 0.00204429
dataset 18, input: 0.00205555, achieved: 0.00205555
dataset 19, input: 0.002009, achieved: 0.002009
dataset 20, input: 0.00203938, achieved: 0.00203938
dataset 21, input: 0.00203245, achieved: 0.00203245
dataset 22, input: 0.00204174, achieved: 0.00204174
dataset 23, input: 0.00198549, achieved: 0.00198549
dataset 24, input: 0.00205, achieved: 0.00205
dataset 25, input: 0.00205231, achieved: 0.00205231
dataset 26, input: 0.00201841, achieved: 0.00201841
dataset 27, input: 0.00201408, achieved: 0.00201408
dataset 28, input: 0.00203528, achieved: 0.00203528
dataset 29, input: 0.00196944, achieved: 0.00196944
dataset 30, input: 0.00201015, achieved: 0.00201015
dataset 31, input: 0.00197077, achieved: 0.00197077
dataset 32, input: 0.0020228, achieved: 0.0020228
dataset 33, input: 0.0020172, achieved: 0.0020172
dataset 34, input: 0.00199785, achieved: 0.00199785
dataset 35, input: 0.00199098, achieved: 0.00199098
dataset 36, input: 0.00203597, achieved: 0.00203597
dataset 37, input: 0.0019919, achieved: 0.0019919
dataset 38, input: 0.00203204, achieved: 0.00203204
dataset 39, input: 0.0019915, achieved: 0.0019915
dataset 40, input: 0.00204105, achieved: 0.00204105
dataset 41, input: 0.00202453, achieved: 0.00202453
dataset 42, input: 0.00202748, achieved: 0.00202748
dataset 43, input: 0.00203718, achieved: 0.00203718
dataset 44, input: 0.0019945, achieved: 0.0019945
dataset 45, input: 0.00200819, achieved: 0.00200819
dataset 46, input: 0.00203655, achieved: 0.00203655
dataset 47, input: 0.00201587, achieved: 0.00201587
dataset 48, input: 0.00200611, achieved: 0.00200611
dataset 49, input: 0.00201154, achieved: 0.00201154
dataset 50, input: 0.00197637, achieved: 0.00197637
dataset 51, input: 0.00199369, achieved: 0.00199369
dataset 52, input: 0.00197683, achieved: 0.00197683
dataset 53, input: 0.00201668, achieved: 0.00201668
dataset 54, input: 0.00201859, achieved: 0.00201859
dataset 55, input: 0.00200126, achieved: 0.00200126
dataset 56, input: 0.00202829, achieved: 0.00202829
dataset 57, input: 0.0020131, achieved: 0.0020131
dataset 58, input: 0.00197354, achieved: 0.00197354
dataset 59, input: 0.00199115, achieved: 0.00199115
dataset 60, input: 0.00197452, achieved: 0.00197452
dataset 61, input: 0.00202482, achieved: 0.00202482
dataset 62, input: 0.00199185, achieved: 0.00199185
dataset 63, input: 0.00201431, achieved: 0.00201431
dataset 64, input: 0.00199537, achieved: 0.00199537
dataset 65, input: 0.00199462, achieved: 0.00199462
dataset 66, input: 0.0020336, achieved: 0.0020336
dataset 67, input: 0.00202321, achieved: 0.00202321
dataset 68, input: 0.00207333, achieved: 0.00207333
dataset 69, input: 0.0020064, achieved: 0.0020064
dataset 70, input: 0.00205116, achieved: 0.00205116
dataset 71, input: 0.00199219, achieved: 0.00199219
dataset 72, input: 0.00204671, achieved: 0.00204671
dataset 73, input: 0.00198503, achieved: 0.00198503
dataset 74, input: 0.0020161, achieved: 0.0020161
dataset 75, input: 0.00198665, achieved: 0.00198665
dataset 76, input: 0.00203672, achieved: 0.00203672
dataset 77, input: 0.00198561, achieved: 0.00198561
dataset 78, input: 0.00200259, achieved: 0.00200259
dataset 79, input: 0.00203129, achieved: 0.00203129
dataset 80, input: 0.00202147, achieved: 0.00202147
dataset 81, input: 0.0019662, achieved: 0.0019662
dataset 82, input: 0.00203938, achieved: 0.00203938
dataset 83, input: 0.00199121, achieved: 0.00199121
dataset 84, input: 0.00203435, achieved: 0.00203435
dataset 85, input: 0.00199756, achieved: 0.00199756
dataset 86, input: 0.0019945, achieved: 0.0019945
dataset 87, input: 0.00203308, achieved: 0.00203308
dataset 88, input: 0.00195823, achieved: 0.00195823
dataset 89, input: 0.00200368, achieved: 0.00200368
dataset 90, input: 0.00200646, achieved: 0.00200646
dataset 91, input: 0.00199225, achieved: 0.00199225
dataset 92, input: 0.00200646, achieved: 0.00200646
dataset 93, input: 0.00199571, achieved: 0.00199571
dataset 94, input: 0.00197874, achieved: 0.00197874
dataset 95, input: 0.00200091, achieved: 0.00200091
dataset 96, input: 0.00197648, achieved: 0.00197648
dataset 97, input: 0.0020373, achieved: 0.0020373
dataset 98, input: 0.00200963, achieved: 0.00200963
dataset 99, input: 0.00200034, achieved: 0.00200034
dataset 100, input: 0.0020142, achieved: 0.0020142
dataset 101, input: 0.00206571, achieved: 0.00206571
dataset 102, input: 0.00199722, achieved: 0.00199722
dataset 103, input: 0.00203851, achieved: 0.00203851
dataset 104, input: 0.00200207, achieved: 0.00200207
dataset 105, input: 0.00201067, achieved: 0.00201067
dataset 106, input: 0.00202679, achieved: 0.00202679
dataset 107, input: 0.00199883, achieved: 0.00199883
dataset 108, input: 0.00205433, achieved: 0.00205433
dataset 109, input: 0.00197735, achieved: 0.00197735
dataset 110, input: 0.00203966, achieved: 0.00203966
dataset 111, input: 0.00201697, achieved: 0.00201697
dataset 112, input: 0.0020068, achieved: 0.0020068
dataset 113, input: 0.00204827, achieved: 0.00204827
dataset 114, input: 0.00202569, achieved: 0.00202569
dataset 115, input: 0.00202765, achieved: 0.00202765
dataset 116, input: 0.00199577, achieved: 0.00199577
dataset 117, input: 0.00204042, achieved: 0.00204042
dataset 118, input: 0.0020075, achieved: 0.0020075
dataset 119, input: 0.00206063, achieved: 0.00206063
dataset 120, input: 0.00200507, achieved: 0.00200507
dataset 121, input: 0.00204365, achieved: 0.00204365
dataset 122, input: 0.00199745, achieved: 0.00199745
dataset 123, input: 0.00204937, achieved: 0.00204937
dataset 124, input: 0.00203834, achieved: 0.00203834
dataset 125, input: 0.00203943, achieved: 0.00203943
dataset 126, input: 0.00201968, achieved: 0.00201968
dataset 127, input: 0.00203094, achieved: 0.00203094
dataset 128, input: 0.00198266, achieved: 0.00198266
dataset 129, input: 0.00201217, achieved: 0.00201217
dataset 130, input: 0.00200085, achieved: 0.00200085
dataset 131, input: 0.0019628, achieved: 0.0019628
dataset 132, input: 0.00201616, achieved: 0.00201616
dataset 133, input: 0.0020094, achieved: 0.0020094
dataset 134, input: 0.00202944, achieved: 0.00202944
dataset 135, input: 0.00205041, achieved: 0.00205041
dataset 136, input: 0.00198584, achieved: 0.00198584
dataset 137, input: 0.00199011, achieved: 0.00199011
dataset 138, input: 0.00199976, achieved: 0.00199976
dataset 139, input: 0.00198555, achieved: 0.00198555
dataset 140, input: 0.00202788, achieved: 0.00202788
dataset 141, input: 0.0019554, achieved: 0.0019554
dataset 142, input: 0.00203123, achieved: 0.00203123
dataset 143, input: 0.00197088, achieved: 0.00197088
dataset 144, input: 0.00200415, achieved: 0.00200415
dataset 145, input: 0.0020202, achieved: 0.0020202
dataset 146, input: 0.00205041, achieved: 0.00205041
dataset 147, input: 0.00200328, achieved: 0.00200328
dataset 148, input: 0.00201246, achieved: 0.00201246
dataset 149, input: 0.00200449, achieved: 0.00200449
dataset 150, input: 0.00200507, achieved: 0.00200507
dataset 151, input: 0.0019811, achieved: 0.0019811
dataset 152, input: 0.0019844, achieved: 0.0019844
dataset 153, input: 0.00201766, achieved: 0.00201766
dataset 154, input: 0.00199901, achieved: 0.00199901
dataset 155, input: 0.00199982, achieved: 0.00199982
dataset 156, input: 0.00202107, achieved: 0.00202107
dataset 157, input: 0.00203412, achieved: 0.00203412
dataset 158, input: 0.00201056, achieved: 0.00201056
dataset 159, input: 0.00196759, achieved: 0.00196759
dataset 160, input: 0.0019766, achieved: 0.0019766
dataset 161, input: 0.00198405, achieved: 0.00198405
dataset 162, input: 0.00199075, achieved: 0.00199075
dataset 163, input: 0.0020068, achieved: 0.0020068
dataset 164, input: 0.00203227, achieved: 0.00203227
dataset 165, input: 0.00200363, achieved: 0.00200363
dataset 166, input: 0.00201749, achieved: 0.00201749
dataset 167, input: 0.00197556, achieved: 0.00197556
dataset 168, input: 0.00200293, achieved: 0.00200293
dataset 169, input: 0.00201974, achieved: 0.00201974
dataset 170, input: 0.00197591, achieved: 0.00197591
dataset 171, input: 0.00203597, achieved: 0.00203597
dataset 172, input: 0.00197429, achieved: 0.00197429
dataset 173, input: 0.00201685, achieved: 0.00201685
dataset 174, input: 0.00197954, achieved: 0.00197954
dataset 175, input: 0.0019833, achieved: 0.0019833
dataset 176, input: 0.00200981, achieved: 0.00200981
dataset 177, input: 0.00196251, achieved: 0.00196251
dataset 178, input: 0.00204463, achieved: 0.00204463
dataset 179, input: 0.00201333, achieved: 0.00201333
dataset 180, input: 0.00199941, achieved: 0.00199941
dataset 181, input: 0.00201079, achieved: 0.00201079
dataset 182, input: 0.00198191, achieved: 0.00198191
dataset 183, input: 0.00200513, achieved: 0.00200513
dataset 184, input: 0.0019781, achieved: 0.0019781
dataset 185, input: 0.00198717, achieved: 0.00198717
dataset 186, input: 0.00198821, achieved: 0.00198821
dataset 187, input: 0.00201968, achieved: 0.00201968
dataset 188, input: 0.00201974, achieved: 0.00201974
dataset 189, input: 0.00197839, achieved: 0.00197839
dataset 190, input: 0.00202713, achieved: 0.00202713
dataset 191, input: 0.00198607, achieved: 0.00198607
dataset 192, input: 0.00200322, achieved: 0.00200322
dataset 193, input: 0.00195696, achieved: 0.00195696
dataset 194, input: 0.00201391, achieved: 0.00201391
dataset 195, input: 0.00197175, achieved: 0.00197175
dataset 196, input: 0.00197989, achieved: 0.00197989
dataset 197, input: 0.0019833, achieved: 0.0019833
dataset 198, input: 0.00193865, achieved: 0.00193865
dataset 199, input: 0.00200074, achieved: 0.00200074
dataset 200, input: 0.00196372, achieved: 0.00196372
dataset 201, input: 0.00199057, achieved: 0.00199057
dataset 202, input: 0.00197423, achieved: 0.00197423
dataset 203, input: 0.00198087, achieved: 0.00198087
dataset 204, input: 0.00196066, achieved: 0.00196066
dataset 205, input: 0.00200831, achieved: 0.00200831
dataset 206, input: 0.00197001, achieved: 0.00197001
dataset 207, input: 0.00203995, achieved: 0.00203995
dataset 208, input: 0.00198584, achieved: 0.00198584
dataset 209, input: 0.00203608, achieved: 0.00203608
dataset 210, input: 0.00202748, achieved: 0.00202748
dataset 211, input: 0.00199514, achieved: 0.00199514
dataset 212, input: 0.00201206, achieved: 0.00201206
dataset 213, input: 0.00202257, achieved: 0.00202257
dataset 214, input: 0.00199109, achieved: 0.00199109
dataset 215, input: 0.00203568, achieved: 0.00203568
dataset 216, input: 0.00197059, achieved: 0.00197059
dataset 217, input: 0.00199774, achieved: 0.00199774
dataset 218, input: 0.00200068, achieved: 0.00200068
dataset 219, input: 0.00199421, achieved: 0.00199421
dataset 220, input: 0.00201737, achieved: 0.00201737
dataset 221, input: 0.0019796, achieved: 0.0019796
dataset 222, input: 0.00196014, achieved: 0.00196014
dataset 223, input: 0.00201847, achieved: 0.00201847
dataset 224, input: 0.00200074, achieved: 0.00200074
dataset 225, input: 0.00199779, achieved: 0.00199779
dataset 226, input: 0.00194928, achieved: 0.00194928
dataset 227, input: 0.00203961, achieved: 0.00203961
dataset 228, input: 0.0019535, achieved: 0.0019535
dataset 229, input: 0.00201396, achieved: 0.00201396
dataset 230, input: 0.00197573, achieved: 0.00197573
dataset 231, input: 0.00198012, achieved: 0.00198012
dataset 232, input: 0.00202962, achieved: 0.00202962
dataset 233, input: 0.00198278, achieved: 0.00198278
dataset 234, input: 0.00202783, achieved: 0.00202783
dataset 235, input: 0.00201818, achieved: 0.00201818
dataset 236, input: 0.00198896, achieved: 0.00198896
dataset 237, input: 0.0020254, achieved: 0.0020254
dataset 238, input: 0.00201674, achieved: 0.00201674
dataset 239, input: 0.00198353, achieved: 0.00198353
dataset 240, input: 0.00204486, achieved: 0.00204486
dataset 241, input: 0.0019569, achieved: 0.0019569
dataset 242, input: 0.00203591, achieved: 0.00203591
dataset 243, input: 0.00199849, achieved: 0.00199849
dataset 244, input: 0.00200536, achieved: 0.00200536
dataset 245, input: 0.00198659, achieved: 0.00198659
dataset 246, input: 0.00198815, achieved: 0.00198815
dataset 247, input: 0.00198538, achieved: 0.00198538
dataset 248, input: 0.00200877, achieved: 0.00200877
dataset 249, input: 0.00199225, achieved: 0.00199225
dataset 250, input: 0.0020094, achieved: 0.0020094
dataset 251, input: 0.00194813, achieved: 0.00194813
dataset 252, input: 0.00199185, achieved: 0.00199185
dataset 253, input: 0.0019893, achieved: 0.0019893
dataset 254, input: 0.00194934, achieved: 0.00194934
dataset 255, input: 0.0019844, achieved: 0.0019844
dataset 256, input: 0.00193167, achieved: 0.00193167
dataset 257, input: 0.00203037, achieved: 0.00203037
dataset 258, input: 0.00196441, achieved: 0.00196441
dataset 259, input: 0.00196129, achieved: 0.00196129
dataset 260, input: 0.0019509, achieved: 0.0019509
dataset 261, input: 0.0019952, achieved: 0.0019952
dataset 262, input: 0.00194634, achieved: 0.00194634
dataset 263, input: 0.00200946, achieved: 0.00200946
dataset 264, input: 0.00198647, achieved: 0.00198647
dataset 265, input: 0.00197596, achieved: 0.00197596
dataset 266, input: 0.00200409, achieved: 0.00200409
dataset 267, input: 0.00196799, achieved: 0.00196799
dataset 268, input: 0.00201962, achieved: 0.00201962
dataset 269, input: 0.00197706, achieved: 0.00197706
dataset 270, input: 0.00196759, achieved: 0.00196759
dataset 271, input: 0.00200137, achieved: 0.00200137
dataset 272, input: 0.00199098, achieved: 0.00199098
dataset 273, input: 0.00199364, achieved: 0.00199364
dataset 274, input: 0.00199716, achieved: 0.00199716
dataset 275, input: 0.00199779, achieved: 0.00199779
dataset 276, input: 0.00199866, achieved: 0.00199866
dataset 277, input: 0.00198763, achieved: 0.00198763
dataset 278, input: 0.00200161, achieved: 0.00200161
dataset 279, input: 0.00198122, achieved: 0.00198122
dataset 280, input: 0.00200744, achieved: 0.00200744
dataset 281, input: 0.00200767, achieved: 0.00200767
dataset 282, input: 0.00200034, achieved: 0.00200034
dataset 283, input: 0.00200796, achieved: 0.00200796
dataset 284, input: 0.00198041, achieved: 0.00198041
dataset 285, input: 0.00199652, achieved: 0.00199652
dataset 286, input: 0.00198474, achieved: 0.00198474
dataset 287, input: 0.00198243, achieved: 0.00198243
dataset 288, input: 0.00197556, achieved: 0.00197556
dataset 289, input: 0.00198388, achieved: 0.00198388
dataset 290, input: 0.00202367, achieved: 0.00202367
dataset 291, input: 0.00195939, achieved: 0.00195939
dataset 292, input: 0.00201391, achieved: 0.00201391
dataset 293, input: 0.00198145, achieved: 0.00198145
dataset 294, input: 0.0019822, achieved: 0.0019822
dataset 295, input: 0.0019662, achieved: 0.0019662
dataset 296, input: 0.00198549, achieved: 0.00198549
dataset 297, input: 0.00201581, achieved: 0.00201581
dataset 298, input: 0.00199248, achieved: 0.00199248
dataset 299, input: 0.00201974, achieved: 0.00201974
dataset 300, input: 0.00198624, achieved: 0.00198624
dataset 301, input: 0.00197342, achieved: 0.00197342
dataset 302, input: 0.00205457, achieved: 0.00205457
dataset 303, input: 0.00199681, achieved: 0.00199681
dataset 304, input: 0.00200022, achieved: 0.00200022
dataset 305, input: 0.00198209, achieved: 0.00198209
dataset 306, input: 0.00199427, achieved: 0.00199427
dataset 307, input: 0.00199802, achieved: 0.00199802
dataset 308, input: 0.00198826, achieved: 0.00198826
dataset 309, input: 0.00205139, achieved: 0.00205139
dataset 310, input: 0.00198977, achieved: 0.00198977
dataset 311, input: 0.00199751, achieved: 0.00199751
dataset 312, input: 0.00200085, achieved: 0.00200085
dataset 313, input: 0.00197186, achieved: 0.00197186
dataset 314, input: 0.00198572, achieved: 0.00198572
dataset 315, input: 0.00199208, achieved: 0.00199208
dataset 316, input: 0.00197296, achieved: 0.00197296
dataset 317, input: 0.00202315, achieved: 0.00202315
dataset 318, input: 0.00197723, achieved: 0.00197723
dataset 319, input: 0.00199537, achieved: 0.00199537
dataset 320, input: 0.00197914, achieved: 0.00197914
dataset 321, input: 0.00199964, achieved: 0.00199964
dataset 322, input: 0.00199808, achieved: 0.00199808
dataset 323, input: 0.00198671, achieved: 0.00198671
dataset 324, input: 0.00196921, achieved: 0.00196921
dataset 325, input: 0.00198214, achieved: 0.00198214
dataset 326, input: 0.00198041, achieved: 0.00198041
dataset 327, input: 0.00197723, achieved: 0.00197723
dataset 328, input: 0.00199837, achieved: 0.00199837
dataset 329, input: 0.0019785, achieved: 0.0019785
dataset 330, input: 0.00201437, achieved: 0.00201437
dataset 331, input: 0.00197591, achieved: 0.00197591
dataset 332, input: 0.00198676, achieved: 0.00198676
dataset 333, input: 0.00200253, achieved: 0.00200253
dataset 334, input: 0.00198578, achieved: 0.00198578
dataset 335, input: 0.00203129, achieved: 0.00203129
dataset 336, input: 0.00198226, achieved: 0.00198226
dataset 337, input: 0.00202887, achieved: 0.00202887
dataset 338, input: 0.00199369, achieved: 0.00199369
dataset 339, input: 0.00204613, achieved: 0.00204613
dataset 340, input: 0.00198549, achieved: 0.00198549
dataset 341, input: 0.00202938, achieved: 0.00202938
dataset 342, input: 0.00202245, achieved: 0.00202245
dataset 343, input: 0.00204787, achieved: 0.00204787
dataset 344, input: 0.00201991, achieved: 0.00201991
dataset 345, input: 0.00201564, achieved: 0.00201564
dataset 346, input: 0.00198902, achieved: 0.00198902
dataset 347, input: 0.00203753, achieved: 0.00203753
dataset 348, input: 0.00201962, achieved: 0.00201962
dataset 349, input: 0.00205318, achieved: 0.00205318
dataset 350, input: 0.00200715, achieved: 0.00200715
dataset 351, input: 0.00203764, achieved: 0.00203764
dataset 352, input: 0.00202136, achieved: 0.00202136
dataset 353, input: 0.00201916, achieved: 0.00201916
dataset 354, input: 0.00202361, achieved: 0.00202361
dataset 355, input: 0.00199335, achieved: 0.00199335
dataset 356, input: 0.00198665, achieved: 0.00198665
dataset 357, input: 0.00201882, achieved: 0.00201882
dataset 358, input: 0.00201062, achieved: 0.00201062
dataset 359, input: 0.00193952, achieved: 0.00193952
dataset 360, input: 0.00201229, achieved: 0.00201229
dataset 361, input: 0.00197718, achieved: 0.00197718
dataset 362, input: 0.0019915, achieved: 0.0019915
dataset 363, input: 0.00195477, achieved: 0.00195477
dataset 364, input: 0.00196181, achieved: 0.00196181
dataset 365, input: 0.00197723, achieved: 0.00197723
dataset 366, input: 0.00195338, achieved: 0.00195338
dataset 367, input: 0.0019926, achieved: 0.0019926
dataset 368, input: 0.00202141, achieved: 0.00202141
dataset 369, input: 0.00201304, achieved: 0.00201304
dataset 370, input: 0.00197198, achieved: 0.00197198
dataset 371, input: 0.00196274, achieved: 0.00196274
dataset 372, input: 0.00199953, achieved: 0.00199953
dataset 373, input: 0.00197394, achieved: 0.00197394
dataset 374, input: 0.00197394, achieved: 0.00197394
dataset 375, input: 0.00199317, achieved: 0.00199317
dataset 376, input: 0.00197105, achieved: 0.00197105
dataset 377, input: 0.00195361, achieved: 0.00195361
dataset 378, input: 0.00197163, achieved: 0.00197163
dataset 379, input: 0.00199369, achieved: 0.00199369
dataset 380, input: 0.00195552, achieved: 0.00195552
dataset 381, input: 0.00197492, achieved: 0.00197492
dataset 382, input: 0.0019647, achieved: 0.0019647
dataset 383, input: 0.00197625, achieved: 0.00197625
dataset 384, input: 0.00197163, achieved: 0.00197163
dataset 385, input: 0.00198162, achieved: 0.00198162
dataset 386, input: 0.0019781, achieved: 0.0019781
dataset 387, input: 0.00197793, achieved: 0.00197793
dataset 388, input: 0.00196932, achieved: 0.00196932
dataset 389, input: 0.00196274, achieved: 0.00196274
dataset 390, input: 0.00207992, achieved: 0.00207992
dataset 391, input: 0.00196643, achieved: 0.00196643
dataset 392, input: 0.0019997, achieved: 0.0019997
dataset 393, input: 0.00196603, achieved: 0.00196603
dataset 394, input: 0.001958, achieved: 0.001958
dataset 395, input: 0.0020075, achieved: 0.0020075
dataset 396, input: 0.00195973, achieved: 0.00195973
dataset 397, input: 0.00199791, achieved: 0.00199791
dataset 398, input: 0.00195096, achieved: 0.00195096
dataset 399, input: 0.00199473, achieved: 0.00199473
dataset 400, input: 0.0019885, achieved: 0.0019885
dataset 401, input: 0.0020001, achieved: 0.0020001
dataset 402, input: 0.00194738, achieved: 0.00194738
dataset 403, input: 0.00197204, achieved: 0.00197204
dataset 404, input: 0.00194801, achieved: 0.00194801
dataset 405, input: 0.00197879, achieved: 0.00197879
dataset 406, input: 0.00194882, achieved: 0.00194882
dataset 407, input: 0.00197683, achieved: 0.00197683
dataset 408, input: 0.00195962, achieved: 0.00195962
dataset 409, input: 0.00196378, achieved: 0.00196378
dataset 410, input: 0.0019844, achieved: 0.0019844
dataset 411, input: 0.00196262, achieved: 0.00196262
dataset 412, input: 0.00205226, achieved: 0.00205226
dataset 413, input: 0.00197666, achieved: 0.00197666
dataset 414, input: 0.00194824, achieved: 0.00194824
dataset 415, input: 0.00201339, achieved: 0.00201339
dataset 416, input: 0.00198399, achieved: 0.00198399
dataset 417, input: 0.0019755, achieved: 0.0019755
dataset 418, input: 0.00198855, achieved: 0.00198855
dataset 419, input: 0.00200161, achieved: 0.00200161
dataset 420, input: 0.00198087, achieved: 0.00198087
dataset 421, input: 0.00196932, achieved: 0.00196932
dataset 422, input: 0.00200461, achieved: 0.00200461
dataset 423, input: 0.00201812, achieved: 0.00201812
dataset 424, input: 0.00198035, achieved: 0.00198035
dataset 425, input: 0.00200005, achieved: 0.00200005
dataset 426, input: 0.00199843, achieved: 0.00199843
dataset 427, input: 0.00197244, achieved: 0.00197244
dataset 428, input: 0.00200992, achieved: 0.00200992
dataset 429, input: 0.00196072, achieved: 0.00196072
dataset 430, input: 0.00199288, achieved: 0.00199288
dataset 431, input: 0.00196903, achieved: 0.00196903
dataset 432, input: 0.00200045, achieved: 0.00200045
dataset 433, input: 0.00196049, achieved: 0.00196049
dataset 434, input: 0.00202401, achieved: 0.00202401
dataset 435, input: 0.00198936, achieved: 0.00198936
dataset 436, input: 0.00199779, achieved: 0.00199779
dataset 437, input: 0.00196551, achieved: 0.00196551
dataset 438, input: 0.00197937, achieved: 0.00197937
dataset 439, input: 0.00198925, achieved: 0.00198925
dataset 440, input: 0.00197504, achieved: 0.00197504
dataset 441, input: 0.00197567, achieved: 0.00197567
dataset 442, input: 0.0019837, achieved: 0.0019837
dataset 443, input: 0.00197989, achieved: 0.00197989
dataset 444, input: 0.0020008, achieved: 0.0020008
dataset 445, input: 0.00202141, achieved: 0.00202141
dataset 446, input: 0.00206866, achieved: 0.00206866
dataset 447, input: 0.00201974, achieved: 0.00201974
dataset 448, input: 0.00201876, achieved: 0.00201876
dataset 449, input: 0.00200207, achieved: 0.00200207
dataset 450, input: 0.00200432, achieved: 0.00200432
dataset 451, input: 0.00200906, achieved: 0.00200906
dataset 452, input: 0.0019952, achieved: 0.0019952
dataset 453, input: 0.00202661, achieved: 0.00202661
dataset 454, input: 0.0019926, achieved: 0.0019926
dataset 455, input: 0.00197648, achieved: 0.00197648
dataset 456, input: 0.00195361, achieved: 0.00195361
dataset 457, input: 0.00198925, achieved: 0.00198925
dataset 458, input: 0.00200293, achieved: 0.00200293
dataset 459, input: 0.00197839, achieved: 0.00197839
dataset 460, input: 0.00199554, achieved: 0.00199554
dataset 461, input: 0.00201212, achieved: 0.00201212
dataset 462, input: 0.00198803, achieved: 0.00198803
dataset 463, input: 0.00200802, achieved: 0.00200802
dataset 464, input: 0.00199762, achieved: 0.00199762
dataset 465, input: 0.00200952, achieved: 0.00200952
dataset 466, input: 0.00198133, achieved: 0.00198133
dataset 467, input: 0.00200825, achieved: 0.00200825
dataset 468, input: 0.00199999, achieved: 0.00199999
dataset 469, input: 0.002009, achieved: 0.002009
dataset 470, input: 0.0019822, achieved: 0.0019822
dataset 471, input: 0.00201402, achieved: 0.00201402
dataset 472, input: 0.00198318, achieved: 0.00198318
dataset 473, input: 0.001996, achieved: 0.001996
dataset 474, input: 0.00199924, achieved: 0.00199924
dataset 475, input: 0.00197862, achieved: 0.00197862
dataset 476, input: 0.00203123, achieved: 0.00203123
dataset 477, input: 0.00195044, achieved: 0.00195044
dataset 478, input: 0.00197521, achieved: 0.00197521
dataset 479, input: 0.00201483, achieved: 0.00201483
dataset 480, input: 0.00200155, achieved: 0.00200155
dataset 481, input: 0.00198411, achieved: 0.00198411
dataset 482, input: 0.00198676, achieved: 0.00198676
dataset 483, input: 0.00199473, achieved: 0.00199473
dataset 484, input: 0.00199133, achieved: 0.00199133
dataset 485, input: 0.00205589, achieved: 0.00205589
dataset 486, input: 0.00198399, achieved: 0.00198399
dataset 487, input: 0.00205098, achieved: 0.00205098
dataset 488, input: 0.00200397, achieved: 0.00200397
dataset 489, input: 0.0019766, achieved: 0.0019766
dataset 490, input: 0.00205122, achieved: 0.00205122
dataset 491, input: 0.00198619, achieved: 0.00198619
dataset 492, input: 0.00198751, achieved: 0.00198751
dataset 493, input: 0.00198307, achieved: 0.00198307
dataset 494, input: 0.00201489, achieved: 0.00201489
dataset 495, input: 0.00198913, achieved: 0.00198913
dataset 496, input: 0.00198584, achieved: 0.00198584
dataset 497, input: 0.00200657, achieved: 0.00200657
dataset 498, input: 0.002012, achieved: 0.002012
dataset 499, input: 0.00204446, achieved: 0.00204446
[2025-03-12 09:21:37][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 17315099 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.000966334, achieved: 0.000966334
dataset 1, input: 0.00373151, achieved: 0.00373151
dataset 2, input: 0.00086052, achieved: 0.00086052
dataset 3, input: 0.00369728, achieved: 0.00369728
dataset 4, input: 0.00355256, achieved: 0.00355256
dataset 5, input: 0.00366771, achieved: 0.00366771
dataset 6, input: 0.000820061, achieved: 0.000820061
dataset 7, input: 0.00207894, achieved: 0.00207894
dataset 8, input: 0.00199802, achieved: 0.00199802
dataset 9, input: 0.00204471, achieved: 0.00204471
dataset 10, input: 0.00190933, achieved: 0.00190933
dataset 11, input: 0.0021443, achieved: 0.0021443
dataset 12, input: 0.00208361, achieved: 0.00208361
dataset 13, input: 0.00187042, achieved: 0.00187042
dataset 14, input: 0.00189377, achieved: 0.00189377
dataset 15, input: 0.00198246, achieved: 0.00198246
dataset 16, input: 0.00213029, achieved: 0.00213029
dataset 17, input: 0.00200892, achieved: 0.00200892
dataset 18, input: 0.0017864, achieved: 0.0017864
dataset 19, input: 0.00208983, achieved: 0.00208983
dataset 20, input: 0.00213029, achieved: 0.00213029
dataset 21, input: 0.00217231, achieved: 0.00217231
dataset 22, input: 0.0020167, achieved: 0.0020167
dataset 23, input: 0.00174594, achieved: 0.00174594
dataset 24, input: 0.00197001, achieved: 0.00197001
dataset 25, input: 0.00208516, achieved: 0.00208516
dataset 26, input: 0.00189532, achieved: 0.00189532
dataset 27, input: 0.00176617, achieved: 0.00176617
dataset 28, input: 0.0018782, achieved: 0.0018782
dataset 29, input: 0.00185486, achieved: 0.00185486
dataset 30, input: 0.00203226, achieved: 0.00203226
dataset 31, input: 0.00190466, achieved: 0.00190466
dataset 32, input: 0.00224077, achieved: 0.00224077
dataset 33, input: 0.0018144, achieved: 0.0018144
dataset 34, input: 0.00197624, achieved: 0.00197624
dataset 35, input: 0.00211784, achieved: 0.00211784
dataset 36, input: 0.00201358, achieved: 0.00201358
dataset 37, input: 0.00173193, achieved: 0.00173193
dataset 38, input: 0.00197001, achieved: 0.00197001
dataset 39, input: 0.00205871, achieved: 0.00205871
dataset 40, input: 0.00186731, achieved: 0.00186731
dataset 41, input: 0.00192956, achieved: 0.00192956
dataset 42, input: 0.00195912, achieved: 0.00195912
dataset 43, input: 0.0020805, achieved: 0.0020805
dataset 44, input: 0.00189688, achieved: 0.00189688
dataset 45, input: 0.00201203, achieved: 0.00201203
dataset 46, input: 0.00197935, achieved: 0.00197935
dataset 47, input: 0.00189843, achieved: 0.00189843
dataset 48, input: 0.00189688, achieved: 0.00189688
dataset 49, input: 0.00185331, achieved: 0.00185331
dataset 50, input: 0.00199958, achieved: 0.00199958
dataset 51, input: 0.001942, achieved: 0.001942
dataset 52, input: 0.00851494, achieved: 0.00851494
dataset 53, input: 0.0088246, achieved: 0.0088246
dataset 54, input: 0.00817105, achieved: 0.00817105
dataset 55, input: 0.008669, achieved: 0.008669
dataset 56, input: 0.00837489, achieved: 0.00837489
dataset 57, input: 0.0077369, achieved: 0.0077369
dataset 58, input: 0.00822395, achieved: 0.00822395
dataset 59, input: 0.00744124, achieved: 0.00744124
dataset 60, input: 0.00692306, achieved: 0.00692306
dataset 61, input: 0.00874524, achieved: 0.00874524
dataset 62, input: 0.00748636, achieved: 0.00748636
dataset 63, input: 0.00874058, achieved: 0.00874058
dataset 64, input: 0.00939258, achieved: 0.00939258
dataset 65, input: 0.00797031, achieved: 0.00797031
dataset 66, input: 0.00762174, achieved: 0.00762174
dataset 67, input: 0.00895999, achieved: 0.00895999
dataset 68, input: 0.00764353, achieved: 0.00764353
dataset 69, input: 0.00732297, achieved: 0.00732297
dataset 70, input: 0.00830176, achieved: 0.00830176
dataset 71, input: 0.00769177, achieved: 0.00769177
dataset 72, input: 0.00858808, achieved: 0.00858808
dataset 73, input: 0.00768088, achieved: 0.00768088
dataset 74, input: 0.00854762, achieved: 0.00854762
dataset 75, input: 0.0092074, achieved: 0.0092074
dataset 76, input: 0.00845581, achieved: 0.00845581
dataset 77, input: 0.00914049, achieved: 0.00914049
dataset 78, input: 0.00782404, achieved: 0.00782404
dataset 79, input: 0.0084278, achieved: 0.0084278
dataset 80, input: 0.00789562, achieved: 0.00789562
dataset 81, input: 0.00868144, achieved: 0.00868144
dataset 82, input: 0.00796253, achieved: 0.00796253
dataset 83, input: 0.00741634, achieved: 0.00741634
dataset 84, input: 0.00838423, achieved: 0.00838423
dataset 85, input: 0.00816482, achieved: 0.00816482
dataset 86, input: 0.00740856, achieved: 0.00740856
dataset 87, input: 0.0078396, achieved: 0.0078396
dataset 88, input: 0.00873591, achieved: 0.00873591
dataset 89, input: 0.00877481, achieved: 0.00877481
dataset 90, input: 0.00749415, achieved: 0.00749415
dataset 91, input: 0.0082224, achieved: 0.0082224
dataset 92, input: 0.00737588, achieved: 0.00737588
dataset 93, input: 0.00892264, achieved: 0.00892264
dataset 94, input: 0.00788006, achieved: 0.00788006
dataset 95, input: 0.00921207, achieved: 0.00921207
dataset 96, input: 0.00833288, achieved: 0.00833288
dataset 97, input: 0.0081586, achieved: 0.0081586
dataset 98, input: 0.00785205, achieved: 0.00785205
dataset 99, input: 0.00876081, achieved: 0.00876081
dataset 100, input: 0.00836089, achieved: 0.00836089
dataset 101, input: 0.00835622, achieved: 0.00835622
dataset 102, input: 0.00753149, achieved: 0.00753149
dataset 103, input: 0.00779603, achieved: 0.00779603
dataset 104, input: 0.00841691, achieved: 0.00841691
dataset 105, input: 0.00863321, achieved: 0.00863321
dataset 106, input: 0.00874836, achieved: 0.00874836
dataset 107, input: 0.00861142, achieved: 0.00861142
dataset 108, input: 0.00857252, achieved: 0.00857252
dataset 109, input: 0.00764197, achieved: 0.00764197
dataset 110, input: 0.00854762, achieved: 0.00854762
dataset 111, input: 0.00844336, achieved: 0.00844336
dataset 112, input: 0.00797809, achieved: 0.00797809
dataset 113, input: 0.00808702, achieved: 0.00808702
dataset 114, input: 0.00336894, achieved: 0.00336894
dataset 115, input: 0.00219565, achieved: 0.00219565
dataset 116, input: 0.00542299, achieved: 0.00542299
dataset 117, input: 0.00464961, achieved: 0.00464961
dataset 118, input: 0.00344363, achieved: 0.00344363
dataset 119, input: 0.00410653, achieved: 0.00410653
dataset 120, input: 0.00387156, achieved: 0.00387156
dataset 121, input: 0.00405674, achieved: 0.00405674
dataset 122, input: 0.00389179, achieved: 0.00389179
dataset 123, input: 0.00385756, achieved: 0.00385756
dataset 124, input: 0.00394625, achieved: 0.00394625
dataset 125, input: 0.00367082, achieved: 0.00367082
dataset 126, input: 0.00365526, achieved: 0.00365526
dataset 127, input: 0.00382488, achieved: 0.00382488
dataset 128, input: 0.00407852, achieved: 0.00407852
dataset 129, input: 0.00326935, achieved: 0.00326935
dataset 130, input: 0.0039976, achieved: 0.0039976
dataset 131, input: 0.0023497, achieved: 0.0023497
dataset 132, input: 0.0019669, achieved: 0.0019669
dataset 133, input: 0.00344519, achieved: 0.00344519
dataset 134, input: 0.00379376, achieved: 0.00379376
dataset 135, input: 0.00142227, achieved: 0.00142227
dataset 136, input: 0.00333004, achieved: 0.00333004
dataset 137, input: 0.00301726, achieved: 0.00301726
dataset 138, input: 0.00423724, achieved: 0.00423724
dataset 139, input: 0.00424813, achieved: 0.00424813
dataset 140, input: 0.00635353, achieved: 0.00635353
dataset 141, input: 0.00418745, achieved: 0.00418745
dataset 142, input: 0.00323512, achieved: 0.00323512
dataset 143, input: 0.00261735, achieved: 0.00261735
dataset 144, input: 0.00267181, achieved: 0.00267181
dataset 145, input: 0.00267337, achieved: 0.00267337
dataset 146, input: 0.0025411, achieved: 0.0025411
dataset 147, input: 0.00254732, achieved: 0.00254732
dataset 148, input: 0.00237149, achieved: 0.00237149
dataset 149, input: 0.00252398, achieved: 0.00252398
dataset 150, input: 0.0024633, achieved: 0.0024633
dataset 151, input: 0.00254266, achieved: 0.00254266
dataset 152, input: 0.00262669, achieved: 0.00262669
dataset 153, input: 0.00240261, achieved: 0.00240261
dataset 154, input: 0.00232169, achieved: 0.00232169
dataset 155, input: 0.00254732, achieved: 0.00254732
dataset 156, input: 0.0024384, achieved: 0.0024384
dataset 157, input: 0.00257067, achieved: 0.00257067
dataset 158, input: 0.00254266, achieved: 0.00254266
dataset 159, input: 0.00261268, achieved: 0.00261268
dataset 160, input: 0.00240416, achieved: 0.00240416
dataset 161, input: 0.00259245, achieved: 0.00259245
dataset 162, input: 0.00253799, achieved: 0.00253799
dataset 163, input: 0.00246174, achieved: 0.00246174
dataset 164, input: 0.00225945, achieved: 0.00225945
dataset 165, input: 0.00250375, achieved: 0.00250375
dataset 166, input: 0.00202292, achieved: 0.00202292
dataset 167, input: 0.00131179, achieved: 0.00131179
dataset 168, input: 0.00125577, achieved: 0.00125577
dataset 169, input: 0.00145962, achieved: 0.00145962
dataset 170, input: 0.00140204, achieved: 0.00140204
dataset 171, input: 0.00116707, achieved: 0.00116707
dataset 172, input: 0.00132579, achieved: 0.00132579
dataset 173, input: 0.00139426, achieved: 0.00139426
dataset 174, input: 0.00127444, achieved: 0.00127444
dataset 175, input: 0.00137403, achieved: 0.00137403
dataset 176, input: 0.0013398, achieved: 0.0013398
dataset 177, input: 0.00121842, achieved: 0.00121842
dataset 178, input: 0.00137714, achieved: 0.00137714
dataset 179, input: 0.00133357, achieved: 0.00133357
dataset 180, input: 0.00185642, achieved: 0.00185642
dataset 181, input: 0.00050573, achieved: 0.00050573
dataset 182, input: 0.00314642, achieved: 0.00314642
dataset 183, input: 0.00289278, achieved: 0.00289278
dataset 184, input: 0.00265314, achieved: 0.00265314
dataset 185, input: 0.00227812, achieved: 0.00227812
dataset 186, input: 0.00349654, achieved: 0.00349654
dataset 187, input: 0.00437107, achieved: 0.00437107
dataset 188, input: 0.00135225, achieved: 0.00135225
dataset 189, input: 0.00277607, achieved: 0.00277607
dataset 190, input: 0.00277296, achieved: 0.00277296
dataset 191, input: 0.0027325, achieved: 0.0027325
dataset 192, input: 0.00270293, achieved: 0.00270293
dataset 193, input: 0.00280097, achieved: 0.00280097
dataset 194, input: 0.00287099, achieved: 0.00287099
dataset 195, input: 0.0026298, achieved: 0.0026298
dataset 196, input: 0.00314953, achieved: 0.00314953
dataset 197, input: 0.00294724, achieved: 0.00294724
dataset 198, input: 0.00308418, achieved: 0.00308418
dataset 199, input: 0.000790495, achieved: 0.000790495
dataset 200, input: 0.00204937, achieved: 0.00204937
dataset 201, input: 0.00168214, achieved: 0.00168214
dataset 202, input: 0.00171948, achieved: 0.00171948
dataset 203, input: 0.00195601, achieved: 0.00195601
dataset 204, input: 0.0019918, achieved: 0.0019918
dataset 205, input: 0.00205249, achieved: 0.00205249
dataset 206, input: 0.00201825, achieved: 0.00201825
dataset 207, input: 0.0020556, achieved: 0.0020556
dataset 208, input: 0.00187665, achieved: 0.00187665
dataset 209, input: 0.00189688, achieved: 0.00189688
dataset 210, input: 0.00211006, achieved: 0.00211006
dataset 211, input: 0.00212251, achieved: 0.00212251
dataset 212, input: 0.00187976, achieved: 0.00187976
dataset 213, input: 0.00203226, achieved: 0.00203226
dataset 214, input: 0.00168214, achieved: 0.00168214
dataset 215, input: 0.00231235, achieved: 0.00231235
dataset 216, input: 0.00169147, achieved: 0.00169147
dataset 217, input: 0.00196846, achieved: 0.00196846
dataset 218, input: 0.00147051, achieved: 0.00147051
dataset 219, input: 0.00189221, achieved: 0.00189221
dataset 220, input: 0.00179729, achieved: 0.00179729
dataset 221, input: 0.0019669, achieved: 0.0019669
dataset 222, input: 0.00183463, achieved: 0.00183463
dataset 223, input: 0.00213496, achieved: 0.00213496
dataset 224, input: 0.00187665, achieved: 0.00187665
dataset 225, input: 0.00193422, achieved: 0.00193422
dataset 226, input: 0.00237149, achieved: 0.00237149
dataset 227, input: 0.00170703, achieved: 0.00170703
dataset 228, input: 0.00174749, achieved: 0.00174749
dataset 229, input: 0.000759374, achieved: 0.000759374
dataset 230, input: 0.00168058, achieved: 0.00168058
dataset 231, input: 0.00197935, achieved: 0.00197935
dataset 232, input: 0.0042139, achieved: 0.0042139
dataset 233, input: 0.00431816, achieved: 0.00431816
dataset 234, input: 0.00355723, achieved: 0.00355723
dataset 235, input: 0.00370195, achieved: 0.00370195
dataset 236, input: 0.00189999, achieved: 0.00189999
dataset 237, input: 0.00250375, achieved: 0.00250375
dataset 238, input: 0.00499973, achieved: 0.00499973
dataset 239, input: 0.000925875, achieved: 0.000925875
dataset 240, input: 0.00190466, achieved: 0.00190466
dataset 241, input: 0.00194667, achieved: 0.00194667
dataset 242, input: 0.00158722, achieved: 0.00158722
dataset 243, input: 0.0039976, achieved: 0.0039976
dataset 244, input: 0.00557704, achieved: 0.00557704
dataset 245, input: 0.00230146, achieved: 0.00230146
dataset 246, input: 0.00170548, achieved: 0.00170548
dataset 247, input: 0.0023746, achieved: 0.0023746
dataset 248, input: 0.00192333, achieved: 0.00192333
dataset 249, input: 0.00204782, achieved: 0.00204782
dataset 250, input: 0.00197779, achieved: 0.00197779
dataset 251, input: 0.00167591, achieved: 0.00167591
dataset 252, input: 0.00195445, achieved: 0.00195445
dataset 253, input: 0.00214585, achieved: 0.00214585
dataset 254, input: 0.00203848, achieved: 0.00203848
dataset 255, input: 0.000746925, achieved: 0.000746925
dataset 256, input: 0.00500595, achieved: 0.00500595
dataset 257, input: 0.00535452, achieved: 0.00535452
dataset 258, input: 0.00502151, achieved: 0.00502151
dataset 259, input: 0.00479121, achieved: 0.00479121
dataset 260, input: 0.00486746, achieved: 0.00486746
dataset 261, input: 0.00111105, achieved: 0.00111105
[2025-03-12 09:21:37][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 642635 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0835778, achieved: 0.0835778
dataset 1, input: 0.0834323, achieved: 0.0834323
dataset 2, input: 0.0510323, achieved: 0.0510323
dataset 3, input: 0.104354, achieved: 0.104354
dataset 4, input: 0.0513544, achieved: 0.0513544
dataset 5, input: 0.00400893, achieved: 0.00400893
dataset 6, input: 0.115667, achieved: 0.115667
dataset 7, input: 0.0827874, achieved: 0.0827874
dataset 8, input: 0.103788, achieved: 0.103788
dataset 9, input: 0.11266, achieved: 0.11266
dataset 10, input: 0.050851, achieved: 0.050851
dataset 11, input: 0.0513192, achieved: 0.0513192
dataset 12, input: 0.105168, achieved: 0.105168
[2025-03-12 09:21:37][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 1704196 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0182376, achieved: 0.0182376
dataset 1, input: 0.0182962, achieved: 0.0182962
dataset 2, input: 0.018299, achieved: 0.018299
dataset 3, input: 0.0182779, achieved: 0.0182779
dataset 4, input: 0.0182862, achieved: 0.0182862
dataset 5, input: 0.0181746, achieved: 0.0181746
dataset 6, input: 0.0183693, achieved: 0.0183693
dataset 7, input: 0.0220027, achieved: 0.0220027
dataset 8, input: 0.0486005, achieved: 0.0486005
dataset 9, input: 0.0484891, achieved: 0.0484891
dataset 10, input: 0.0512473, achieved: 0.0512473
dataset 11, input: 0.0512001, achieved: 0.0512001
dataset 12, input: 0.0512732, achieved: 0.0512732
dataset 13, input: 0.0485441, achieved: 0.0485441
dataset 14, input: 0.0485733, achieved: 0.0485733
dataset 15, input: 0.0511485, achieved: 0.0511485
dataset 16, input: 0.0485108, achieved: 0.0485108
dataset 17, input: 0.0485108, achieved: 0.0485108
dataset 18, input: 0.0487117, achieved: 0.0487117
dataset 19, input: 0.0511296, achieved: 0.0511296
dataset 20, input: 0.048739, achieved: 0.048739
dataset 21, input: 0.0512227, achieved: 0.0512227
dataset 22, input: 0.0486002, achieved: 0.0486002
dataset 23, input: 0.0487371, achieved: 0.0487371
dataset 24, input: 0.0511531, achieved: 0.0511531
dataset 25, input: 0.00566539, achieved: 0.00566539
[2025-03-12 09:21:38][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 6726808 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0130268, achieved: 0.0130268
dataset 1, input: 0.0134792, achieved: 0.0134792
dataset 2, input: 0.0136289, achieved: 0.0136289
dataset 3, input: 0.013172, achieved: 0.013172
dataset 4, input: 0.0132409, achieved: 0.0132409
dataset 5, input: 0.013352, achieved: 0.013352
dataset 6, input: 0.0134121, achieved: 0.0134121
dataset 7, input: 0.0128782, achieved: 0.0128782
dataset 8, input: 0.0128349, achieved: 0.0128349
dataset 9, input: 0.013055, achieved: 0.013055
dataset 10, input: 0.0128416, achieved: 0.0128416
dataset 11, input: 0.012892, achieved: 0.012892
dataset 12, input: 0.0128222, achieved: 0.0128222
dataset 13, input: 0.0128934, achieved: 0.0128934
dataset 14, input: 0.0131082, achieved: 0.0131082
dataset 15, input: 0.0129718, achieved: 0.0129718
dataset 16, input: 0.013004, achieved: 0.013004
dataset 17, input: 0.0128759, achieved: 0.0128759
dataset 18, input: 0.0129287, achieved: 0.0129287
dataset 19, input: 0.0130294, achieved: 0.0130294
dataset 20, input: 0.0128648, achieved: 0.0128648
dataset 21, input: 0.0131354, achieved: 0.0131354
dataset 22, input: 0.0129144, achieved: 0.0129144
dataset 23, input: 0.0129003, achieved: 0.0129003
dataset 24, input: 0.013258, achieved: 0.013258
dataset 25, input: 0.0129312, achieved: 0.0129312
dataset 26, input: 0.013249, achieved: 0.013249
dataset 27, input: 0.0131446, achieved: 0.0131446
dataset 28, input: 0.0131264, achieved: 0.0131264
dataset 29, input: 0.0128913, achieved: 0.0128913
dataset 30, input: 0.0129347, achieved: 0.0129347
dataset 31, input: 0.0132695, achieved: 0.0132695
dataset 32, input: 0.0129616, achieved: 0.0129616
dataset 33, input: 0.0129188, achieved: 0.0129188
dataset 34, input: 0.0128966, achieved: 0.0128966
dataset 35, input: 0.012892, achieved: 0.012892
dataset 36, input: 0.013181, achieved: 0.013181
dataset 37, input: 0.0130499, achieved: 0.0130499
dataset 38, input: 0.0129443, achieved: 0.0129443
dataset 39, input: 0.0130167, achieved: 0.0130167
dataset 40, input: 0.0127473, achieved: 0.0127473
dataset 41, input: 0.0127561, achieved: 0.0127561
dataset 42, input: 0.01274, achieved: 0.01274
dataset 43, input: 0.012751, achieved: 0.012751
dataset 44, input: 0.012733, achieved: 0.012733
dataset 45, input: 0.012737, achieved: 0.012737
dataset 46, input: 0.0127356, achieved: 0.0127356
dataset 47, input: 0.0127287, achieved: 0.0127287
dataset 48, input: 0.0127204, achieved: 0.0127204
dataset 49, input: 0.0127125, achieved: 0.0127125
dataset 50, input: 0.0126971, achieved: 0.0126971
dataset 51, input: 0.0127033, achieved: 0.0127033
dataset 52, input: 0.0126906, achieved: 0.0126906
dataset 53, input: 0.0126747, achieved: 0.0126747
dataset 54, input: 0.0126724, achieved: 0.0126724
dataset 55, input: 0.0126814, achieved: 0.0126814
dataset 56, input: 0.0126768, achieved: 0.0126768
dataset 57, input: 0.0126962, achieved: 0.0126962
dataset 58, input: 0.0126883, achieved: 0.0126883
dataset 59, input: 0.0126651, achieved: 0.0126651
dataset 60, input: 0.0126734, achieved: 0.0126734
dataset 61, input: 0.0126632, achieved: 0.0126632
dataset 62, input: 0.0126542, achieved: 0.0126542
dataset 63, input: 0.0126671, achieved: 0.0126671
dataset 64, input: 0.0126969, achieved: 0.0126969
dataset 65, input: 0.0138538, achieved: 0.0138538
dataset 66, input: 0.0124738, achieved: 0.0124738
dataset 67, input: 0.0124994, achieved: 0.0124994
dataset 68, input: 0.0123766, achieved: 0.0123766
dataset 69, input: 0.0124441, achieved: 0.0124441
dataset 70, input: 0.0122455, achieved: 0.0122455
dataset 71, input: 0.0124694, achieved: 0.0124694
dataset 72, input: 0.0121931, achieved: 0.0121931
dataset 73, input: 0.0122484, achieved: 0.0122484
dataset 74, input: 0.0117788, achieved: 0.0117788
dataset 75, input: 0.0133204, achieved: 0.0133204
dataset 76, input: 0.0131683, achieved: 0.0131683
dataset 77, input: 0.00943814, achieved: 0.00943814
[2025-03-12 09:21:38][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 4339733 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.038046, achieved: 0.038046
dataset 1, input: 0.0413855, achieved: 0.0413855
dataset 2, input: 0.0406094, achieved: 0.0406094
dataset 3, input: 0.0365558, achieved: 0.0365558
dataset 4, input: 0.0341426, achieved: 0.0341426
dataset 5, input: 0.0350149, achieved: 0.0350149
dataset 6, input: 0.0358744, achieved: 0.0358744
dataset 7, input: 0.036827, achieved: 0.036827
dataset 8, input: 0.0375282, achieved: 0.0375282
dataset 9, input: 0.0379556, achieved: 0.0379556
dataset 10, input: 0.0381705, achieved: 0.0381705
dataset 11, input: 0.038556, achieved: 0.038556
dataset 12, input: 0.0388884, achieved: 0.0388884
dataset 13, input: 0.0391665, achieved: 0.0391665
dataset 14, input: 0.0393857, achieved: 0.0393857
dataset 15, input: 0.0397976, achieved: 0.0397976
dataset 16, input: 0.0400668, achieved: 0.0400668
dataset 17, input: 0.0403879, achieved: 0.0403879
dataset 18, input: 0.0408308, achieved: 0.0408308
dataset 19, input: 0.0411838, achieved: 0.0411838
dataset 20, input: 0.0418467, achieved: 0.0418467
dataset 21, input: 0.0425557, achieved: 0.0425557
dataset 22, input: 0.042814, achieved: 0.042814
dataset 23, input: 0.0425712, achieved: 0.0425712
dataset 24, input: 0.0388551, achieved: 0.0388551
dataset 25, input: 0.020984, achieved: 0.020984
[2025-03-12 09:21:39][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 2578247 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0235833, achieved: 0.0235833
dataset 1, input: 0.0216057, achieved: 0.0216057
dataset 2, input: 0.027075, achieved: 0.027075
dataset 3, input: 0.0271066, achieved: 0.0271066
dataset 4, input: 0.0274384, achieved: 0.0274384
dataset 5, input: 0.0257867, achieved: 0.0257867
dataset 6, input: 0.0255337, achieved: 0.0255337
dataset 7, input: 0.027977, achieved: 0.027977
dataset 8, input: 0.0270093, achieved: 0.0270093
dataset 9, input: 0.0285904, achieved: 0.0285904
dataset 10, input: 0.0283676, achieved: 0.0283676
dataset 11, input: 0.0153407, achieved: 0.0153407
dataset 12, input: 0.014138, achieved: 0.014138
dataset 13, input: 0.0141798, achieved: 0.0141798
dataset 14, input: 0.0141664, achieved: 0.0141664
dataset 15, input: 0.0150215, achieved: 0.0150215
dataset 16, input: 0.0280527, achieved: 0.0280527
dataset 17, input: 0.023455, achieved: 0.023455
dataset 18, input: 0.0247761, achieved: 0.0247761
dataset 19, input: 0.0205734, achieved: 0.0205734
dataset 20, input: 0.0205842, achieved: 0.0205842
dataset 21, input: 0.020579, achieved: 0.020579
dataset 22, input: 0.0205939, achieved: 0.0205939
dataset 23, input: 0.0203349, achieved: 0.0203349
dataset 24, input: 0.0199823, achieved: 0.0199823
dataset 25, input: 0.0199573, achieved: 0.0199573
dataset 26, input: 0.0199854, achieved: 0.0199854
dataset 27, input: 0.0168267, achieved: 0.0168267
dataset 28, input: 0.0172125, achieved: 0.0172125
dataset 29, input: 0.018342, achieved: 0.018342
dataset 30, input: 0.014919, achieved: 0.014919
dataset 31, input: 0.0149787, achieved: 0.0149787
dataset 32, input: 0.0149735, achieved: 0.0149735
dataset 33, input: 0.0149415, achieved: 0.0149415
dataset 34, input: 0.0149689, achieved: 0.0149689
dataset 35, input: 0.0149673, achieved: 0.0149673
dataset 36, input: 0.0230039, achieved: 0.0230039
dataset 37, input: 0.0215731, achieved: 0.0215731
dataset 38, input: 0.0215682, achieved: 0.0215682
dataset 39, input: 0.0211097, achieved: 0.0211097
dataset 40, input: 0.0190817, achieved: 0.0190817
dataset 41, input: 0.0191069, achieved: 0.0191069
dataset 42, input: 0.0189985, achieved: 0.0189985
dataset 43, input: 0.0186603, achieved: 0.0186603
dataset 44, input: 0.0219256, achieved: 0.0219256
dataset 45, input: 0.0310232, achieved: 0.0310232
dataset 46, input: 0.0189783, achieved: 0.0189783
dataset 47, input: 0.0184155, achieved: 0.0184155
dataset 48, input: 0.00263109, achieved: 0.00263109
[2025-03-12 09:21:40][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 18622355 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0180888, achieved: 0.0180888
dataset 1, input: 0.0157123, achieved: 0.0157123
dataset 2, input: 0.0156305, achieved: 0.0156305
dataset 3, input: 0.0150714, achieved: 0.0150714
dataset 4, input: 0.0142973, achieved: 0.0142973
dataset 5, input: 0.0129064, achieved: 0.0129064
dataset 6, input: 0.0162972, achieved: 0.0162972
dataset 7, input: 0.0174737, achieved: 0.0174737
dataset 8, input: 0.017477, achieved: 0.017477
dataset 9, input: 0.0148979, achieved: 0.0148979
dataset 10, input: 0.0158723, achieved: 0.0158723
dataset 11, input: 0.0164824, achieved: 0.0164824
dataset 12, input: 0.0147445, achieved: 0.0147445
dataset 13, input: 0.0160167, achieved: 0.0160167
dataset 14, input: 0.0164729, achieved: 0.0164729
dataset 15, input: 0.0169845, achieved: 0.0169845
dataset 16, input: 0.014763, achieved: 0.014763
dataset 17, input: 0.0152292, achieved: 0.0152292
dataset 18, input: 0.0156109, achieved: 0.0156109
dataset 19, input: 0.0155986, achieved: 0.0155986
dataset 20, input: 0.0157206, achieved: 0.0157206
dataset 21, input: 0.0135193, achieved: 0.0135193
dataset 22, input: 0.0106687, achieved: 0.0106687
dataset 23, input: 0.012068, achieved: 0.012068
dataset 24, input: 0.0143947, achieved: 0.0143947
dataset 25, input: 0.0133682, achieved: 0.0133682
dataset 26, input: 0.0116767, achieved: 0.0116767
dataset 27, input: 0.0121379, achieved: 0.0121379
dataset 28, input: 0.0188349, achieved: 0.0188349
dataset 29, input: 0.0185595, achieved: 0.0185595
dataset 30, input: 0.0184979, achieved: 0.0184979
dataset 31, input: 0.0163945, achieved: 0.0163945
dataset 32, input: 0.0160772, achieved: 0.0160772
dataset 33, input: 0.0161074, achieved: 0.0161074
dataset 34, input: 0.0160856, achieved: 0.0160856
dataset 35, input: 0.0160156, achieved: 0.0160156
dataset 36, input: 0.0157346, achieved: 0.0157346
dataset 37, input: 0.0157352, achieved: 0.0157352
dataset 38, input: 0.0155214, achieved: 0.0155214
dataset 39, input: 0.0154414, achieved: 0.0154414
dataset 40, input: 0.0172906, achieved: 0.0172906
dataset 41, input: 0.0117512, achieved: 0.0117512
dataset 42, input: 0.0169453, achieved: 0.0169453
dataset 43, input: 0.0181806, achieved: 0.0181806
dataset 44, input: 0.0184588, achieved: 0.0184588
dataset 45, input: 0.018433, achieved: 0.018433
dataset 46, input: 0.0209411, achieved: 0.0209411
dataset 47, input: 0.0209187, achieved: 0.0209187
dataset 48, input: 0.0186021, achieved: 0.0186021
dataset 49, input: 0.0118603, achieved: 0.0118603
dataset 50, input: 0.0118782, achieved: 0.0118782
dataset 51, input: 0.0118402, achieved: 0.0118402
dataset 52, input: 0.0118558, achieved: 0.0118558
dataset 53, input: 0.0118536, achieved: 0.0118536
dataset 54, input: 0.0118463, achieved: 0.0118463
dataset 55, input: 0.011829, achieved: 0.011829
dataset 56, input: 0.0118284, achieved: 0.0118284
dataset 57, input: 0.0155567, achieved: 0.0155567
dataset 58, input: 0.0138154, achieved: 0.0138154
dataset 59, input: 0.0173808, achieved: 0.0173808
dataset 60, input: 0.0151027, achieved: 0.0151027
dataset 61, input: 0.0143712, achieved: 0.0143712
dataset 62, input: 0.0144484, achieved: 0.0144484
dataset 63, input: 0.0170886, achieved: 0.0170886
dataset 64, input: 0.0156731, achieved: 0.0156731
dataset 65, input: 0.0020631, achieved: 0.0020631
[2025-03-12 09:21:41][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 1786631 samples
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.658845, achieved: 0.658845
dataset 1, input: 0.341155, achieved: 0.341155
[2025-03-12 09:21:41][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 528552 samples
> WARNING: could not find index map files for blendable dataset, building indices on rank 0 ...
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0170699, achieved: 0.0170699
dataset 1, input: 0.0300583, achieved: 0.0300583
dataset 2, input: 0.00720419, achieved: 0.00720419
dataset 3, input: 0.0390113, achieved: 0.0390113
dataset 4, input: 0.353979, achieved: 0.353979
dataset 5, input: 0.176417, achieved: 0.176417
dataset 6, input: 0.00654659, achieved: 0.00654659
dataset 7, input: 0.0173636, achieved: 0.0173636
dataset 8, input: 0.0685377, achieved: 0.0685377
dataset 9, input: 0.0442162, achieved: 0.0442162
dataset 10, input: 0.026269, achieved: 0.026269
dataset 11, input: 0.189738, achieved: 0.189738
dataset 12, input: 0.0182033, achieved: 0.0182033
dataset 13, input: 0.00538529, achieved: 0.00538529
[2025-03-12 09:21:55][I][data/blendable_dataset:52:megatron.data.blendable_dataset] > elapsed time for building blendable dataset indices: 13.57 (sec)
[2025-03-12 09:22:00][I][data/blendable_dataset:87:megatron.data.blendable_dataset] > finished saving index map files in 4.8593598260194995 seconds
[2025-03-12 09:22:00][I][data/blendable_dataset:112:megatron.data.blendable_dataset] > loading blendable dataset index: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache/3e486b3f0fe13ac22e1cefff571f6e28_index.npy
[2025-03-12 09:22:00][I][data/blendable_dataset:115:megatron.data.blendable_dataset] > loading blendable dataset sample index: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache/3e486b3f0fe13ac22e1cefff571f6e28_sample_index.npy
[2025-03-12 09:22:00][I][data/blendable_dataset:118:megatron.data.blendable_dataset] > finished loading in 0.02210795701830648 seconds
[2025-03-12 09:22:00][I][data/blendable_dataset:130:megatron.data.blendable_dataset] > size of blendable dataset: 490723579 samples
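Aside (not part of the log): in the `sample ratios` blocks above, `input` is each constituent dataset's normalized weight and `achieved` is the fraction the built index actually realizes, so identical values mean the blend hit the requested mixture exactly. A minimal sketch of one way to get that behavior deterministically (my own illustration, not Megatron-DeepSpeed's actual `helpers` implementation):

```python
import numpy as np

# Hypothetical sketch of a blendable-dataset index builder: at every
# step, draw from the dataset currently furthest below its target
# weight, so the running mixture tracks the requested one exactly.
def build_blend_index(weights, size):
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()                      # normalized "input" ratios
    counts = np.zeros(len(w), dtype=np.int64)
    index = np.empty(size, dtype=np.int64)
    for i in range(size):
        deficit = w * (i + 1) - counts   # target minus realized counts
        index[i] = np.argmax(deficit)
        counts[index[i]] += 1
    return index, counts / size          # index and "achieved" ratios

# e.g. the two-dataset blend in the log above
_, achieved = build_blend_index([0.658845, 0.341155], size=1_000_000)
print(achieved)                          # ~[0.658845, 0.341155]
```

A greedy rule like this keeps every dataset within one sample of its target share, which is consistent with `achieved` printing identical to `input` above.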
> WARNING: could not find index map files for blendable dataset, building indices on rank 0 ...
> building indices for blendable datasets ...
> sample ratios:
dataset 0, input: 0.0170698, achieved: 0.0170698
dataset 1, input: 0.0300584, achieved: 0.0300584
dataset 2, input: 0.00720414, achieved: 0.00720414
dataset 3, input: 0.0390117, achieved: 0.0390117
dataset 4, input: 0.35398, achieved: 0.35398
dataset 5, input: 0.176418, achieved: 0.176418
dataset 6, input: 0.00654759, achieved: 0.00654759
dataset 7, input: 0.0173635, achieved: 0.0173635
dataset 8, input: 0.0685371, achieved: 0.0685371
dataset 9, input: 0.044216, achieved: 0.044216
dataset 10, input: 0.0262689, achieved: 0.0262689
dataset 11, input: 0.189737, achieved: 0.189737
dataset 12, input: 0.0182034, achieved: 0.0182034
dataset 13, input: 0.00538523, achieved: 0.00538523
[2025-03-12 09:22:03][I][data/blendable_dataset:52:megatron.data.blendable_dataset] > elapsed time for building blendable dataset indices: 2.72 (sec)
[2025-03-12 09:22:04][I][data/blendable_dataset:87:megatron.data.blendable_dataset] > finished saving index map files in 0.9886123439937364 seconds
[2025-03-12 09:22:04][I][data/blendable_dataset:112:megatron.data.blendable_dataset] > loading blendable dataset index: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache/28ce9099b1b5fd7fcef2f8dd382241ce_index.npy
[2025-03-12 09:22:04][I][data/blendable_dataset:115:megatron.data.blendable_dataset] > loading blendable dataset sample index: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache/28ce9099b1b5fd7fcef2f8dd382241ce_sample_index.npy
[2025-03-12 09:22:04][I][data/blendable_dataset:118:megatron.data.blendable_dataset] > finished loading in 0.01950894997571595 seconds
[2025-03-12 09:22:04][I][data/blendable_dataset:130:megatron.data.blendable_dataset] > size of blendable dataset: 98148401 samples
[2025-03-12 09:22:04][I][Megatron-DeepSpeed/pretrain_gpt_alcf:515:__main__] > finished creating GPT datasets. Took: 146370014386706.56250s
[2025-03-12 09:22:04][I][ezpz/dist:125] `train_valid_test_datasets_provider`(([488280960, 97658880, 7680],)) took: dt=499.5761s
[2025-03-12 09:22:04][I][ezpz/dist:125] `build_train_valid_test_datasets`((<function train_valid_test_datasets_provider at 0x152f6ca8ab90>,)) took: dt=499.5786s
[2025-03-12 09:22:04][I][ezpz/dist:125] `build_train_valid_test_data_loaders`((<function train_valid_test_datasets_provider at 0x152f6ca8ab90>,)) took: dt=499.5840s
[2025-03-12 09:22:08][I][ezpz/dist:125] `build_train_valid_test_data_iterators`((<function train_valid_test_datasets_provider at 0x152f6ca8ab90>,)) took: dt=503.1383s
[2025-03-12 09:22:08][I][megatron/training:96] [after dataloaders are built] datetime=2025-03-12 09:22:08
[2025-03-12 09:22:08][I][megatron/training:287] done with setup ...
(min, max) time across ranks (ms):
model-and-optimizer-setup ......................: (31352.08, 31398.10)
train/valid/test-data-iterators-setup ..........: (502846.19, 503390.32)
[2025-03-12 09:22:08][I][megatron/training:293] training ...
[2025-03-12 09:22:08][I][megatron/training:96] [before the start of training step] datetime=2025-03-12 09:22:08
[2025-03-12 09:22:46,305] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 385.34 | optimizer_gradients: 81.01 | optimizer_step: 187.65
[2025-03-12 09:22:46,306] [INFO] [logging.py:128:log_dist] [Rank 0] step=1, skipped=0, lr=[3.1457298683118837e-09, 3.1457298683118837e-09], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:22:46,306] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 14022.90 | bwd_microstep: 19792.51 | bwd_inner_microstep: 19070.74 | bwd_allreduce_microstep: 721.43 | step_microstep: 1070.39
[2025-03-12 09:22:46,306] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 14022.92 | bwd: 19792.50 | bwd_inner: 19070.80 | bwd_allreduce: 721.43 | step: 1070.39
[2025-03-12 09:22:46][I][megatron/training_log:661] iteration= 1/ 1271565 | consumed_samples= 384 | consumed_tokens= 1572864 | elapsed_time_per_iteration_ms=38000.2 | learning_rate=3.14573e-09 | global_batch_size= 384 | lm loss=11.170761 | loss_scale=1.0 | grad_norm=10.663 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=10.105 | tokens_per_gpu_per_second_tgs=1724.620 | [LM]TFLOPs=71.15 | [DS]TFLOPs=68.37 |
[2025-03-12 09:22:46][I][megatron/utils:249] [Rank 0] (after 1 iterations) memory (MB) | allocated: 14149.64111328125 | max allocated: 45063.8935546875 | reserved: 50822.0 | max reserved: 50822.0
(min, max) time across ranks (ms):
forward-backward ...............................: (36879.78, 36881.36)
optimizer ......................................: (1069.44, 1070.79)
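Sanity check (my own arithmetic, not part of the log): the iteration-1 throughput numbers follow directly from the logged batch size, sequence length, iteration time, and the 24-GPU world size from the job setup.

```python
# Re-derive the iteration-1 throughput figures from the logged values
# (hypothetical check, not Megatron-DeepSpeed code).
global_batch_size = 384
seq_len = 4096                # actual_seqlen
n_gpus = 24                   # NGPUS from the job setup
iter_s = 38.0002              # elapsed_time_per_iteration_ms / 1000

print(global_batch_size * seq_len)                  # 1572864 consumed tokens
samples_per_s = global_batch_size / iter_s
print(round(samples_per_s, 3))                      # ~10.105 samples/second
print(round(samples_per_s * seq_len / n_gpus, 1))   # ~1724.6 tokens/GPU/second
```

All three match the `training_log` line above.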
[2025-03-12 09:23:13,766] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.97 | optimizer_gradients: 0.55 | optimizer_step: 1.05
[2025-03-12 09:23:13,766] [INFO] [logging.py:128:log_dist] [Rank 0] step=2, skipped=0, lr=[6.291459736623767e-09, 6.291459736623767e-09], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:23:13,766] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7744.55 | bwd_microstep: 19367.09 | bwd_inner_microstep: 18726.46 | bwd_allreduce_microstep: 640.33 | step_microstep: 237.10
[2025-03-12 09:23:13,766] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7744.59 | bwd: 19367.09 | bwd_inner: 18726.52 | bwd_allreduce: 640.32 | step: 237.10
[2025-03-12 09:23:13][I][megatron/training_log:661] iteration= 2/ 1271565 | consumed_samples= 768 | consumed_tokens= 3145728 | elapsed_time_per_iteration_ms=27455.8 | learning_rate=6.29146e-09 | global_batch_size= 384 | lm loss=11.167736 | loss_scale=1.0 | grad_norm=10.859 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.986 | tokens_per_gpu_per_second_tgs=2386.962 | [LM]TFLOPs=98.47 | [DS]TFLOPs=94.63 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27194.87, 27195.76)
optimizer ......................................: (236.57, 237.40)
[2025-03-12 09:23:13,791] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:15,366] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:17,028] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:18,697] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:20,369] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:22,043] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:23,712] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:25,388] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:27,058] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:28,733] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:30,408] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:32,080] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:33,751] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:35,425] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:37,099] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:38,769] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:41,311] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.98 | optimizer_gradients: 0.57 | optimizer_step: 1.04
[2025-03-12 09:23:41,311] [INFO] [logging.py:128:log_dist] [Rank 0] step=3, skipped=0, lr=[9.43718960493565e-09, 9.43718960493565e-09], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:23:41,312] [INFO] [timer.py:264:stop] epoch=0/micro_step=3/global_step=3, RunningAvgSamplesPerSec=152.28062072819174, CurrSamplesPerSec=152.28056033918585, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
-------------------------- DeepSpeed Flops Profiler --------------------------
Profile Summary at step 3:
Notations:
data parallel size (dp_size), model parallel size(mp_size),
number of parameters (params), number of multiply-accumulate operations(MACs),
number of floating-point operations (flops), floating-point operations per second (FLOPS),
fwd latency (forward propagation latency), bwd latency (backward propagation latency),
step (weights update latency), iter latency (sum of fwd, bwd and step latency)
world size: 24
data parallel size: 24
model parallel size: 1
batch size per GPU: 1
params per GPU: 5.93 B
params of model = params per GPU * mp_size: 5.93 B
fwd MACs per GPU: 94.13 TMACs
fwd flops per GPU: 188.27 T
fwd flops of model = fwd flops per GPU * mp_size: 188.27 T
fwd latency: 8 s
fwd FLOPS per GPU = fwd flops per GPU / fwd latency: 23.52 TFLOPS
bwd latency: 18.84 s
bwd FLOPS per GPU = 2 * fwd flops per GPU / bwd latency: 19.98 TFLOPS
fwd+bwd FLOPS per GPU = 3 * fwd flops per GPU / (fwd+bwd latency): 21.04 TFLOPS
step latency: 237.44 ms
iter latency: 27.09 s
FLOPS per GPU = 3 * fwd flops per GPU / iter latency: 20.85 TFLOPS
samples/second: 0.89
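The FLOPS lines in this summary are just the notations above applied to the logged latencies; re-deriving them from the rounded values printed (my own arithmetic, not profiler output):

```python
# Re-derive the profiler summary's FLOPS numbers from its own formulas.
fwd_flops = 188.27e12          # fwd flops per GPU
fwd_s, bwd_s = 8.0, 18.84      # fwd / bwd latency (s)
iter_s = 27.09                 # iter latency (s)

print(fwd_flops / fwd_s / 1e12)                # ~23.5  TFLOPS (fwd)
print(2 * fwd_flops / bwd_s / 1e12)            # ~20.0  TFLOPS (bwd)
print(3 * fwd_flops / (fwd_s + bwd_s) / 1e12)  # ~21.0  TFLOPS (fwd+bwd)
print(3 * fwd_flops / iter_s / 1e12)           # ~20.85 TFLOPS (per iter)
```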
----------------------------- Aggregated Profile per GPU -----------------------------
Top 1 modules in terms of params, MACs or fwd latency at different model depths:
depth 0:
params - {'GPTModel': '5.93 B'}
MACs - {'GPTModel': '94.13 TMACs'}
fwd latency - {'GPTModel': '503.85 ms'}
depth 1:
params - {'TransformerLanguageModel': '5.93 B'}
MACs - {'TransformerLanguageModel': '93.6 TMACs'}
fwd latency - {'TransformerLanguageModel': '489.39 ms'}
depth 2:
params - {'ParallelTransformer': '5.67 B'}
MACs - {'ParallelTransformer': '93.6 TMACs'}
fwd latency - {'ParallelTransformer': '488.06 ms'}
depth 3:
params - {'ModuleList': '5.67 B'}
MACs - {'ModuleList': '93.6 TMACs'}
fwd latency - {'ModuleList': '486.47 ms'}
depth 4:
params - {'ParallelTransformerLayer': '5.67 B'}
MACs - {'ParallelTransformerLayer': '93.6 TMACs'}
fwd latency - {'ParallelTransformerLayer': '486.47 ms'}
depth 5:
params - {'ParallelMLP': '4.33 B'}
MACs - {'ParallelAttention': '75.87 TMACs'}
fwd latency - {'ParallelAttention': '228.3 ms'}
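These depth-5 aggregates line up with the per-layer numbers in the detailed profile below multiplied by the 32 transformer layers (`nl32` in the checkpoint path); a quick check of my own:

```python
# Depth-5 aggregates ~= per-layer values from the detailed profile
# below, times 32 layers (nl32). My own arithmetic, not profiler output.
n_layers = 32
print(n_layers * 177.22e6 / 1e9)   # ~5.67 B     ParallelTransformerLayer params
print(n_layers * 135.27e6 / 1e9)   # ~4.33 B     ParallelMLP params
print(n_layers * 2.37e12 / 1e12)   # ~75.8 TMACs ParallelAttention
```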
------------------------------ Detailed Profile per GPU ------------------------------
Each module profile is listed after its name in the following order:
params, percentage of total params, MACs, percentage of total MACs, fwd latency, percentage of total fwd latency, fwd FLOPS
Note: 1. A module can have torch.nn.module or torch.nn.functional to compute logits (e.g. CrossEntropyLoss). They are not counted as submodules, thus not to be printed out. However they make up the difference between a parent's MACs (or latency) and the sum of its submodules'.
2. Number of floating-point operations is a theoretical estimation, thus FLOPS computed using that could be larger than the maximum system throughput.
3. The fwd latency listed in the top module's profile is directly captured at the module forward function in PyTorch, thus it's less than the fwd latency shown above which is captured in DeepSpeed.
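One more relationship that helps when reading the module tree below: the profiler counts a multiply-accumulate as two floating-point operations, so flops ≈ 2 × MACs throughout.

```python
# 1 MAC = 1 multiply + 1 add, hence flops ~= 2 x MACs in this profile.
print(2 * 94.13)   # 188.26 ~= the 188.27 T fwd flops per GPU above
```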
GPTModel(
5.93 B = 100% Params, 94.13 TMACs = 100% MACs, 503.85 ms = 100% latency, 373.66 TFLOPS
(language_model): TransformerLanguageModel(
5.93 B = 100% Params, 93.6 TMACs = 99.43% MACs, 489.39 ms = 97.13% latency, 382.51 TFLOPS
(embedding): Embedding(
131.07 M = 2.21% Params, 0 MACs = 0% MACs, 813.72 us = 0.16% latency, 0 FLOPS
(word_embeddings): VocabParallelEmbedding(131.07 M = 2.21% Params, 0 MACs = 0% MACs, 692.61 us = 0.14% latency, 0 FLOPS)
(embedding_dropout): Dropout(0 = 0% Params, 0 MACs = 0% MACs, 36.24 us = 0.01% latency, 0 FLOPS, p=0.0, inplace=False)
)
(rotary_pos_emb): RotaryEmbedding(0 = 0% Params, 0 MACs = 0% MACs, 402.21 us = 0.08% latency, 0 FLOPS)
(encoder): ParallelTransformer(
5.67 B = 95.58% Params, 93.6 TMACs = 99.43% MACs, 488.06 ms = 96.87% latency, 383.54 TFLOPS
(layers): ModuleList(
(0): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 13.6 ms = 2.7% latency, 430.29 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 222.68 us = 0.04% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.34 ms = 1.26% latency, 747.58 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.59 ms = 0.32% latency, 129.33 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 2.68 ms = 0.53% latency, 1640.15 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 771.05 us = 0.15% latency, 178.25 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 251.29 us = 0.05% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.41 ms = 1.27% latency, 172.98 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.41 ms = 0.68% latency, 216.46 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.96 ms = 0.39% latency, 188.66 TFLOPS)
)
)
(1): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 14.32 ms = 2.84% latency, 408.54 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 285.15 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.76 ms = 1.34% latency, 701.14 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.15 ms = 0.23% latency, 179.51 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.31 ms = 0.66% latency, 1328.15 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 838.28 us = 0.17% latency, 163.95 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 263.21 us = 0.05% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.64 ms = 1.32% latency, 167 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.63 ms = 0.72% latency, 203.3 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.94 ms = 0.38% latency, 190.53 TFLOPS)
)
)
(2): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 14.72 ms = 2.92% latency, 397.35 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 289.92 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.51 ms = 1.29% latency, 728.49 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.19 ms = 0.24% latency, 173.15 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.06 ms = 0.61% latency, 1436.55 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 816.82 us = 0.16% latency, 168.26 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 261.55 us = 0.05% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.24 ms = 1.44% latency, 153.13 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.75 ms = 0.74% latency, 196.93 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.37 ms = 0.47% latency, 156.02 TFLOPS)
)
)
(3): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 17.14 ms = 3.4% latency, 341.2 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 401.26 us = 0.08% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 8.4 ms = 1.67% latency, 564.23 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.61 ms = 0.32% latency, 127.86 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.93 ms = 0.78% latency, 1119.61 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 872.85 us = 0.17% latency, 157.46 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 297.78 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.6 ms = 1.51% latency, 145.82 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.77 ms = 0.75% latency, 195.92 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.62 ms = 0.52% latency, 141.06 TFLOPS)
)
)
(4): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 16.14 ms = 3.2% latency, 362.37 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 411.99 us = 0.08% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.78 ms = 1.54% latency, 609.61 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.5 ms = 0.3% latency, 137.21 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.46 ms = 0.69% latency, 1270.79 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 923.63 us = 0.18% latency, 148.8 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 308.51 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.22 ms = 1.43% latency, 153.44 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.79 ms = 0.75% latency, 195.08 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.29 ms = 0.45% latency, 161.51 TFLOPS)
)
)
(5): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.75 ms = 3.13% latency, 371.37 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 360.01 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.99 ms = 1.39% latency, 678.54 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.43 ms = 0.28% latency, 143.83 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.15 ms = 0.63% latency, 1396.21 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 804.42 us = 0.16% latency, 170.85 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 283.24 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.72 ms = 1.53% latency, 143.56 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.9 ms = 0.77% latency, 189.22 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.52 ms = 0.5% latency, 146.49 TFLOPS)
)
)
(6): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.9 ms = 3.15% latency, 368.02 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 411.99 us = 0.08% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.01 ms = 1.39% latency, 676.76 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.4 ms = 0.28% latency, 147.18 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.17 ms = 0.63% latency, 1386.14 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 801.32 us = 0.16% latency, 171.51 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 310.18 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.74 ms = 1.54% latency, 143.22 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.98 ms = 0.79% latency, 185.46 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.44 ms = 0.48% latency, 151.26 TFLOPS)
)
)
(7): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.6 ms = 3.1% latency, 375.06 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 342.61 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.83 ms = 1.36% latency, 694.05 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.31 ms = 0.26% latency, 156.82 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.14 ms = 0.62% latency, 1402.05 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 886.92 us = 0.18% latency, 154.96 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 308.28 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.71 ms = 1.53% latency, 143.76 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 4.04 ms = 0.8% latency, 183.04 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.34 ms = 0.46% latency, 158.05 TFLOPS)
)
)
(8): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.36 ms = 3.05% latency, 380.94 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 337.84 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.12 ms = 1.41% latency, 665.7 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.25 ms = 0.25% latency, 164.95 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.43 ms = 0.68% latency, 1282.45 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 936.27 us = 0.19% latency, 146.79 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 339.98 us = 0.07% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.16 ms = 1.42% latency, 154.82 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.89 ms = 0.77% latency, 190.11 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.11 ms = 0.42% latency, 174.78 TFLOPS)
)
)
(9): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.43 ms = 3.06% latency, 379.19 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 302.55 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.19 ms = 1.43% latency, 659.48 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.24 ms = 0.25% latency, 166.19 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.48 ms = 0.69% latency, 1264.86 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 920.06 us = 0.18% latency, 149.38 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 317.57 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.22 ms = 1.43% latency, 153.4 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.95 ms = 0.78% latency, 186.89 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.14 ms = 0.42% latency, 172.75 TFLOPS)
)
)
(10): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.42 ms = 3.06% latency, 379.45 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 297.55 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.51 ms = 1.49% latency, 631.06 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.27 ms = 0.25% latency, 161.9 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.63 ms = 0.72% latency, 1212.33 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 926.97 us = 0.18% latency, 148.27 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 327.35 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.86 ms = 1.36% latency, 161.55 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.74 ms = 0.74% latency, 197.76 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.08 ms = 0.41% latency, 177.89 TFLOPS)
)
)
(11): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.63 ms = 3.1% latency, 374.33 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 310.66 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.7 ms = 1.53% latency, 615.93 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.41 ms = 0.28% latency, 145.96 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.65 ms = 0.73% latency, 1203.47 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 897.17 us = 0.18% latency, 153.19 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 326.16 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.92 ms = 1.37% latency, 160.15 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.76 ms = 0.75% latency, 196.73 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.08 ms = 0.41% latency, 177.24 TFLOPS)
)
)
(12): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.82 ms = 3.14% latency, 369.79 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 322.1 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.68 ms = 1.52% latency, 617.46 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.36 ms = 0.27% latency, 151.33 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.74 ms = 0.74% latency, 1177.13 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 934.84 us = 0.19% latency, 147.02 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 307.32 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.14 ms = 1.42% latency, 155.2 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.89 ms = 0.77% latency, 190.04 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.13 ms = 0.42% latency, 173.35 TFLOPS)
)
)
(13): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.85 ms = 3.15% latency, 369.05 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 364.78 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.54 ms = 1.5% latency, 629.2 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.42 ms = 0.28% latency, 144.99 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.49 ms = 0.69% latency, 1260.8 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 840.66 us = 0.17% latency, 163.49 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 295.88 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.28 ms = 1.44% latency, 152.21 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.94 ms = 0.78% latency, 187.28 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.19 ms = 0.43% latency, 168.58 TFLOPS)
)
)
(14): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 16.01 ms = 3.18% latency, 365.46 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 359.54 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.54 ms = 1.5% latency, 628.51 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.41 ms = 0.28% latency, 146.29 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.48 ms = 0.69% latency, 1264 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 881.67 us = 0.17% latency, 155.88 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 299.22 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.41 ms = 1.47% latency, 149.48 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.97 ms = 0.79% latency, 186.08 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.3 ms = 0.46% latency, 160.84 TFLOPS)
)
)
(15): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.95 ms = 3.17% latency, 366.75 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 346.18 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.39 ms = 1.47% latency, 641.75 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.41 ms = 0.28% latency, 146.71 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.37 ms = 0.67% latency, 1306.43 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 931.5 us = 0.18% latency, 147.55 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 292.54 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.56 ms = 1.5% latency, 146.55 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.85 ms = 0.77% latency, 191.64 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.41 ms = 0.48% latency, 153.25 TFLOPS)
)
)
(16): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.57 ms = 3.09% latency, 375.83 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 347.14 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.74 ms = 1.34% latency, 703.1 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.32 ms = 0.26% latency, 156.68 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.03 ms = 0.6% latency, 1449.65 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 813.96 us = 0.16% latency, 168.85 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 287.29 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.82 ms = 1.55% latency, 141.64 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 4.12 ms = 0.82% latency, 179.41 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.41 ms = 0.48% latency, 153.03 TFLOPS)
)
)
(17): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.46 ms = 3.07% latency, 378.41 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 322.34 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.78 ms = 1.35% latency, 699.51 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.24 ms = 0.25% latency, 166.29 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.15 ms = 0.63% latency, 1394.1 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 873.8 us = 0.17% latency, 157.29 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 300.88 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.7 ms = 1.53% latency, 143.92 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 4.02 ms = 0.8% latency, 183.71 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.34 ms = 0.46% latency, 157.73 TFLOPS)
)
)
(18): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.37 ms = 3.05% latency, 380.63 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 305.65 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.78 ms = 1.35% latency, 699.64 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.22 ms = 0.24% latency, 169.58 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.16 ms = 0.63% latency, 1392.63 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 859.26 us = 0.17% latency, 159.95 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 329.49 us = 0.07% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.61 ms = 1.51% latency, 145.68 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 4 ms = 0.79% latency, 184.76 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.32 ms = 0.46% latency, 159.5 TFLOPS)
)
)
(19): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.29 ms = 3.03% latency, 382.67 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 305.18 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.9 ms = 1.37% latency, 686.83 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.22 ms = 0.24% latency, 168.39 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.35 ms = 0.66% latency, 1314.43 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 929.36 us = 0.18% latency, 147.89 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 294.69 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.44 ms = 1.48% latency, 148.89 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 4.03 ms = 0.8% latency, 183.5 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.23 ms = 0.44% latency, 165.52 TFLOPS)
)
)
(20): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.18 ms = 3.01% latency, 385.24 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 348.81 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.13 ms = 1.42% latency, 664.95 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.22 ms = 0.24% latency, 169.45 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.52 ms = 0.7% latency, 1248.6 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 863.31 us = 0.17% latency, 159.2 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 325.92 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.03 ms = 1.4% latency, 157.53 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.8 ms = 0.75% latency, 194.57 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.11 ms = 0.42% latency, 175.25 TFLOPS)
)
)
(21): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.52 ms = 3.08% latency, 376.99 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 344.99 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.6 ms = 1.51% latency, 624.19 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.26 ms = 0.25% latency, 163.09 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.78 ms = 0.75% latency, 1163.98 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 912.43 us = 0.18% latency, 150.63 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 296.35 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.93 ms = 1.38% latency, 159.95 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.67 ms = 0.73% latency, 201.02 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.16 ms = 0.43% latency, 171.34 TFLOPS)
)
)
(22): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.53 ms = 3.08% latency, 376.7 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 304.7 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.66 ms = 1.52% latency, 618.73 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.34 ms = 0.27% latency, 153.29 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.7 ms = 0.73% latency, 1189.35 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 910.28 us = 0.18% latency, 150.99 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 295.16 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.92 ms = 1.37% latency, 160.1 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.84 ms = 0.76% latency, 192.54 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2 ms = 0.4% latency, 184.98 TFLOPS)
)
)
(23): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.54 ms = 3.08% latency, 376.43 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 342.37 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.66 ms = 1.52% latency, 618.92 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.34 ms = 0.27% latency, 154.05 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.71 ms = 0.74% latency, 1185.22 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 877.62 us = 0.17% latency, 156.6 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 345.47 us = 0.07% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.84 ms = 1.36% latency, 161.98 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.67 ms = 0.73% latency, 201.21 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.08 ms = 0.41% latency, 177.18 TFLOPS)
)
)
(24): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.68 ms = 3.11% latency, 373.05 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 324.25 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.62 ms = 1.51% latency, 622.06 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.4 ms = 0.28% latency, 147.68 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.66 ms = 0.73% latency, 1200.33 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 857.35 us = 0.17% latency, 160.31 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 275.37 us = 0.05% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.1 ms = 1.41% latency, 156.12 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.77 ms = 0.75% latency, 195.7 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.22 ms = 0.44% latency, 166.07 TFLOPS)
)
)
(25): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.61 ms = 3.1% latency, 374.68 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 319.72 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.45 ms = 1.48% latency, 636.49 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.37 ms = 0.27% latency, 150.7 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.55 ms = 0.7% latency, 1240.45 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 809.91 us = 0.16% latency, 169.7 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 297.07 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.21 ms = 1.43% latency, 153.8 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.84 ms = 0.76% latency, 192.14 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.24 ms = 0.45% latency, 164.69 TFLOPS)
)
)
(26): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.51 ms = 3.08% latency, 377.17 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 347.85 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.36 ms = 1.46% latency, 644.48 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.4 ms = 0.28% latency, 147.66 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.45 ms = 0.68% latency, 1275.36 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 828.27 us = 0.16% latency, 165.94 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 323.53 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.13 ms = 1.41% latency, 155.5 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.64 ms = 0.72% latency, 202.86 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.35 ms = 0.47% latency, 157.24 TFLOPS)
)
)
(27): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.94 ms = 3.16% latency, 366.89 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 348.57 us = 0.07% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.45 ms = 1.48% latency, 636.58 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.49 ms = 0.3% latency, 138.48 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.28 ms = 0.65% latency, 1339.73 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 932.22 us = 0.19% latency, 147.43 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 293.02 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.5 ms = 1.49% latency, 147.7 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.84 ms = 0.76% latency, 192.62 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.41 ms = 0.48% latency, 152.97 TFLOPS)
)
)
(28): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.05 ms = 2.99% latency, 388.8 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 416.04 us = 0.08% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.96 ms = 1.58% latency, 595.86 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.46 ms = 0.29% latency, 140.78 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.85 ms = 0.76% latency, 1141.72 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 862.12 us = 0.17% latency, 159.42 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 280.38 us = 0.06% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.07 ms = 1.2% latency, 182.69 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.27 ms = 0.65% latency, 225.59 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.84 ms = 0.37% latency, 200.55 TFLOPS)
)
)
(29): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 12.59 ms = 2.5% latency, 464.79 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 295.4 us = 0.06% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.1 ms = 1.21% latency, 777.81 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.1 ms = 0.22% latency, 187.12 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 2.83 ms = 0.56% latency, 1556.03 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 795.6 us = 0.16% latency, 172.75 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 236.51 us = 0.05% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 5.65 ms = 1.12% latency, 196.09 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.01 ms = 0.6% latency, 245.4 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.74 ms = 0.35% latency, 212.14 TFLOPS)
)
)
(30): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 11.73 ms = 2.33% latency, 498.68 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 221.97 us = 0.04% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 5.25 ms = 1.04% latency, 903.75 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.06 ms = 0.21% latency, 195.23 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 2.1 ms = 0.42% latency, 2094.55 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 958.68 us = 0.19% latency, 143.36 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 231.27 us = 0.05% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 5.73 ms = 1.14% latency, 193.38 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.2 ms = 0.63% latency, 230.97 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.63 ms = 0.32% latency, 226.07 TFLOPS)
)
)
(31): ParallelTransformerLayer(
177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 12.29 ms = 2.44% latency, 475.97 TFLOPS
(input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 215.77 us = 0.04% latency, 0 FLOPS)
(self_attention): ParallelAttention(
41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 5.57 ms = 1.11% latency, 851.62 TFLOPS
(query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.02 ms = 0.2% latency, 202.5 TFLOPS)
(core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 2.47 ms = 0.49% latency, 1780.4 TFLOPS)
(dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 897.88 us = 0.18% latency, 153.07 TFLOPS)
)
(post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 235.8 us = 0.05% latency, 0 FLOPS)
(mlp): ParallelMLP(
135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 5.94 ms = 1.18% latency, 186.66 TFLOPS
(dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.19 ms = 0.63% latency, 231.49 TFLOPS)
(dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.85 ms = 0.37% latency, 199.31 TFLOPS)
)
)
)
(final_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 341.65 us = 0.07% latency, 0 FLOPS)
)
(output_layer): ColumnParallelLinear(131.07 M = 2.21% Params, 0 MACs = 0% MACs, 0 s = 0% latency, 0 FLOPS)
)
)
------------------------------------------------------------------------------
[2025-03-12 09:23:41,330] [INFO] [profiler.py:230:end_profile] Flops profiler finished
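(For reference: the per-module breakdown above is produced by DeepSpeed's flops profiler. A minimal sketch of the ds_config section that enables it is below; the key names are the documented ones, but the specific values shown are the library defaults, not necessarily what this run used.)

# Sketch of the DeepSpeed config section that enables the flops profiler.
# Key names follow the DeepSpeed docs; the values are defaults (assumption),
# not necessarily this run's actual settings.
ds_config = {
    "flops_profiler": {
        "enabled": True,     # turn the profiler on
        "profile_step": 1,   # which training step to profile
        "module_depth": -1,  # -1 = profile modules at all depths
        "top_modules": 1,    # number of top modules to print per depth
        "detailed": True,    # print the per-module tree shown above
        "output_file": None, # None = print to stdout
    }
}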
[2025-03-12 09:23:41,330] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 8004.76 | bwd_microstep: 18843.88 | bwd_inner_microstep: 18210.75 | bwd_allreduce_microstep: 632.82 | step_microstep: 237.44
[2025-03-12 09:23:41,330] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 0.00 | bwd: 0.00 | bwd_inner: 18210.81 | bwd_allreduce: 632.82 | step: 0.00
[2025-03-12 09:23:41][I][megatron/training_log:661] iteration= 3/ 1271565 | consumed_samples= 1152 | consumed_tokens= 4718592 | elapsed_time_per_iteration_ms=27563.1 | learning_rate=9.43719e-09 | global_batch_size= 384 | lm loss=11.172977 | loss_scale=1.0 | grad_norm=10.962 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.932 | tokens_per_gpu_per_second_tgs=2377.670 | [LM]TFLOPs=98.09 | [DS]TFLOPs=94.26 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27285.52, 27286.41)
optimizer ......................................: (236.47, 255.65)
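(Quick sanity check on the throughput line above: tokens_per_gpu_per_second_tgs should just be global_batch_size × actual_seqlen / NGPUS / iteration time. Assuming NGPUS=24 from this job's 2-node × 12-GPU layout, the numbers line up:)

# Back-of-the-envelope check of tokens_per_gpu_per_second_tgs
# using the values from the iteration=3 log line above.
global_batch_size = 384   # global_batch_size
seq_len = 4096            # actual_seqlen
ngpus = 24                # assumption: 2 nodes x 12 GPUs for this job
iter_time_s = 27.5631     # elapsed_time_per_iteration_ms / 1000
tgs = global_batch_size * seq_len / (ngpus * iter_time_s)
print(f"{tgs:.3f}")       # ~2377.67, matching the logged 2377.670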
[2025-03-12 09:24:08,861] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.17 | optimizer_gradients: 0.56 | optimizer_step: 1.08
[2025-03-12 09:24:08,862] [INFO] [logging.py:128:log_dist] [Rank 0] step=4, skipped=0, lr=[1.2582919473247535e-08, 1.2582919473247535e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:24:08,862] [INFO] [timer.py:264:stop] epoch=0/micro_step=4/global_step=4, RunningAvgSamplesPerSec=151.43919798600436, CurrSamplesPerSec=150.6069635989099, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:24:08,862] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7758.90 | bwd_microstep: 19429.51 | bwd_inner_microstep: 18789.32 | bwd_allreduce_microstep: 639.89 | step_microstep: 237.57
[2025-03-12 09:24:08,863] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7758.93 | bwd: 19429.51 | bwd_inner: 18789.37 | bwd_allreduce: 639.89 | step: 237.57
[2025-03-12 09:24:08][I][megatron/training_log:661] iteration= 4/ 1271565 | consumed_samples= 1536 | consumed_tokens= 6291456 | elapsed_time_per_iteration_ms=27532.8 | learning_rate=1.25829e-08 | global_batch_size= 384 | lm loss=11.171650 | loss_scale=1.0 | grad_norm=10.696 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.947 | tokens_per_gpu_per_second_tgs=2380.292 | [LM]TFLOPs=98.19 | [DS]TFLOPs=94.36 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27272.97, 27273.83)
optimizer ......................................: (236.54, 237.84)
[2025-03-12 09:24:36,410] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.74 | optimizer_gradients: 0.57 | optimizer_step: 1.05
[2025-03-12 09:24:36,410] [INFO] [logging.py:128:log_dist] [Rank 0] step=5, skipped=0, lr=[1.5728649341559417e-08, 1.5728649341559417e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:24:36,411] [INFO] [timer.py:264:stop] epoch=0/micro_step=5/global_step=5, RunningAvgSamplesPerSec=151.1343465428925, CurrSamplesPerSec=150.52825193289877, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:24:36,411] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7759.95 | bwd_microstep: 19445.18 | bwd_inner_microstep: 18803.95 | bwd_allreduce_microstep: 640.93 | step_microstep: 237.32
[2025-03-12 09:24:36,411] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7759.98 | bwd: 19445.17 | bwd_inner: 18804.01 | bwd_allreduce: 640.93 | step: 237.32
[2025-03-12 09:24:36][I][megatron/training_log:661] iteration= 5/ 1271565 | consumed_samples= 1920 | consumed_tokens= 7864320 | elapsed_time_per_iteration_ms=27548.1 | learning_rate=1.57286e-08 | global_batch_size= 384 | lm loss=11.170979 | loss_scale=1.0 | grad_norm=10.837 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.939 | tokens_per_gpu_per_second_tgs=2378.962 | [LM]TFLOPs=98.14 | [DS]TFLOPs=94.31 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27288.93, 27289.64)
optimizer ......................................: (236.47, 237.58)
[2025-03-12 09:25:03,946] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.06 | optimizer_gradients: 0.56 | optimizer_step: 1.02
[2025-03-12 09:25:03,946] [INFO] [logging.py:128:log_dist] [Rank 0] step=6, skipped=0, lr=[1.88743792098713e-08, 1.88743792098713e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:25:03,946] [INFO] [timer.py:264:stop] epoch=0/micro_step=6/global_step=6, RunningAvgSamplesPerSec=150.95998968613813, CurrSamplesPerSec=150.43926562465137, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:25:03,947] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7770.25 | bwd_microstep: 19422.09 | bwd_inner_microstep: 18777.17 | bwd_allreduce_microstep: 644.63 | step_microstep: 237.35
[2025-03-12 09:25:03,947] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7770.28 | bwd: 19422.09 | bwd_inner: 18777.22 | bwd_allreduce: 644.63 | step: 237.35
[2025-03-12 09:25:03][I][megatron/training_log:661] iteration= 6/ 1271565 | consumed_samples= 2304 | consumed_tokens= 9437184 | elapsed_time_per_iteration_ms=27535.1 | learning_rate=1.88744e-08 | global_batch_size= 384 | lm loss=11.170456 | loss_scale=1.0 | grad_norm=10.977 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.946 | tokens_per_gpu_per_second_tgs=2380.093 | [LM]TFLOPs=98.19 | [DS]TFLOPs=94.35 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27275.75, 27276.84)
optimizer ......................................: (236.54, 237.64)
[2025-03-12 09:25:31,504] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.49 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:25:31,504] [INFO] [logging.py:128:log_dist] [Rank 0] step=7, skipped=0, lr=[2.2020109078183185e-08, 2.2020109078183185e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:25:31,505] [INFO] [timer.py:264:stop] epoch=0/micro_step=7/global_step=7, RunningAvgSamplesPerSec=150.86228890358845, CurrSamplesPerSec=150.47268817650092, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:25:31,505] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7749.64 | bwd_microstep: 19466.08 | bwd_inner_microstep: 18826.43 | bwd_allreduce_microstep: 639.35 | step_microstep: 237.81
[2025-03-12 09:25:31,505] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7749.67 | bwd: 19466.08 | bwd_inner: 18826.49 | bwd_allreduce: 639.35 | step: 237.82
[2025-03-12 09:25:31][I][megatron/training_log:661] iteration= 7/ 1271565 | consumed_samples= 2688 | consumed_tokens= 11010048 | elapsed_time_per_iteration_ms=27558.0 | learning_rate=2.20201e-08 | global_batch_size= 384 | lm loss=11.170864 | loss_scale=1.0 | grad_norm=10.770 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.934 | tokens_per_gpu_per_second_tgs=2378.112 | [LM]TFLOPs=98.10 | [DS]TFLOPs=94.27 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27298.44, 27299.20)
optimizer ......................................: (236.96, 238.12)
[2025-03-12 09:25:59,059] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.21 | optimizer_gradients: 0.56 | optimizer_step: 1.04
[2025-03-12 09:25:59,060] [INFO] [logging.py:128:log_dist] [Rank 0] step=8, skipped=0, lr=[2.516583894649507e-08, 2.516583894649507e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:25:59,060] [INFO] [timer.py:264:stop] epoch=0/micro_step=8/global_step=8, RunningAvgSamplesPerSec=150.75436123929416, CurrSamplesPerSec=150.21697225860072, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:25:59,060] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7761.38 | bwd_microstep: 19449.75 | bwd_inner_microstep: 18804.92 | bwd_allreduce_microstep: 644.53 | step_microstep: 237.69
[2025-03-12 09:25:59,060] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7761.40 | bwd: 19449.74 | bwd_inner: 18804.98 | bwd_allreduce: 644.53 | step: 237.69
[2025-03-12 09:25:59][I][megatron/training_log:661] iteration= 8/ 1271565 | consumed_samples= 3072 | consumed_tokens= 12582912 | elapsed_time_per_iteration_ms=27555.7 | learning_rate=2.51658e-08 | global_batch_size= 384 | lm loss=11.167645 | loss_scale=1.0 | grad_norm=10.765 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.935 | tokens_per_gpu_per_second_tgs=2378.312 | [LM]TFLOPs=98.11 | [DS]TFLOPs=94.28 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27295.04, 27295.93)
optimizer ......................................: (236.43, 237.99)
[2025-03-12 09:26:26,601] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.18 | optimizer_gradients: 0.56 | optimizer_step: 1.07
[2025-03-12 09:26:26,602] [INFO] [logging.py:128:log_dist] [Rank 0] step=9, skipped=0, lr=[2.831156881480695e-08, 2.831156881480695e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:26:26,602] [INFO] [timer.py:264:stop] epoch=0/micro_step=9/global_step=9, RunningAvgSamplesPerSec=150.74043559820174, CurrSamplesPerSec=150.65687664050782, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:26:26,603] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7760.35 | bwd_microstep: 19434.29 | bwd_inner_microstep: 18790.78 | bwd_allreduce_microstep: 643.22 | step_microstep: 237.73
[2025-03-12 09:26:26,603] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7760.37 | bwd: 19434.29 | bwd_inner: 18790.83 | bwd_allreduce: 643.21 | step: 237.73
[2025-03-12 09:26:26][I][megatron/training_log:661] iteration= 9/ 1271565 | consumed_samples= 3456 | consumed_tokens= 14155776 | elapsed_time_per_iteration_ms=27541.5 | learning_rate=2.83116e-08 | global_batch_size= 384 | lm loss=11.173370 | loss_scale=1.0 | grad_norm=10.662 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.943 | tokens_per_gpu_per_second_tgs=2379.535 | [LM]TFLOPs=98.16 | [DS]TFLOPs=94.33 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27281.87, 27282.46)
optimizer ......................................: (236.88, 238.02)
[2025-03-12 09:26:54,144] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.17 | optimizer_gradients: 0.57 | optimizer_step: 1.04
[2025-03-12 09:26:54,145] [INFO] [logging.py:128:log_dist] [Rank 0] step=10, skipped=0, lr=[3.1457298683118834e-08, 3.1457298683118834e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:26:54,145] [INFO] [timer.py:264:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=150.69597672127665, CurrSamplesPerSec=150.38543847751936, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:26:54,146] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7759.82 | bwd_microstep: 19440.08 | bwd_inner_microstep: 18797.15 | bwd_allreduce_microstep: 642.62 | step_microstep: 237.56
[2025-03-12 09:26:54,146] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7759.84 | bwd: 19440.08 | bwd_inner: 18797.21 | bwd_allreduce: 642.62 | step: 237.56
[2025-03-12 09:26:54][I][megatron/training_log:661] iteration= 10/ 1271565 | consumed_samples= 3840 | consumed_tokens= 15728640 | elapsed_time_per_iteration_ms=27543.2 | learning_rate=3.14573e-08 | global_batch_size= 384 | lm loss=11.172956 | loss_scale=1.0 | grad_norm=10.671 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.942 | tokens_per_gpu_per_second_tgs=2379.391 | [LM]TFLOPs=98.16 | [DS]TFLOPs=94.32 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27283.03, 27283.88)
optimizer ......................................: (236.72, 237.88)
[2025-03-12 09:27:21,695] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.93 | optimizer_gradients: 0.54 | optimizer_step: 1.04
[2025-03-12 09:27:21,695] [INFO] [logging.py:128:log_dist] [Rank 0] step=11, skipped=0, lr=[3.4603028551430715e-08, 3.4603028551430715e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:27:21,696] [INFO] [timer.py:264:stop] epoch=0/micro_step=11/global_step=11, RunningAvgSamplesPerSec=150.7141854516556, CurrSamplesPerSec=150.85995459318923, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:27:21,696] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7762.67 | bwd_microstep: 19442.22 | bwd_inner_microstep: 18802.69 | bwd_allreduce_microstep: 639.24 | step_microstep: 237.45
[2025-03-12 09:27:21,696] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7762.69 | bwd: 19442.22 | bwd_inner: 18802.74 | bwd_allreduce: 639.23 | step: 237.45
[2025-03-12 09:27:21][I][megatron/training_log:661] iteration= 11/ 1271565 | consumed_samples= 4224 | consumed_tokens= 17301504 | elapsed_time_per_iteration_ms=27565.3 | learning_rate=3.4603e-08 | global_batch_size= 384 | lm loss=11.170139 | loss_scale=1.0 | grad_norm=10.682 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.931 | tokens_per_gpu_per_second_tgs=2377.482 | [LM]TFLOPs=98.08 | [DS]TFLOPs=94.25 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27290.26, 27291.14)
optimizer ......................................: (236.56, 237.73)
[2025-03-12 09:27:49,306] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.11 | optimizer_gradients: 0.53 | optimizer_step: 0.99
[2025-03-12 09:27:49,306] [INFO] [logging.py:128:log_dist] [Rank 0] step=12, skipped=0, lr=[3.77487584197426e-08, 3.77487584197426e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:27:49,307] [INFO] [timer.py:264:stop] epoch=0/micro_step=12/global_step=12, RunningAvgSamplesPerSec=150.676789018258, CurrSamplesPerSec=150.34099551614207, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:27:49,307] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7761.82 | bwd_microstep: 19490.49 | bwd_inner_microstep: 18847.88 | bwd_allreduce_microstep: 642.30 | step_microstep: 237.51
[2025-03-12 09:27:49,307] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7761.85 | bwd: 19490.48 | bwd_inner: 18847.94 | bwd_allreduce: 642.30 | step: 237.51
[2025-03-12 09:27:49][I][megatron/training_log:661] iteration= 12/ 1271565 | consumed_samples= 4608 | consumed_tokens= 18874368 | elapsed_time_per_iteration_ms=27595.0 | learning_rate=3.77488e-08 | global_batch_size= 384 | lm loss=11.173476 | loss_scale=1.0 | grad_norm=10.517 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.916 | tokens_per_gpu_per_second_tgs=2374.922 | [LM]TFLOPs=97.97 | [DS]TFLOPs=94.15 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27335.32, 27335.85)
optimizer ......................................: (236.66, 237.80)
[2025-03-12 09:28:16,853] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.72 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:28:16,854] [INFO] [logging.py:128:log_dist] [Rank 0] step=13, skipped=0, lr=[4.0894488288054484e-08, 4.0894488288054484e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:28:16,854] [INFO] [timer.py:264:stop] epoch=0/micro_step=13/global_step=13, RunningAvgSamplesPerSec=150.68934899689748, CurrSamplesPerSec=150.81500481310513, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:28:16,854] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7766.62 | bwd_microstep: 19433.64 | bwd_inner_microstep: 18790.03 | bwd_allreduce_microstep: 643.32 | step_microstep: 237.20
[2025-03-12 09:28:16,854] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7766.64 | bwd: 19433.64 | bwd_inner: 18790.09 | bwd_allreduce: 643.31 | step: 237.20
[2025-03-12 09:28:16][I][megatron/training_log:661] iteration= 13/ 1271565 | consumed_samples= 4992 | consumed_tokens= 20447232 | elapsed_time_per_iteration_ms=27547.4 | learning_rate=4.08945e-08 | global_batch_size= 384 | lm loss=11.174082 | loss_scale=1.0 | grad_norm=10.397 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.940 | tokens_per_gpu_per_second_tgs=2379.025 | [LM]TFLOPs=98.14 | [DS]TFLOPs=94.31 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27286.80, 27287.63)
optimizer ......................................: (236.32, 237.52)
[2025-03-12 09:28:44,390] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.24 | optimizer_gradients: 0.56 | optimizer_step: 1.06
[2025-03-12 09:28:44,390] [INFO] [logging.py:128:log_dist] [Rank 0] step=14, skipped=0, lr=[4.404021815636637e-08, 4.404021815636637e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:28:44,391] [INFO] [timer.py:264:stop] epoch=0/micro_step=14/global_step=14, RunningAvgSamplesPerSec=150.6799710642357, CurrSamplesPerSec=150.57683174507028, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:28:44,391] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7778.28 | bwd_microstep: 19412.22 | bwd_inner_microstep: 18773.44 | bwd_allreduce_microstep: 638.49 | step_microstep: 237.73
[2025-03-12 09:28:44,391] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7778.30 | bwd: 19412.21 | bwd_inner: 18773.50 | bwd_allreduce: 638.48 | step: 237.73
[2025-03-12 09:28:44][I][megatron/training_log:661] iteration= 14/ 1271565 | consumed_samples= 5376 | consumed_tokens= 22020096 | elapsed_time_per_iteration_ms=27536.5 | learning_rate=4.40402e-08 | global_batch_size= 384 | lm loss=11.169671 | loss_scale=1.0 | grad_norm=10.867 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.945 | tokens_per_gpu_per_second_tgs=2379.965 | [LM]TFLOPs=98.18 | [DS]TFLOPs=94.35 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27275.91, 27276.62)
optimizer ......................................: (236.68, 238.04)
[2025-03-12 09:29:11,975] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.14 | optimizer_gradients: 0.57 | optimizer_step: 1.05
[2025-03-12 09:29:11,975] [INFO] [logging.py:128:log_dist] [Rank 0] step=15, skipped=0, lr=[4.718594802467826e-08, 4.718594802467826e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:29:11,976] [INFO] [timer.py:264:stop] epoch=0/micro_step=15/global_step=15, RunningAvgSamplesPerSec=150.6481067913074, CurrSamplesPerSec=150.26672523308596, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:29:11,976] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7771.81 | bwd_microstep: 19468.89 | bwd_inner_microstep: 18826.52 | bwd_allreduce_microstep: 642.07 | step_microstep: 237.50
[2025-03-12 09:29:11,976] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7771.83 | bwd: 19468.89 | bwd_inner: 18826.58 | bwd_allreduce: 642.06 | step: 237.50
[2025-03-12 09:29:11][I][megatron/training_log:661] iteration= 15/ 1271565 | consumed_samples= 5760 | consumed_tokens= 23592960 | elapsed_time_per_iteration_ms=27583.6 | learning_rate=4.71859e-08 | global_batch_size= 384 | lm loss=11.166926 | loss_scale=1.0 | grad_norm=11.188 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.921 | tokens_per_gpu_per_second_tgs=2375.900 | [LM]TFLOPs=98.01 | [DS]TFLOPs=94.19 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27324.02, 27324.72)
optimizer ......................................: (236.43, 237.81)
[2025-03-12 09:29:39,564] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.35 | optimizer_gradients: 0.53 | optimizer_step: 1.00
[2025-03-12 09:29:39,565] [INFO] [logging.py:128:log_dist] [Rank 0] step=16, skipped=0, lr=[5.033167789299014e-08, 5.033167789299014e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:29:39,565] [INFO] [timer.py:264:stop] epoch=0/micro_step=16/global_step=16, RunningAvgSamplesPerSec=150.60937559393847, CurrSamplesPerSec=150.1076176110364, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:29:39,565] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7768.12 | bwd_microstep: 19474.25 | bwd_inner_microstep: 18832.07 | bwd_allreduce_microstep: 641.89 | step_microstep: 237.89
[2025-03-12 09:29:39,565] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7768.15 | bwd: 19474.25 | bwd_inner: 18832.12 | bwd_allreduce: 641.88 | step: 237.89
[2025-03-12 09:29:39][I][megatron/training_log:661] iteration= 16/ 1271565 | consumed_samples= 6144 | consumed_tokens= 25165824 | elapsed_time_per_iteration_ms=27589.6 | learning_rate=5.03317e-08 | global_batch_size= 384 | lm loss=11.168621 | loss_scale=1.0 | grad_norm=10.912 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.918 | tokens_per_gpu_per_second_tgs=2375.385 | [LM]TFLOPs=97.99 | [DS]TFLOPs=94.17 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27330.04, 27330.75)
optimizer ......................................: (236.86, 238.15)
[2025-03-12 09:30:07,158] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.10 | optimizer_gradients: 0.57 | optimizer_step: 1.06
[2025-03-12 09:30:07,158] [INFO] [logging.py:128:log_dist] [Rank 0] step=17, skipped=0, lr=[5.347740776130202e-08, 5.347740776130202e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:30:07,159] [INFO] [timer.py:264:stop] epoch=0/micro_step=17/global_step=17, RunningAvgSamplesPerSec=150.58897119256196, CurrSamplesPerSec=150.30383016000565, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:30:07,159] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7767.37 | bwd_microstep: 19483.23 | bwd_inner_microstep: 18841.54 | bwd_allreduce_microstep: 641.38 | step_microstep: 237.67
[2025-03-12 09:30:07,159] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7767.39 | bwd: 19483.23 | bwd_inner: 18841.61 | bwd_allreduce: 641.37 | step: 237.67
[2025-03-12 09:30:07][I][megatron/training_log:661] iteration= 17/ 1271565 | consumed_samples= 6528 | consumed_tokens= 26738688 | elapsed_time_per_iteration_ms=27593.2 | learning_rate=5.34774e-08 | global_batch_size= 384 | lm loss=11.169818 | loss_scale=1.0 | grad_norm=10.784 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.916 | tokens_per_gpu_per_second_tgs=2375.081 | [LM]TFLOPs=97.98 | [DS]TFLOPs=94.15 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27333.77, 27334.72)
optimizer ......................................: (236.76, 237.94)
[2025-03-12 09:30:34,747] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.41 | optimizer_gradients: 0.56 | optimizer_step: 1.05
[2025-03-12 09:30:34,747] [INFO] [logging.py:128:log_dist] [Rank 0] step=18, skipped=0, lr=[5.66231376296139e-08, 5.66231376296139e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:30:34,748] [INFO] [timer.py:264:stop] epoch=0/micro_step=18/global_step=18, RunningAvgSamplesPerSec=150.56977407087533, CurrSamplesPerSec=150.28234465103105, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:30:34,748] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7754.05 | bwd_microstep: 19489.14 | bwd_inner_microstep: 18845.99 | bwd_allreduce_microstep: 642.85 | step_microstep: 237.92
[2025-03-12 09:30:34,748] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7754.07 | bwd: 19489.14 | bwd_inner: 18846.05 | bwd_allreduce: 642.85 | step: 237.92
[2025-03-12 09:30:34][I][megatron/training_log:661] iteration= 18/ 1271565 | consumed_samples= 6912 | consumed_tokens= 28311552 | elapsed_time_per_iteration_ms=27588.7 | learning_rate=5.66231e-08 | global_batch_size= 384 | lm loss=11.166717 | loss_scale=1.0 | grad_norm=11.337 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.919 | tokens_per_gpu_per_second_tgs=2375.470 | [LM]TFLOPs=98.00 | [DS]TFLOPs=94.17 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27329.15, 27329.85)
optimizer ......................................: (236.72, 238.20)
[2025-03-12 09:31:02,335] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.06 | optimizer_gradients: 0.54 | optimizer_step: 1.02
[2025-03-12 09:31:02,335] [INFO] [logging.py:128:log_dist] [Rank 0] step=19, skipped=0, lr=[5.976886749792578e-08, 5.976886749792578e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:31:02,335] [INFO] [timer.py:264:stop] epoch=0/micro_step=19/global_step=19, RunningAvgSamplesPerSec=150.5800384194339, CurrSamplesPerSec=150.74439935155524, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:31:02,336] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7761.39 | bwd_microstep: 19482.84 | bwd_inner_microstep: 18840.61 | bwd_allreduce_microstep: 641.94 | step_microstep: 237.38
[2025-03-12 09:31:02,336] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7761.41 | bwd: 19482.84 | bwd_inner: 18840.66 | bwd_allreduce: 641.93 | step: 237.38
[2025-03-12 09:31:02][I][megatron/training_log:661] iteration= 19/ 1271565 | consumed_samples= 7296 | consumed_tokens= 29884416 | elapsed_time_per_iteration_ms=27587.7 | learning_rate=5.97689e-08 | global_batch_size= 384 | lm loss=11.164600 | loss_scale=1.0 | grad_norm=11.097 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.919 | tokens_per_gpu_per_second_tgs=2375.552 | [LM]TFLOPs=98.00 | [DS]TFLOPs=94.17 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27327.71, 27328.55)
optimizer ......................................: (236.51, 237.67)
[2025-03-12 09:31:29,923] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.25 | optimizer_gradients: 0.54 | optimizer_step: 1.02
[2025-03-12 09:31:29,923] [INFO] [logging.py:128:log_dist] [Rank 0] step=20, skipped=0, lr=[6.291459736623767e-08, 6.291459736623767e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:31:29,924] [INFO] [timer.py:264:stop] epoch=0/micro_step=20/global_step=20, RunningAvgSamplesPerSec=150.54739404105132, CurrSamplesPerSec=149.99453863031658, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:31:29,924] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7760.33 | bwd_microstep: 19481.08 | bwd_inner_microstep: 18836.25 | bwd_allreduce_microstep: 644.53 | step_microstep: 237.80
[2025-03-12 09:31:29,924] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7760.35 | bwd: 19481.08 | bwd_inner: 18836.31 | bwd_allreduce: 644.53 | step: 237.80
[2025-03-12 09:31:29][I][megatron/training_log:661] iteration= 20/ 1271565 | consumed_samples= 7680 | consumed_tokens= 31457280 | elapsed_time_per_iteration_ms=27588.0 | learning_rate=6.29146e-08 | global_batch_size= 384 | lm loss=11.165082 | loss_scale=1.0 | grad_norm=11.526 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.919 | tokens_per_gpu_per_second_tgs=2375.524 | [LM]TFLOPs=98.00 | [DS]TFLOPs=94.17 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27328.02, 27328.77)
optimizer ......................................: (236.66, 238.08)
[2025-03-12 09:31:57,493] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.62 | optimizer_gradients: 0.54 | optimizer_step: 0.99
[2025-03-12 09:31:57,493] [INFO] [logging.py:128:log_dist] [Rank 0] step=21, skipped=0, lr=[6.606032723454956e-08, 6.606032723454956e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:31:57,494] [INFO] [timer.py:264:stop] epoch=0/micro_step=21/global_step=21, RunningAvgSamplesPerSec=150.53150601956966, CurrSamplesPerSec=150.24603520544807, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:31:57,494] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7763.84 | bwd_microstep: 19463.13 | bwd_inner_microstep: 18816.92 | bwd_allreduce_microstep: 645.90 | step_microstep: 237.87
[2025-03-12 09:31:57,494] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7763.86 | bwd: 19463.12 | bwd_inner: 18816.98 | bwd_allreduce: 645.90 | step: 237.87
[2025-03-12 09:31:57][I][megatron/training_log:661] iteration= 21/ 1271565 | consumed_samples= 8064 | consumed_tokens= 33030144 | elapsed_time_per_iteration_ms=27569.9 | learning_rate=6.60603e-08 | global_batch_size= 384 | lm loss=11.169722 | loss_scale=1.0 | grad_norm=10.965 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.928 | tokens_per_gpu_per_second_tgs=2377.081 | [LM]TFLOPs=98.06 | [DS]TFLOPs=94.23 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27310.04, 27310.77)
optimizer ......................................: (236.83, 238.14)
[2025-03-12 09:32:25,050] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.10 | optimizer_gradients: 0.53 | optimizer_step: 1.05
[2025-03-12 09:32:25,050] [INFO] [logging.py:128:log_dist] [Rank 0] step=22, skipped=0, lr=[6.920605710286143e-08, 6.920605710286143e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:32:25,051] [INFO] [timer.py:264:stop] epoch=0/micro_step=22/global_step=22, RunningAvgSamplesPerSec=150.53416433279884, CurrSamplesPerSec=150.5846310776518, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:32:25,051] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7767.07 | bwd_microstep: 19446.17 | bwd_inner_microstep: 18806.38 | bwd_allreduce_microstep: 639.49 | step_microstep: 237.76
[2025-03-12 09:32:25,051] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7767.09 | bwd: 19446.16 | bwd_inner: 18806.44 | bwd_allreduce: 639.49 | step: 237.76
[2025-03-12 09:32:25][I][megatron/training_log:661] iteration= 22/ 1271565 | consumed_samples= 8448 | consumed_tokens= 34603008 | elapsed_time_per_iteration_ms=27556.4 | learning_rate=6.92061e-08 | global_batch_size= 384 | lm loss=11.159341 | loss_scale=1.0 | grad_norm=11.150 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.935 | tokens_per_gpu_per_second_tgs=2378.247 | [LM]TFLOPs=98.11 | [DS]TFLOPs=94.28 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27296.38, 27297.25)
optimizer ......................................: (236.85, 238.05)
[2025-03-12 09:32:54,137] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.32 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:32:54,137] [INFO] [logging.py:128:log_dist] [Rank 0] step=23, skipped=0, lr=[7.235178697117333e-08, 7.235178697117333e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:32:54,137] [INFO] [timer.py:264:stop] epoch=0/micro_step=23/global_step=23, RunningAvgSamplesPerSec=150.55238451198528, CurrSamplesPerSec=150.91765726183553, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:32:54,138] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7751.73 | bwd_microstep: 20989.64 | bwd_inner_microstep: 20349.86 | bwd_allreduce_microstep: 639.49 | step_microstep: 237.70
[2025-03-12 09:32:54,138] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7751.75 | bwd: 20989.64 | bwd_inner: 20349.92 | bwd_allreduce: 639.49 | step: 237.70
[2025-03-12 09:32:54][I][megatron/training_log:661] iteration= 23/ 1271565 | consumed_samples= 8832 | consumed_tokens= 36175872 | elapsed_time_per_iteration_ms=29086.3 | learning_rate=7.23518e-08 | global_batch_size= 384 | lm loss=11.165020 | loss_scale=1.0 | grad_norm=10.614 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.202 | tokens_per_gpu_per_second_tgs=2253.157 | [LM]TFLOPs=92.95 | [DS]TFLOPs=89.32 |
(min, max) time across ranks (ms):
forward-backward ...............................: (28826.48, 28827.37)
optimizer ......................................: (236.83, 237.98)
[2025-03-12 09:33:21,697] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.10 | optimizer_gradients: 0.54 | optimizer_step: 1.04
[2025-03-12 09:33:21,698] [INFO] [logging.py:128:log_dist] [Rank 0] step=24, skipped=0, lr=[7.54975168394852e-08, 7.54975168394852e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:33:21,698] [INFO] [timer.py:264:stop] epoch=0/micro_step=24/global_step=24, RunningAvgSamplesPerSec=150.55342149821294, CurrSamplesPerSec=150.57514246540285, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:33:21,699] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7765.60 | bwd_microstep: 19453.20 | bwd_inner_microstep: 18811.14 | bwd_allreduce_microstep: 641.76 | step_microstep: 237.55
[2025-03-12 09:33:21,699] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7765.62 | bwd: 19453.19 | bwd_inner: 18811.20 | bwd_allreduce: 641.75 | step: 237.55
[2025-03-12 09:33:21][I][megatron/training_log:661] iteration= 24/ 1271565 | consumed_samples= 9216 | consumed_tokens= 37748736 | elapsed_time_per_iteration_ms=27560.8 | learning_rate=7.54975e-08 | global_batch_size= 384 | lm loss=11.163930 | loss_scale=1.0 | grad_norm=10.657 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.933 | tokens_per_gpu_per_second_tgs=2377.873 | [LM]TFLOPs=98.09 | [DS]TFLOPs=94.26 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27301.38, 27302.27)
optimizer ......................................: (236.59, 237.84)
[2025-03-12 09:33:49,344] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.87 | optimizer_gradients: 0.53 | optimizer_step: 1.02
[2025-03-12 09:33:49,344] [INFO] [logging.py:128:log_dist] [Rank 0] step=25, skipped=0, lr=[7.864324670779709e-08, 7.864324670779709e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:33:49,345] [INFO] [timer.py:264:stop] epoch=0/micro_step=25/global_step=25, RunningAvgSamplesPerSec=150.5513872935777, CurrSamplesPerSec=150.50658970475837, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:33:49,345] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7746.74 | bwd_microstep: 19555.25 | bwd_inner_microstep: 18912.86 | bwd_allreduce_microstep: 642.10 | step_microstep: 237.29
[2025-03-12 09:33:49,345] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7746.77 | bwd: 19555.25 | bwd_inner: 18912.91 | bwd_allreduce: 642.10 | step: 237.29
[2025-03-12 09:33:49][I][megatron/training_log:661] iteration= 25/ 1271565 | consumed_samples= 9600 | consumed_tokens= 39321600 | elapsed_time_per_iteration_ms=27646.4 | learning_rate=7.86432e-08 | global_batch_size= 384 | lm loss=11.166022 | loss_scale=1.0 | grad_norm=12.064 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.890 | tokens_per_gpu_per_second_tgs=2370.505 | [LM]TFLOPs=97.79 | [DS]TFLOPs=93.97 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27386.73, 27387.53)
optimizer ......................................: (236.45, 237.60)
[2025-03-12 09:34:16,879] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.01 | optimizer_gradients: 0.53 | optimizer_step: 1.00
[2025-03-12 09:34:16,880] [INFO] [logging.py:128:log_dist] [Rank 0] step=26, skipped=0, lr=[8.178897657610897e-08, 8.178897657610897e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:34:16,880] [INFO] [timer.py:264:stop] epoch=0/micro_step=26/global_step=26, RunningAvgSamplesPerSec=150.55682657522908, CurrSamplesPerSec=150.68197949258283, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:34:16,880] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7769.19 | bwd_microstep: 19422.44 | bwd_inner_microstep: 18781.92 | bwd_allreduce_microstep: 640.22 | step_microstep: 237.35
[2025-03-12 09:34:16,880] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7769.21 | bwd: 19422.43 | bwd_inner: 18781.97 | bwd_allreduce: 640.21 | step: 237.35
[2025-03-12 09:34:16][I][megatron/training_log:661] iteration= 26/ 1271565 | consumed_samples= 9984 | consumed_tokens= 40894464 | elapsed_time_per_iteration_ms=27534.8 | learning_rate=8.1789e-08 | global_batch_size= 384 | lm loss=11.161508 | loss_scale=1.0 | grad_norm=10.619 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.946 | tokens_per_gpu_per_second_tgs=2380.116 | [LM]TFLOPs=98.19 | [DS]TFLOPs=94.35 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27275.39, 27276.09)
optimizer ......................................: (236.12, 237.61)
[2025-03-12 09:34:44,409] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.64 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:34:44,410] [INFO] [logging.py:128:log_dist] [Rank 0] step=27, skipped=0, lr=[8.493470644442084e-08, 8.493470644442084e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:34:44,410] [INFO] [timer.py:264:stop] epoch=0/micro_step=27/global_step=27, RunningAvgSamplesPerSec=150.56386797728126, CurrSamplesPerSec=150.73300027261766, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:34:44,410] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7759.57 | bwd_microstep: 19426.60 | bwd_inner_microstep: 18783.88 | bwd_allreduce_microstep: 642.43 | step_microstep: 236.83
[2025-03-12 09:34:44,410] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7759.59 | bwd: 19426.59 | bwd_inner: 18783.93 | bwd_allreduce: 642.43 | step: 236.83
[2025-03-12 09:34:44][I][megatron/training_log:661] iteration= 27/ 1271565 | consumed_samples= 10368 | consumed_tokens= 42467328 | elapsed_time_per_iteration_ms=27529.4 | learning_rate=8.49347e-08 | global_batch_size= 384 | lm loss=11.158463 | loss_scale=1.0 | grad_norm=11.492 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.949 | tokens_per_gpu_per_second_tgs=2380.586 | [LM]TFLOPs=98.21 | [DS]TFLOPs=94.37 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27271.02, 27271.82)
optimizer ......................................: (236.02, 237.10)
[2025-03-12 09:35:11,920] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.09 | optimizer_gradients: 0.53 | optimizer_step: 1.01
[2025-03-12 09:35:11,920] [INFO] [logging.py:128:log_dist] [Rank 0] step=28, skipped=0, lr=[8.808043631273274e-08, 8.808043631273274e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:35:11,921] [INFO] [timer.py:264:stop] epoch=0/micro_step=28/global_step=28, RunningAvgSamplesPerSec=150.57626867573913, CurrSamplesPerSec=150.88689209106104, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:35:11,921] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7769.62 | bwd_microstep: 19398.55 | bwd_inner_microstep: 18760.25 | bwd_allreduce_microstep: 638.01 | step_microstep: 237.54
[2025-03-12 09:35:11,921] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7769.64 | bwd: 19398.55 | bwd_inner: 18760.30 | bwd_allreduce: 638.00 | step: 237.54
[2025-03-12 09:35:11][I][megatron/training_log:661] iteration= 28/ 1271565 | consumed_samples= 10752 | consumed_tokens= 44040192 | elapsed_time_per_iteration_ms=27510.1 | learning_rate=8.80804e-08 | global_batch_size= 384 | lm loss=11.159657 | loss_scale=1.0 | grad_norm=10.657 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.959 | tokens_per_gpu_per_second_tgs=2382.251 | [LM]TFLOPs=98.28 | [DS]TFLOPs=94.44 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27250.35, 27251.12)
optimizer ......................................: (236.62, 237.80)
[2025-03-12 09:35:39,490] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.00 | optimizer_gradients: 0.53 | optimizer_step: 1.01
[2025-03-12 09:35:39,491] [INFO] [logging.py:128:log_dist] [Rank 0] step=29, skipped=0, lr=[9.122616618104463e-08, 9.122616618104463e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:35:39,491] [INFO] [timer.py:264:stop] epoch=0/micro_step=29/global_step=29, RunningAvgSamplesPerSec=150.57088463013994, CurrSamplesPerSec=150.43097553247807, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:35:39,491] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7760.55 | bwd_microstep: 19464.20 | bwd_inner_microstep: 18818.97 | bwd_allreduce_microstep: 644.93 | step_microstep: 237.36
[2025-03-12 09:35:39,492] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7760.58 | bwd: 19464.20 | bwd_inner: 18819.03 | bwd_allreduce: 644.93 | step: 237.36
[2025-03-12 09:35:39][I][megatron/training_log:661] iteration= 29/ 1271565 | consumed_samples= 11136 | consumed_tokens= 45613056 | elapsed_time_per_iteration_ms=27570.7 | learning_rate=9.12262e-08 | global_batch_size= 384 | lm loss=11.151192 | loss_scale=1.0 | grad_norm=11.017 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.928 | tokens_per_gpu_per_second_tgs=2377.019 | [LM]TFLOPs=98.06 | [DS]TFLOPs=94.23 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27311.48, 27312.39)
optimizer ......................................: (236.29, 237.66)
[2025-03-12 09:36:07,065] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.17 | optimizer_gradients: 0.56 | optimizer_step: 1.03
[2025-03-12 09:36:07,066] [INFO] [logging.py:128:log_dist] [Rank 0] step=30, skipped=0, lr=[9.437189604935652e-08, 9.437189604935652e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:36:07,066] [INFO] [timer.py:264:stop] epoch=0/micro_step=30/global_step=30, RunningAvgSamplesPerSec=150.55450385621103, CurrSamplesPerSec=150.11350758631528, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:36:07,066] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7758.68 | bwd_microstep: 19473.38 | bwd_inner_microstep: 18831.63 | bwd_allreduce_microstep: 641.45 | step_microstep: 237.62
[2025-03-12 09:36:07,066] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7758.71 | bwd: 19473.38 | bwd_inner: 18831.69 | bwd_allreduce: 641.44 | step: 237.63
[2025-03-12 09:36:07][I][megatron/training_log:661] iteration= 30/ 1271565 | consumed_samples= 11520 | consumed_tokens= 47185920 | elapsed_time_per_iteration_ms=27574.8 | learning_rate=9.43719e-08 | global_batch_size= 384 | lm loss=11.150348 | loss_scale=1.0 | grad_norm=10.968 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.926 | tokens_per_gpu_per_second_tgs=2376.663 | [LM]TFLOPs=98.04 | [DS]TFLOPs=94.22 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27315.36, 27316.23)
optimizer ......................................: (236.50, 237.91)
[2025-03-12 09:36:34,618] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.93 | optimizer_gradients: 0.52 | optimizer_step: 1.00
[2025-03-12 09:36:34,619] [INFO] [logging.py:128:log_dist] [Rank 0] step=31, skipped=0, lr=[9.751762591766839e-08, 9.751762591766839e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:36:34,619] [INFO] [timer.py:264:stop] epoch=0/micro_step=31/global_step=31, RunningAvgSamplesPerSec=150.56813897779932, CurrSamplesPerSec=150.95086831394593, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:36:34,619] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7766.39 | bwd_microstep: 19441.07 | bwd_inner_microstep: 18804.55 | bwd_allreduce_microstep: 636.22 | step_microstep: 237.30
[2025-03-12 09:36:34,619] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7766.41 | bwd: 19441.07 | bwd_inner: 18804.61 | bwd_allreduce: 636.22 | step: 237.30
[2025-03-12 09:36:34][I][megatron/training_log:661] iteration= 31/ 1271565 | consumed_samples= 11904 | consumed_tokens= 48758784 | elapsed_time_per_iteration_ms=27552.4 | learning_rate=9.75176e-08 | global_batch_size= 384 | lm loss=11.149038 | loss_scale=1.0 | grad_norm=10.982 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.937 | tokens_per_gpu_per_second_tgs=2378.593 | [LM]TFLOPs=98.12 | [DS]TFLOPs=94.29 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27293.34, 27294.19)
optimizer ......................................: (236.49, 237.58)
[2025-03-12 09:37:02,122] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.68 | optimizer_gradients: 0.55 | optimizer_step: 1.06
[2025-03-12 09:37:02,123] [INFO] [logging.py:128:log_dist] [Rank 0] step=32, skipped=0, lr=[1.0066335578598028e-07, 1.0066335578598028e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:37:02,123] [INFO] [timer.py:264:stop] epoch=0/micro_step=32/global_step=32, RunningAvgSamplesPerSec=150.5804191636691, CurrSamplesPerSec=150.93735864820079, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:37:02,123] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7759.48 | bwd_microstep: 19401.22 | bwd_inner_microstep: 18762.84 | bwd_allreduce_microstep: 638.09 | step_microstep: 237.15
[2025-03-12 09:37:02,123] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7759.50 | bwd: 19401.22 | bwd_inner: 18762.90 | bwd_allreduce: 638.08 | step: 237.15
[2025-03-12 09:37:02][I][megatron/training_log:661] iteration= 32/ 1271565 | consumed_samples= 12288 | consumed_tokens= 50331648 | elapsed_time_per_iteration_ms=27503.5 | learning_rate=1.00663e-07 | global_batch_size= 384 | lm loss=11.146956 | loss_scale=1.0 | grad_norm=10.936 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.962 | tokens_per_gpu_per_second_tgs=2382.821 | [LM]TFLOPs=98.30 | [DS]TFLOPs=94.46 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27243.61, 27244.42)
optimizer ......................................: (236.08, 237.44)
[2025-03-12 09:37:29,631] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.02 | optimizer_gradients: 0.56 | optimizer_step: 1.06
[2025-03-12 09:37:29,631] [INFO] [logging.py:128:log_dist] [Rank 0] step=33, skipped=0, lr=[1.0380908565429217e-07, 1.0380908565429217e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:37:29,632] [INFO] [timer.py:264:stop] epoch=0/micro_step=33/global_step=33, RunningAvgSamplesPerSec=150.5951183632239, CurrSamplesPerSec=151.0373733107677, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:37:29,632] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7769.67 | bwd_microstep: 19394.21 | bwd_inner_microstep: 18758.87 | bwd_allreduce_microstep: 635.06 | step_microstep: 237.55
[2025-03-12 09:37:29,632] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7769.69 | bwd: 19394.21 | bwd_inner: 18758.92 | bwd_allreduce: 635.06 | step: 237.55
[2025-03-12 09:37:29][I][megatron/training_log:661] iteration= 33/ 1271565 | consumed_samples= 12672 | consumed_tokens= 51904512 | elapsed_time_per_iteration_ms=27508.8 | learning_rate=1.03809e-07 | global_batch_size= 384 | lm loss=11.143302 | loss_scale=1.0 | grad_norm=10.686 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.959 | tokens_per_gpu_per_second_tgs=2382.365 | [LM]TFLOPs=98.28 | [DS]TFLOPs=94.44 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27248.63, 27249.49)
optimizer ......................................: (236.31, 237.86)
[2025-03-12 09:37:57,143] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 220.05 | optimizer_gradients: 0.57 | optimizer_step: 1.04
[2025-03-12 09:37:57,143] [INFO] [logging.py:128:log_dist] [Rank 0] step=34, skipped=0, lr=[1.0695481552260404e-07, 1.0695481552260404e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:37:57,144] [INFO] [timer.py:264:stop] epoch=0/micro_step=34/global_step=34, RunningAvgSamplesPerSec=150.6027666009015, CurrSamplesPerSec=150.84018864625585, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:37:57,144] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7766.24 | bwd_microstep: 19404.88 | bwd_inner_microstep: 18762.95 | bwd_allreduce_microstep: 641.63 | step_microstep: 234.39
[2025-03-12 09:37:57,144] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7766.26 | bwd: 19404.88 | bwd_inner: 18763.01 | bwd_allreduce: 641.62 | step: 234.39
[2025-03-12 09:37:57][I][megatron/training_log:661] iteration= 34/ 1271565 | consumed_samples= 13056 | consumed_tokens= 53477376 | elapsed_time_per_iteration_ms=27511.3 | learning_rate=1.06955e-07 | global_batch_size= 384 | lm loss=11.142344 | loss_scale=1.0 | grad_norm=11.046 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.958 | tokens_per_gpu_per_second_tgs=2382.147 | [LM]TFLOPs=98.27 | [DS]TFLOPs=94.43 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27254.80, 27255.69)
optimizer ......................................: (233.51, 234.69)
[2025-03-12 09:38:24,648] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.91 | optimizer_gradients: 0.53 | optimizer_step: 1.02
[2025-03-12 09:38:24,649] [INFO] [logging.py:128:log_dist] [Rank 0] step=35, skipped=0, lr=[1.1010054539091593e-07, 1.1010054539091593e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:38:24,649] [INFO] [timer.py:264:stop] epoch=0/micro_step=35/global_step=35, RunningAvgSamplesPerSec=150.61823196981342, CurrSamplesPerSec=151.11474690757643, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:38:24,649] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7769.37 | bwd_microstep: 19387.54 | bwd_inner_microstep: 18753.48 | bwd_allreduce_microstep: 633.77 | step_microstep: 237.49
[2025-03-12 09:38:24,649] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7769.39 | bwd: 19387.54 | bwd_inner: 18753.53 | bwd_allreduce: 633.77 | step: 237.49
[2025-03-12 09:38:24][I][megatron/training_log:661] iteration= 35/ 1271565 | consumed_samples= 13440 | consumed_tokens= 55050240 | elapsed_time_per_iteration_ms=27505.0 | learning_rate=1.10101e-07 | global_batch_size= 384 | lm loss=11.141614 | loss_scale=1.0 | grad_norm=11.859 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.961 | tokens_per_gpu_per_second_tgs=2382.697 | [LM]TFLOPs=98.29 | [DS]TFLOPs=94.46 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27244.77, 27245.73)
optimizer ......................................: (236.63, 237.77)
[2025-03-12 09:38:52,136] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.04 | optimizer_gradients: 0.56 | optimizer_step: 1.09
[2025-03-12 09:38:52,136] [INFO] [logging.py:128:log_dist] [Rank 0] step=36, skipped=0, lr=[1.132462752592278e-07, 1.132462752592278e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:38:52,137] [INFO] [timer.py:264:stop] epoch=0/micro_step=36/global_step=36, RunningAvgSamplesPerSec=150.63801235253072, CurrSamplesPerSec=151.29363269838998, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:38:52,137] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7765.64 | bwd_microstep: 19377.51 | bwd_inner_microstep: 18741.31 | bwd_allreduce_microstep: 635.90 | step_microstep: 237.72
[2025-03-12 09:38:52,137] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7765.66 | bwd: 19377.51 | bwd_inner: 18741.36 | bwd_allreduce: 635.90 | step: 237.72
[2025-03-12 09:38:52][I][megatron/training_log:661] iteration= 36/ 1271565 | consumed_samples= 13824 | consumed_tokens= 56623104 | elapsed_time_per_iteration_ms=27487.5 | learning_rate=1.13246e-07 | global_batch_size= 384 | lm loss=11.140266 | loss_scale=1.0 | grad_norm=10.772 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.970 | tokens_per_gpu_per_second_tgs=2384.208 | [LM]TFLOPs=98.36 | [DS]TFLOPs=94.52 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27227.59, 27228.51)
optimizer ......................................: (236.76, 238.00)
[2025-03-12 09:39:19,642] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.83 | optimizer_gradients: 0.56 | optimizer_step: 1.05
[2025-03-12 09:39:19,642] [INFO] [logging.py:128:log_dist] [Rank 0] step=37, skipped=0, lr=[1.1639200512753969e-07, 1.1639200512753969e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:39:19,643] [INFO] [timer.py:264:stop] epoch=0/micro_step=37/global_step=37, RunningAvgSamplesPerSec=150.63988297133875, CurrSamplesPerSec=150.70345252071348, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:39:19,643] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7754.35 | bwd_microstep: 19404.82 | bwd_inner_microstep: 18763.16 | bwd_allreduce_microstep: 641.36 | step_microstep: 237.32
[2025-03-12 09:39:19,643] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7754.37 | bwd: 19404.82 | bwd_inner: 18763.21 | bwd_allreduce: 641.36 | step: 237.32
[2025-03-12 09:39:19][I][megatron/training_log:661] iteration= 37/ 1271565 | consumed_samples= 14208 | consumed_tokens= 58195968 | elapsed_time_per_iteration_ms=27506.0 | learning_rate=1.16392e-07 | global_batch_size= 384 | lm loss=11.133783 | loss_scale=1.0 | grad_norm=10.816 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.961 | tokens_per_gpu_per_second_tgs=2382.610 | [LM]TFLOPs=98.29 | [DS]TFLOPs=94.45 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27245.36, 27246.19)
optimizer ......................................: (236.49, 237.61)
[2025-03-12 09:39:47,196] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.76 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:39:47,196] [INFO] [logging.py:128:log_dist] [Rank 0] step=38, skipped=0, lr=[1.1953773499585156e-07, 1.1953773499585156e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:39:47,197] [INFO] [timer.py:264:stop] epoch=0/micro_step=38/global_step=38, RunningAvgSamplesPerSec=150.6428305064565, CurrSamplesPerSec=150.74600777623198, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:39:47,197] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7775.64 | bwd_microstep: 19434.79 | bwd_inner_microstep: 18794.63 | bwd_allreduce_microstep: 639.86 | step_microstep: 237.35
[2025-03-12 09:39:47,197] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7775.66 | bwd: 19434.78 | bwd_inner: 18794.68 | bwd_allreduce: 639.86 | step: 237.35
[2025-03-12 09:39:47][I][megatron/training_log:661] iteration= 38/ 1271565 | consumed_samples= 14592 | consumed_tokens= 59768832 | elapsed_time_per_iteration_ms=27553.8 | learning_rate=1.19538e-07 | global_batch_size= 384 | lm loss=11.129514 | loss_scale=1.0 | grad_norm=10.590 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.936 | tokens_per_gpu_per_second_tgs=2378.474 | [LM]TFLOPs=98.12 | [DS]TFLOPs=94.29 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27293.95, 27294.88)
optimizer ......................................: (236.41, 237.63)
[2025-03-12 09:40:18,214] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.99 | optimizer_gradients: 0.54 | optimizer_step: 1.02
[2025-03-12 09:40:18,214] [INFO] [logging.py:128:log_dist] [Rank 0] step=39, skipped=0, lr=[1.2268346486416345e-07, 1.2268346486416345e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:40:18,215] [INFO] [timer.py:264:stop] epoch=0/micro_step=39/global_step=39, RunningAvgSamplesPerSec=150.65532960496606, CurrSamplesPerSec=151.10662320823855, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:40:18,215] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7785.34 | bwd_microstep: 22885.13 | bwd_inner_microstep: 22249.75 | bwd_allreduce_microstep: 635.08 | step_microstep: 237.48
[2025-03-12 09:40:18,215] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7785.36 | bwd: 22885.12 | bwd_inner: 22249.80 | bwd_allreduce: 635.08 | step: 237.48
[2025-03-12 09:40:18][I][megatron/training_log:661] iteration= 39/ 1271565 | consumed_samples= 14976 | consumed_tokens= 61341696 | elapsed_time_per_iteration_ms=31017.4 | learning_rate=1.22683e-07 | global_batch_size= 384 | lm loss=11.113354 | loss_scale=1.0 | grad_norm=11.221 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=12.380 | tokens_per_gpu_per_second_tgs=2112.881 | [LM]TFLOPs=87.16 | [DS]TFLOPs=83.76 |
(min, max) time across ranks (ms):
forward-backward ...............................: (30757.46, 30758.31)
optimizer ......................................: (236.27, 237.76)
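
Iterations 23 and 39 above (and 44 further below) show a noticeably longer `bwd_microstep` (~21–23.6 s vs. the usual ~19.4 s) and a corresponding TFLOPs dip. A hypothetical helper (not part of ezpz or Megatron-DeepSpeed) to pull those outliers out of a saved copy of this log:

```python
# Hypothetical helper: flag iterations whose elapsed time exceeds the
# median by some factor, to spot the periodic slowdowns seen here.
import re
import statistics

PATTERN = re.compile(
    r"iteration=\s*(\d+)/.*?elapsed_time_per_iteration_ms=([\d.]+)"
)

def slow_iterations(log_path: str, factor: float = 1.05):
    rows = []
    with open(log_path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                rows.append((int(m.group(1)), float(m.group(2))))
    median = statistics.median(ms for _, ms in rows)
    return [(it, ms) for it, ms in rows if ms > factor * median]

# On this section of the log:
# slow_iterations("train.log") -> [(23, 29086.3), (39, 31017.4), (44, 31719.6)]
```
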
[2025-03-12 09:40:45,794] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.58 | optimizer_gradients: 0.57 | optimizer_step: 1.10
[2025-03-12 09:40:45,795] [INFO] [logging.py:128:log_dist] [Rank 0] step=40, skipped=0, lr=[1.2582919473247534e-07, 1.2582919473247534e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:40:45,795] [INFO] [timer.py:264:stop] epoch=0/micro_step=40/global_step=40, RunningAvgSamplesPerSec=150.6476453041463, CurrSamplesPerSec=150.36381733094183, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:40:45,795] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7772.24 | bwd_microstep: 19444.79 | bwd_inner_microstep: 18805.02 | bwd_allreduce_microstep: 639.47 | step_microstep: 237.05
[2025-03-12 09:40:45,796] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7772.26 | bwd: 19444.78 | bwd_inner: 18805.08 | bwd_allreduce: 639.46 | step: 237.05
[2025-03-12 09:40:45][I][megatron/training_log:661] iteration= 40/ 1271565 | consumed_samples= 15360 | consumed_tokens= 62914560 | elapsed_time_per_iteration_ms=27580.3 | learning_rate=1.25829e-07 | global_batch_size= 384 | lm loss=11.107403 | loss_scale=1.0 | grad_norm=11.039 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.923 | tokens_per_gpu_per_second_tgs=2376.188 | [LM]TFLOPs=98.03 | [DS]TFLOPs=94.20 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27320.59, 27321.29)
optimizer ......................................: (236.20, 237.34)
[2025-03-12 09:41:13,375] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.55 | optimizer_gradients: 0.56 | optimizer_step: 1.04
[2025-03-12 09:41:13,376] [INFO] [logging.py:128:log_dist] [Rank 0] step=41, skipped=0, lr=[1.2897492460078723e-07, 1.2897492460078723e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:41:13,376] [INFO] [timer.py:264:stop] epoch=0/micro_step=41/global_step=41, RunningAvgSamplesPerSec=150.64290998760035, CurrSamplesPerSec=150.46312932855187, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:41:13,376] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7763.44 | bwd_microstep: 19475.17 | bwd_inner_microstep: 18835.26 | bwd_allreduce_microstep: 639.61 | step_microstep: 236.88
[2025-03-12 09:41:13,376] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7763.47 | bwd: 19475.17 | bwd_inner: 18835.32 | bwd_allreduce: 639.60 | step: 236.88
[2025-03-12 09:41:13][I][megatron/training_log:661] iteration= 41/ 1271565 | consumed_samples= 15744 | consumed_tokens= 64487424 | elapsed_time_per_iteration_ms=27580.4 | learning_rate=1.28975e-07 | global_batch_size= 384 | lm loss=11.099026 | loss_scale=1.0 | grad_norm=10.892 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.923 | tokens_per_gpu_per_second_tgs=2376.183 | [LM]TFLOPs=98.02 | [DS]TFLOPs=94.20 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27321.46, 27322.51)
optimizer ......................................: (236.00, 237.15)
[2025-03-12 09:41:40,895] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.66 | optimizer_gradients: 0.52 | optimizer_step: 0.99
[2025-03-12 09:41:40,896] [INFO] [logging.py:128:log_dist] [Rank 0] step=42, skipped=0, lr=[1.321206544690991e-07, 1.321206544690991e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:41:40,896] [INFO] [timer.py:264:stop] epoch=0/micro_step=42/global_step=42, RunningAvgSamplesPerSec=150.64652635998266, CurrSamplesPerSec=150.78764123135574, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:41:40,896] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7757.05 | bwd_microstep: 19396.21 | bwd_inner_microstep: 18756.71 | bwd_allreduce_microstep: 639.21 | step_microstep: 236.85
[2025-03-12 09:41:40,896] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7757.07 | bwd: 19396.21 | bwd_inner: 18756.76 | bwd_allreduce: 639.21 | step: 236.85
[2025-03-12 09:41:40][I][megatron/training_log:661] iteration= 42/ 1271565 | consumed_samples= 16128 | consumed_tokens= 66060288 | elapsed_time_per_iteration_ms=27519.3 | learning_rate=1.32121e-07 | global_batch_size= 384 | lm loss=11.096643 | loss_scale=1.0 | grad_norm=10.890 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.954 | tokens_per_gpu_per_second_tgs=2381.455 | [LM]TFLOPs=98.24 | [DS]TFLOPs=94.41 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27259.42, 27260.27)
optimizer ......................................: (235.79, 237.12)
[2025-03-12 09:42:08,435] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.21 | optimizer_gradients: 0.56 | optimizer_step: 1.04
[2025-03-12 09:42:08,436] [INFO] [logging.py:128:log_dist] [Rank 0] step=43, skipped=0, lr=[1.3526638433741097e-07, 1.3526638433741097e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:42:08,436] [INFO] [timer.py:264:stop] epoch=0/micro_step=43/global_step=43, RunningAvgSamplesPerSec=150.65472223210463, CurrSamplesPerSec=150.98323061286504, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:42:08,436] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7768.23 | bwd_microstep: 19412.97 | bwd_inner_microstep: 18778.12 | bwd_allreduce_microstep: 634.56 | step_microstep: 236.67
[2025-03-12 09:42:08,436] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7768.25 | bwd: 19412.97 | bwd_inner: 18778.17 | bwd_allreduce: 634.55 | step: 236.67
[2025-03-12 09:42:08][I][megatron/training_log:661] iteration= 43/ 1271565 | consumed_samples= 16512 | consumed_tokens= 67633152 | elapsed_time_per_iteration_ms=27540.6 | learning_rate=1.35266e-07 | global_batch_size= 384 | lm loss=11.086030 | loss_scale=1.0 | grad_norm=10.834 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.943 | tokens_per_gpu_per_second_tgs=2379.615 | [LM]TFLOPs=98.17 | [DS]TFLOPs=94.33 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27281.60, 27282.47)
optimizer ......................................: (235.86, 236.94)
[2025-03-12 09:42:40,156] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.27 | optimizer_gradients: 0.56 | optimizer_step: 1.05
[2025-03-12 09:42:40,156] [INFO] [logging.py:128:log_dist] [Rank 0] step=44, skipped=0, lr=[1.3841211420572286e-07, 1.3841211420572286e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:42:40,156] [INFO] [timer.py:264:stop] epoch=0/micro_step=44/global_step=44, RunningAvgSamplesPerSec=150.65684951614153, CurrSamplesPerSec=150.7440607402074, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:42:40,157] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7770.04 | bwd_microstep: 23589.33 | bwd_inner_microstep: 22952.42 | bwd_allreduce_microstep: 636.62 | step_microstep: 237.68
[2025-03-12 09:42:40,157] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7770.06 | bwd: 23589.33 | bwd_inner: 22952.47 | bwd_allreduce: 636.61 | step: 237.68
[2025-03-12 09:42:40][I][megatron/training_log:661] iteration= 44/ 1271565 | consumed_samples= 16896 | consumed_tokens= 69206016 | elapsed_time_per_iteration_ms=31719.6 | learning_rate=1.38412e-07 | global_batch_size= 384 | lm loss=11.081297 | loss_scale=1.0 | grad_norm=10.845 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=12.106 | tokens_per_gpu_per_second_tgs=2066.106 | [LM]TFLOPs=85.23 | [DS]TFLOPs=81.91 |
(min, max) time across ranks (ms):
forward-backward ...............................: (31460.12, 31461.08)
optimizer ......................................: (236.39, 237.94)
[2025-03-12 09:43:07,653] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.11 | optimizer_gradients: 0.54 | optimizer_step: 1.01
[2025-03-12 09:43:07,653] [INFO] [logging.py:128:log_dist] [Rank 0] step=45, skipped=0, lr=[1.4155784407403475e-07, 1.4155784407403475e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:43:07,654] [INFO] [timer.py:264:stop] epoch=0/micro_step=45/global_step=45, RunningAvgSamplesPerSec=150.67099479124266, CurrSamplesPerSec=151.26744481924644, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:43:07,654] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7753.86 | bwd_microstep: 19400.94 | bwd_inner_microstep: 18767.76 | bwd_allreduce_microstep: 632.88 | step_microstep: 237.56
[2025-03-12 09:43:07,654] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7753.88 | bwd: 19400.94 | bwd_inner: 18767.82 | bwd_allreduce: 632.87 | step: 237.56
[2025-03-12 09:43:07][I][megatron/training_log:661] iteration= 45/ 1271565 | consumed_samples= 17280 | consumed_tokens= 70778880 | elapsed_time_per_iteration_ms=27497.0 | learning_rate=1.41558e-07 | global_batch_size= 384 | lm loss=11.083328 | loss_scale=1.0 | grad_norm=10.560 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.965 | tokens_per_gpu_per_second_tgs=2383.391 | [LM]TFLOPs=98.32 | [DS]TFLOPs=94.48 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27237.77, 27238.60)
optimizer ......................................: (236.64, 237.83)
[2025-03-12 09:43:35,158] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.73 | optimizer_gradients: 0.53 | optimizer_step: 1.00
[2025-03-12 09:43:35,158] [INFO] [logging.py:128:log_dist] [Rank 0] step=46, skipped=0, lr=[1.4470357394234666e-07, 1.4470357394234666e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:43:35,159] [INFO] [timer.py:264:stop] epoch=0/micro_step=46/global_step=46, RunningAvgSamplesPerSec=150.68493372147117, CurrSamplesPerSec=151.28669764056133, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:43:35,159] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7776.96 | bwd_microstep: 19387.18 | bwd_inner_microstep: 18754.51 | bwd_allreduce_microstep: 632.37 | step_microstep: 236.98
[2025-03-12 09:43:35,159] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7776.99 | bwd: 19387.17 | bwd_inner: 18754.57 | bwd_allreduce: 632.36 | step: 236.98
[2025-03-12 09:43:35][I][megatron/training_log:661] iteration= 46/ 1271565 | consumed_samples= 17664 | consumed_tokens= 72351744 | elapsed_time_per_iteration_ms=27504.6 | learning_rate=1.44704e-07 | global_batch_size= 384 | lm loss=11.064159 | loss_scale=1.0 | grad_norm=11.739 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.961 | tokens_per_gpu_per_second_tgs=2382.729 | [LM]TFLOPs=98.29 | [DS]TFLOPs=94.46 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27246.27, 27247.13)
optimizer ......................................: (236.16, 237.25)
[2025-03-12 09:44:02,703] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.02 | optimizer_gradients: 0.53 | optimizer_step: 0.98
[2025-03-12 09:44:02,704] [INFO] [logging.py:128:log_dist] [Rank 0] step=47, skipped=0, lr=[1.4784930381065852e-07, 1.4784930381065852e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:44:02,704] [INFO] [timer.py:264:stop] epoch=0/micro_step=47/global_step=47, RunningAvgSamplesPerSec=150.6950169635293, CurrSamplesPerSec=151.1399600384914, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:44:02,704] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7778.26 | bwd_microstep: 19423.03 | bwd_inner_microstep: 18789.28 | bwd_allreduce_microstep: 633.45 | step_microstep: 237.42
[2025-03-12 09:44:02,704] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7778.28 | bwd: 19423.03 | bwd_inner: 18789.33 | bwd_allreduce: 633.45 | step: 237.42
[2025-03-12 09:44:02][I][megatron/training_log:661] iteration= 47/ 1271565 | consumed_samples= 18048 | consumed_tokens= 73924608 | elapsed_time_per_iteration_ms=27545.2 | learning_rate=1.47849e-07 | global_batch_size= 384 | lm loss=11.060505 | loss_scale=1.0 | grad_norm=11.227 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.941 | tokens_per_gpu_per_second_tgs=2379.220 | [LM]TFLOPs=98.15 | [DS]TFLOPs=94.32 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27285.29, 27286.13)
optimizer ......................................: (236.22, 237.69)
[2025-03-12 09:44:30,218] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.47 | optimizer_gradients: 0.53 | optimizer_step: 0.98
[2025-03-12 09:44:30,219] [INFO] [logging.py:128:log_dist] [Rank 0] step=48, skipped=0, lr=[1.509950336789704e-07, 1.509950336789704e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:44:30,219] [INFO] [timer.py:264:stop] epoch=0/micro_step=48/global_step=48, RunningAvgSamplesPerSec=150.70708793339045, CurrSamplesPerSec=151.25223074254686, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:44:30,219] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7767.44 | bwd_microstep: 19402.55 | bwd_inner_microstep: 18770.56 | bwd_allreduce_microstep: 631.70 | step_microstep: 237.82
[2025-03-12 09:44:30,220] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7767.46 | bwd: 19402.55 | bwd_inner: 18770.61 | bwd_allreduce: 631.70 | step: 237.83
[2025-03-12 09:44:30][I][megatron/training_log:661] iteration= 48/ 1271565 | consumed_samples= 18432 | consumed_tokens= 75497472 | elapsed_time_per_iteration_ms=27515.5 | learning_rate=1.50995e-07 | global_batch_size= 384 | lm loss=11.051532 | loss_scale=1.0 | grad_norm=11.153 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.956 | tokens_per_gpu_per_second_tgs=2381.785 | [LM]TFLOPs=98.26 | [DS]TFLOPs=94.42 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27254.93, 27255.75)
optimizer ......................................: (236.69, 238.10)
[2025-03-12 09:44:57,775] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.24 | optimizer_gradients: 0.53 | optimizer_step: 1.02
[2025-03-12 09:44:57,775] [INFO] [logging.py:128:log_dist] [Rank 0] step=49, skipped=0, lr=[1.541407635472823e-07, 1.541407635472823e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:44:57,776] [INFO] [timer.py:264:stop] epoch=0/micro_step=49/global_step=49, RunningAvgSamplesPerSec=150.71090194884374, CurrSamplesPerSec=150.8864962974408, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:44:57,776] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7784.89 | bwd_microstep: 19426.83 | bwd_inner_microstep: 18790.67 | bwd_allreduce_microstep: 635.86 | step_microstep: 237.48
[2025-03-12 09:44:57,776] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7784.91 | bwd: 19426.82 | bwd_inner: 18790.72 | bwd_allreduce: 635.85 | step: 237.48
[2025-03-12 09:44:57][I][megatron/training_log:661] iteration= 49/ 1271565 | consumed_samples= 18816 | consumed_tokens= 77070336 | elapsed_time_per_iteration_ms=27556.0 | learning_rate=1.54141e-07 | global_batch_size= 384 | lm loss=11.055607 | loss_scale=1.0 | grad_norm=11.085 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.935 | tokens_per_gpu_per_second_tgs=2378.280 | [LM]TFLOPs=98.11 | [DS]TFLOPs=94.28 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27296.79, 27297.61)
optimizer ......................................: (236.40, 237.75)
[2025-03-12 09:45:25,336] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.64 | optimizer_gradients: 0.53 | optimizer_step: 1.00
[2025-03-12 09:45:25,336] [INFO] [logging.py:128:log_dist] [Rank 0] step=50, skipped=0, lr=[1.5728649341559419e-07, 1.5728649341559419e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:45:25,337] [INFO] [timer.py:264:stop] epoch=0/micro_step=50/global_step=50, RunningAvgSamplesPerSec=150.71373983157494, CurrSamplesPerSec=150.84718172369494, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:45:25,337] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7769.20 | bwd_microstep: 19451.29 | bwd_inner_microstep: 18816.92 | bwd_allreduce_microstep: 634.07 | step_microstep: 236.97
[2025-03-12 09:45:25,337] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7769.22 | bwd: 19451.29 | bwd_inner: 18816.98 | bwd_allreduce: 634.06 | step: 236.97
[2025-03-12 09:45:25][I][megatron/training_log:661] iteration= 50/ 1271565 | consumed_samples= 19200 | consumed_tokens= 78643200 | elapsed_time_per_iteration_ms=27560.6 | learning_rate=1.57286e-07 | global_batch_size= 384 | lm loss=11.036057 | loss_scale=1.0 | grad_norm=11.555 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.933 | tokens_per_gpu_per_second_tgs=2377.889 | [LM]TFLOPs=98.10 | [DS]TFLOPs=94.27 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27301.97, 27302.88)
optimizer ......................................: (236.15, 237.23)
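
At iteration 50 the run pauses to write its first checkpoint. Since this is ZeRO stage 1 with tp1/pp1, the expected file set under `global_step50` is one `mp_rank_00_model_states.pt` plus one `bf16_zero_pp_rank_{r}` optimizer shard per rank. A quick sketch of that layout (assuming 24 ranks, from the `ws24` run name), matching the Saving/Saved lines that follow:

```python
# Sketch: file set DeepSpeed should write for this ZeRO-1, tp1/pp1,
# 24-rank run (world size assumed from the `ws24` run name).
world_size = 24
step_dir = (
    "checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384"
    "_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50"
)

expected = [f"{step_dir}/mp_rank_00_model_states.pt"] + [
    f"{step_dir}/bf16_zero_pp_rank_{r}_mp_rank_00_optim_states.pt"
    for r in range(world_size)
]
print(len(expected))  # 25 files: 1 model-states + 24 optimizer shards
```
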
[2025-03-12 09:45:25][I][megatron/checkpointing:589] Saving lr_state_dict to checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/lr_state_dict.yaml
[2025-03-12 09:45:25][I][megatron/utils:368] saving checkpoint at iteration 50 to checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
[2025-03-12 09:45:25,373] [INFO] [logging.py:128:log_dist] [Rank 0] [Torch] Checkpoint global_step50 is about to be saved!
[2025-03-12 09:45:25,383] [INFO] [logging.py:128:log_dist] [Rank 0] Saving model checkpoint: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/mp_rank_00_model_states.pt
[2025-03-12 09:45:25,384] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/mp_rank_00_model_states.pt...
[2025-03-12 09:45:36,631] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/mp_rank_00_model_states.pt.
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,656] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,656] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,656] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:40,037] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,037] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,038] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,074] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,074] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,074] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[... identical `save` / `zero checkpoint saved` / `commit` messages for the remaining 22 shards (`bf16_zero_pp_rank_{0..19,21,22}`) elided; all 24 ranks finished writing between 09:45:40,037 and 09:45:40,359 ...]
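Each of the 24 `bf16_zero_pp_rank_*_mp_rank_00_optim_states.pt` files above is one data-parallel rank's shard of the optimizer state: with ZeRO stage 1 (the `ds_stage1` in the run name) the AdamW state is partitioned across all `NGPUS=24` ranks, so every rank writes its own piece of the checkpoint. A minimal sketch for sanity-checking a saved step on disk (the `ckpt_dir` path is taken from the logs; the rest is an assumption, not part of the training code):

```python
# Sketch: tally the per-rank ZeRO-1 optimizer shards for global_step50.
# Assumes it runs from the Megatron-DeepSpeed working directory.
from pathlib import Path

ckpt_dir = Path(
    "checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1"
    "_bf16_optadamw_lr_lwf_flash/global_step50"
)
shards = sorted(ckpt_dir.glob("bf16_zero_pp_rank_*_mp_rank_00_optim_states.pt"))
total_gb = sum(p.stat().st_size for p in shards) / 1e9
print(f"{len(shards)} optimizer shards, {total_gb:.3f} GB")
# Expect 24 shards (one per data-parallel rank); together with the model
# states they make up the ~83 GB reported in the save summary below.
```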
[2025-03-12 09:45:40][I][megatron/utils:368] successfully saved checkpoint at iteration 50 to checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
[2025-03-12 09:45:40][I][megatron/utils:368] Checkpoint Save GB: 83.064, GB/Sec: 5.53, Latency(second): 15.011
(min, max) time across ranks (ms):
save-checkpoint ................................: (15010.94, 15011.01)
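The reported bandwidth is just checkpoint size over wall time; plugging in the two numbers from the summary line above:

```python
# Checkpoint Save GB: 83.064, Latency(second): 15.011 (from the log above)
print(f"{83.064 / 15.011:.2f} GB/s")  # -> 5.53, matching "GB/Sec: 5.53"
```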
[2025-03-12 09:45:40][I][ezpz/dist:125] `save_checkpoint_and_time`((50, [DeepSpeedEngine(
(module): GPTModel(
(language_model): TransformerLanguageModel(
(embedding): Embedding(
(word_embeddings): VocabParallelEmbedding()
(embedding_dropout): Dropout(p=0.0, inplace=False)
)
(rotary_pos_emb): RotaryEmbedding()
(encoder): ParallelTransformer(
(layers): ModuleList(
(0-31): 32 x ParallelTransformerLayer(
(input_layernorm): RMSNorm()
(self_attention): ParallelAttention(
(query_key_value): ColumnParallelLinear()
(core_attention_flash): FlashSelfAttention()
(dense): RowParallelLinear()
)
(post_attention_layernorm): RMSNorm()
(mlp): ParallelMLP(
(dense_h_to_4h): ColumnParallelLinear()
(dense_4h_to_h): RowParallelLinear()
)
)
)
(final_layernorm): RMSNorm()
)
(output_layer): ColumnParallelLinear()
)
)
)], <deepspeed.runtime.zero.stage_1_and_2.DeepSpeedZeroOptimizer object at 0x152e647463b0>, <megatron.optimizer_param_scheduler.OptimizerParamScheduler object at 0x152e64716bc0>)) took: dt=15.0156s
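The module tree above pins down the model's shape: 32 `ParallelTransformerLayer`s at hidden size 4096 (matching `nl32_hs4096` in the run name), each with a fused QKV projection and a 4h MLP, plus an untied `output_layer`. A back-of-envelope parameter count under those shapes (the vocabulary size does not appear in this excerpt, so the `32_000` below is purely an assumption for illustration):

```python
# Rough parameter count from the printed module tree (norms/biases ignored).
nl, hs, vocab = 32, 4096, 32_000  # vocab is an assumed placeholder

per_layer = (
    3 * hs * hs    # query_key_value: fused Q, K, V projections
    + hs * hs      # self_attention.dense (output projection)
    + 4 * hs * hs  # mlp.dense_h_to_4h
    + 4 * hs * hs  # mlp.dense_4h_to_h
)
total = nl * per_layer + 2 * vocab * hs  # word_embeddings + untied output_layer
print(f"~{total / 1e9:.2f}B parameters")  # ~6.70B under these assumptions
```

That also squares loosely with the 83 GB checkpoint: ~12.4 bytes/parameter, in the right range for bf16 weights plus fp32 AdamW moments (depending on whether fp32 master weights are counted).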
[2025-03-12 09:46:07,899] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.88 | optimizer_gradients: 0.53 | optimizer_step: 1.00
[2025-03-12 09:46:07,899] [INFO] [logging.py:128:log_dist] [Rank 0] step=51, skipped=0, lr=[1.6043222328390607e-07, 1.6043222328390607e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:46:07,900] [INFO] [timer.py:264:stop] epoch=0/micro_step=51/global_step=51, RunningAvgSamplesPerSec=150.72006367000233, CurrSamplesPerSec=151.0241738655975, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:46:07,900] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7764.78 | bwd_microstep: 19433.86 | bwd_inner_microstep: 18794.45 | bwd_allreduce_microstep: 639.12 | step_microstep: 237.23
[2025-03-12 09:46:07,900] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7764.80 | bwd: 19433.86 | bwd_inner: 18794.50 | bwd_allreduce: 639.11 | step: 237.23
[2025-03-12 09:46:07][I][megatron/training_log:661] iteration= 51/ 1271565 | consumed_samples= 19584 | consumed_tokens= 80216064 | elapsed_time_per_iteration_ms=42562.8 | learning_rate=1.60432e-07 | global_batch_size= 384 | lm loss=11.035048 | loss_scale=1.0 | grad_norm=11.677 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=9.022 | tokens_per_gpu_per_second_tgs=1539.747 | [LM]TFLOPs=63.52 | [DS]TFLOPs=61.04 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27283.22, 27284.14)
optimizer ......................................: (236.35, 237.48)
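The throughput fields are internally consistent: one iteration consumes `global_batch_size * seqlen = 384 * 4096` ≈ 1.57M tokens spread over 24 GPUs (and `consumed_samples = 19584 = 51 * 384` checks out too). Reproducing `tokens_per_gpu_per_second_tgs` from the logged numbers:

```python
# Values copied from the iteration-51 log line above.
global_batch, seqlen, ngpus = 384, 4096, 24
iter_s = 42.5628  # elapsed_time_per_iteration_ms / 1000 (includes the 15 s save)
tgs = global_batch * seqlen / ngpus / iter_s
print(f"{tgs:.1f} tokens/GPU/s")  # ~1539.7, matching the reported 1539.747
# Steady-state iterations (~27.57 s) give ~2377 tokens/GPU/s, as in iter 52+.
```

At the steady-state rate, a ~6N FLOPs/token estimate (N ≈ 6.7B from the sketch above, before attention terms) works out to roughly 95-100 TFLOPs/GPU, in line with the reported `[LM]TFLOPs` ≈ 98.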
[2025-03-12 09:46:35,470] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.11 | optimizer_gradients: 0.56 | optimizer_step: 1.02
[2025-03-12 09:46:35,470] [INFO] [logging.py:128:log_dist] [Rank 0] step=52, skipped=0, lr=[1.6357795315221793e-07, 1.6357795315221793e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:46:35,470] [INFO] [timer.py:264:stop] epoch=0/micro_step=52/global_step=52, RunningAvgSamplesPerSec=150.71736368657093, CurrSamplesPerSec=150.58512384240427, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:46:35,471] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7767.80 | bwd_microstep: 19448.09 | bwd_inner_microstep: 18805.60 | bwd_allreduce_microstep: 642.19 | step_microstep: 237.50
[2025-03-12 09:46:35,471] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7767.82 | bwd: 19448.09 | bwd_inner: 18805.66 | bwd_allreduce: 642.18 | step: 237.50
[2025-03-12 09:46:35][I][megatron/training_log:661] iteration= 52/ 1271565 | consumed_samples= 19968 | consumed_tokens= 81788928 | elapsed_time_per_iteration_ms=27570.6 | learning_rate=1.63578e-07 | global_batch_size= 384 | lm loss=11.030777 | loss_scale=1.0 | grad_norm=11.075 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.928 | tokens_per_gpu_per_second_tgs=2377.024 | [LM]TFLOPs=98.06 | [DS]TFLOPs=94.23 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27311.23, 27311.94)
optimizer ......................................: (236.33, 237.78)
[2025-03-12 09:47:02,999] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.64 | optimizer_gradients: 0.56 | optimizer_step: 1.02
[2025-03-12 09:47:02,999] [INFO] [logging.py:128:log_dist] [Rank 0] step=53, skipped=0, lr=[1.6672368302052982e-07, 1.6672368302052982e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:47:03,000] [INFO] [timer.py:264:stop] epoch=0/micro_step=53/global_step=53, RunningAvgSamplesPerSec=150.72290826800395, CurrSamplesPerSec=151.00059905306884, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:47:03,000] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7763.91 | bwd_microstep: 19398.25 | bwd_inner_microstep: 18760.66 | bwd_allreduce_microstep: 637.29 | step_microstep: 237.00
[2025-03-12 09:47:03,000] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7763.93 | bwd: 19398.24 | bwd_inner: 18760.71 | bwd_allreduce: 637.28 | step: 237.00
[2025-03-12 09:47:03][I][megatron/training_log:661] iteration= 53/ 1271565 | consumed_samples= 20352 | consumed_tokens= 83361792 | elapsed_time_per_iteration_ms=27544.1 | learning_rate=1.66724e-07 | global_batch_size= 384 | lm loss=11.005310 | loss_scale=1.0 | grad_norm=11.063 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.941 | tokens_per_gpu_per_second_tgs=2379.315 | [LM]TFLOPs=98.15 | [DS]TFLOPs=94.32 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27270.48, 27271.12)
optimizer ......................................: (236.09, 237.29)
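The optimizer is still deep in warmup: consecutive logged learning rates differ by a constant ~3.1457e-9 per step, i.e. plain linear warmup (and `51 * 3.1457e-9 = 1.60432e-7`, exactly the value logged at step 51). A one-liner to confirm, using values copied from the log:

```python
# Learning rates logged at steps 51, 52, 53 (copied from above).
lrs = [1.6043222328390607e-07, 1.6357795315221793e-07, 1.6672368302052982e-07]
print([f"{b - a:.4e}" for a, b in zip(lrs, lrs[1:])])
# -> ['3.1457e-09', '3.1457e-09']: a constant increment, i.e. linear warmup.
```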
[2025-03-12 09:47:30,531] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.01 | optimizer_gradients: 0.53 | optimizer_step: 1.03
[2025-03-12 09:47:30,531] [INFO] [logging.py:128:log_dist] [Rank 0] step=54, skipped=0, lr=[1.6986941288884168e-07, 1.6986941288884168e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:47:30,532] [INFO] [timer.py:264:stop] epoch=0/micro_step=54/global_step=54, RunningAvgSamplesPerSec=150.72117384244626, CurrSamplesPerSec=150.63271194928672, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:47:30,532] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7772.02 | bwd_microstep: 19385.58 | bwd_inner_microstep: 18743.53 | bwd_allreduce_microstep: 641.75 | step_microstep: 237.28
[2025-03-12 09:47:30,532] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7772.04 | bwd: 19385.58 | bwd_inner: 18743.59 | bwd_allreduce: 641.74 | step: 237.28
[2025-03-12 09:47:30][I][megatron/training_log:661] iteration= 54/ 1271565 | consumed_samples= 20736 | consumed_tokens= 84934656 | elapsed_time_per_iteration_ms=27517.0 | learning_rate=1.69869e-07 | global_batch_size= 384 | lm loss=10.987799 | loss_scale=1.0 | grad_norm=11.460 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.955 | tokens_per_gpu_per_second_tgs=2381.653 | [LM]TFLOPs=98.25 | [DS]TFLOPs=94.41 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27257.15, 27258.15)
optimizer ......................................: (236.14, 237.55)
[2025-03-12 09:47:58,101] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.37 | optimizer_gradients: 0.53 | optimizer_step: 1.01
[2025-03-12 09:47:58,102] [INFO] [logging.py:128:log_dist] [Rank 0] step=55, skipped=0, lr=[1.730151427571536e-07, 1.730151427571536e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:47:58,102] [INFO] [timer.py:264:stop] epoch=0/micro_step=55/global_step=55, RunningAvgSamplesPerSec=150.71704256819055, CurrSamplesPerSec=150.50246896079437, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:47:58,102] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7766.86 | bwd_microstep: 19441.42 | bwd_inner_microstep: 18802.36 | bwd_allreduce_microstep: 638.76 | step_microstep: 237.69
[2025-03-12 09:47:58,102] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7766.88 | bwd: 19441.42 | bwd_inner: 18802.42 | bwd_allreduce: 638.75 | step: 237.69
[2025-03-12 09:47:58][I][megatron/training_log:661] iteration= 55/ 1271565 | consumed_samples= 21120 | consumed_tokens= 86507520 | elapsed_time_per_iteration_ms=27569.8 | learning_rate=1.73015e-07 | global_batch_size= 384 | lm loss=10.956585 | loss_scale=1.0 | grad_norm=12.528 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.928 | tokens_per_gpu_per_second_tgs=2377.090 | [LM]TFLOPs=98.06 | [DS]TFLOPs=94.23 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27310.30, 27311.15)
optimizer ......................................: (236.76, 237.96)
[2025-03-12 09:48:25,592] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.78 | optimizer_gradients: 0.53 | optimizer_step: 1.04
[2025-03-12 09:48:25,592] [INFO] [logging.py:128:log_dist] [Rank 0] step=56, skipped=0, lr=[1.7616087262546548e-07, 1.7616087262546548e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:48:25,593] [INFO] [timer.py:264:stop] epoch=0/micro_step=56/global_step=56, RunningAvgSamplesPerSec=150.71751857754913, CurrSamplesPerSec=150.74269220150842, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:48:25,593] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7746.50 | bwd_microstep: 19399.74 | bwd_inner_microstep: 18758.68 | bwd_allreduce_microstep: 640.77 | step_microstep: 237.31
[2025-03-12 09:48:25,593] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7746.52 | bwd: 19399.73 | bwd_inner: 18758.73 | bwd_allreduce: 640.77 | step: 237.31
[2025-03-12 09:48:25][I][megatron/training_log:661] iteration= 56/ 1271565 | consumed_samples= 21504 | consumed_tokens= 88080384 | elapsed_time_per_iteration_ms=27490.6 | learning_rate=1.76161e-07 | global_batch_size= 384 | lm loss=10.941004 | loss_scale=1.0 | grad_norm=11.555 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.968 | tokens_per_gpu_per_second_tgs=2383.944 | [LM]TFLOPs=98.35 | [DS]TFLOPs=94.51 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27231.20, 27231.96)
optimizer ......................................: (235.97, 237.59)
[2025-03-12 09:48:53,163] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.03 | optimizer_gradients: 0.53 | optimizer_step: 0.98
[2025-03-12 09:48:53,163] [INFO] [logging.py:128:log_dist] [Rank 0] step=57, skipped=0, lr=[1.7930660249377737e-07, 1.7930660249377737e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:48:53,164] [INFO] [timer.py:264:stop] epoch=0/micro_step=57/global_step=57, RunningAvgSamplesPerSec=150.71533374439005, CurrSamplesPerSec=150.59738768407385, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:48:53,164] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7786.85 | bwd_microstep: 19439.95 | bwd_inner_microstep: 18801.56 | bwd_allreduce_microstep: 638.08 | step_microstep: 237.27
[2025-03-12 09:48:53,164] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7786.87 | bwd: 19439.94 | bwd_inner: 18801.62 | bwd_allreduce: 638.07 | step: 237.27
[2025-03-12 09:48:53][I][megatron/training_log:661] iteration= 57/ 1271565 | consumed_samples= 21888 | consumed_tokens= 89653248 | elapsed_time_per_iteration_ms=27570.8 | learning_rate=1.79307e-07 | global_batch_size= 384 | lm loss=10.909065 | loss_scale=1.0 | grad_norm=12.329 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.928 | tokens_per_gpu_per_second_tgs=2377.011 | [LM]TFLOPs=98.06 | [DS]TFLOPs=94.23 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27310.97, 27312.03)
optimizer ......................................: (236.29, 237.54)
[2025-03-12 09:49:20,721] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.22 | optimizer_gradients: 0.53 | optimizer_step: 1.01
[2025-03-12 09:49:20,721] [INFO] [logging.py:128:log_dist] [Rank 0] step=58, skipped=0, lr=[1.8245233236208926e-07, 1.8245233236208926e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:49:20,722] [INFO] [timer.py:264:stop] epoch=0/micro_step=58/global_step=58, RunningAvgSamplesPerSec=150.718896873452, CurrSamplesPerSec=150.91506945042445, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:49:20,722] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7763.22 | bwd_microstep: 19449.94 | bwd_inner_microstep: 18814.93 | bwd_allreduce_microstep: 634.72 | step_microstep: 237.46
[2025-03-12 09:49:20,722] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7763.24 | bwd: 19449.94 | bwd_inner: 18814.98 | bwd_allreduce: 634.71 | step: 237.47
[2025-03-12 09:49:20][I][megatron/training_log:661] iteration= 58/ 1271565 | consumed_samples= 22272 | consumed_tokens= 91226112 | elapsed_time_per_iteration_ms=27557.6 | learning_rate=1.82452e-07 | global_batch_size= 384 | lm loss=10.902085 | loss_scale=1.0 | grad_norm=12.222 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.934 | tokens_per_gpu_per_second_tgs=2378.145 | [LM]TFLOPs=98.11 | [DS]TFLOPs=94.28 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27297.85, 27298.71)
optimizer ......................................: (236.37, 237.74)
[2025-03-12 09:49:48,273] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.24 | optimizer_gradients: 0.52 | optimizer_step: 1.02
[2025-03-12 09:49:48,274] [INFO] [logging.py:128:log_dist] [Rank 0] step=59, skipped=0, lr=[1.8559806223040112e-07, 1.8559806223040112e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:49:48,274] [INFO] [timer.py:264:stop] epoch=0/micro_step=59/global_step=59, RunningAvgSamplesPerSec=150.71482314914178, CurrSamplesPerSec=150.48698654380541, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:49:48,274] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7761.09 | bwd_microstep: 19449.90 | bwd_inner_microstep: 18812.11 | bwd_allreduce_microstep: 637.50 | step_microstep: 237.58
[2025-03-12 09:49:48,275] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7761.11 | bwd: 19449.90 | bwd_inner: 18812.17 | bwd_allreduce: 637.49 | step: 237.58
[2025-03-12 09:49:48][I][megatron/training_log:661] iteration= 59/ 1271565 | consumed_samples= 22656 | consumed_tokens= 92798976 | elapsed_time_per_iteration_ms=27552.8 | learning_rate=1.85598e-07 | global_batch_size= 384 | lm loss=10.873564 | loss_scale=1.0 | grad_norm=12.056 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.937 | tokens_per_gpu_per_second_tgs=2378.561 | [LM]TFLOPs=98.12 | [DS]TFLOPs=94.29 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27293.18, 27293.90)
optimizer ......................................: (236.58, 237.85)
[2025-03-12 09:50:15,809] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.77 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:50:15,809] [INFO] [logging.py:128:log_dist] [Rank 0] step=60, skipped=0, lr=[1.8874379209871303e-07, 1.8874379209871303e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:50:15,810] [INFO] [timer.py:264:stop] epoch=0/micro_step=60/global_step=60, RunningAvgSamplesPerSec=150.722766862357, CurrSamplesPerSec=151.17688735437054, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:50:15,810] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7770.25 | bwd_microstep: 19423.63 | bwd_inner_microstep: 18790.14 | bwd_allreduce_microstep: 633.19 | step_microstep: 237.08
[2025-03-12 09:50:15,810] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7770.28 | bwd: 19423.63 | bwd_inner: 18790.20 | bwd_allreduce: 633.18 | step: 237.08
[2025-03-12 09:50:15][I][megatron/training_log:661] iteration= 60/ 1271565 | consumed_samples= 23040 | consumed_tokens= 94371840 | elapsed_time_per_iteration_ms=27535.1 | learning_rate=1.88744e-07 | global_batch_size= 384 | lm loss=10.851992 | loss_scale=1.0 | grad_norm=12.578 | actual_seqlen= 4096 | number_of_skipped_iterations= 0 | number_of_nan_iterations= 0 | samples_per_second=13.946 | tokens_per_gpu_per_second_tgs=2380.086 | [LM]TFLOPs=98.19 | [DS]TFLOPs=94.35 |
(min, max) time across ranks (ms):
forward-backward ...............................: (27276.42, 27277.17)
optimizer ......................................: (236.00, 237.35)