Megatron-DeepSpeed

I've included the full series of commands (and their outputs) from a fresh attempt this morning (2025-03-12), in case it's helpful:
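
At a high level it boils down to three commands, each reproduced in full (with its output) below:

```bash
# 1. Set up the environment (conda + venv + PBS job info) via ezpz
source <(curl 'https://raw.githubusercontent.com/saforem2/ezpz/refs/heads/main/src/ezpz/bin/utils.sh') && ezpz_setup_env

# 2. Install ezpz into the freshly-activated virtual environment
python3 -m pip install "git+https://github.com/saforem2/ezpz"

# 3. Kick off the AuroraGPT-7B training run from the repo root
PBS_O_WORKDIR=$(pwd) bash train_aGPT_7B.sh
```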

#[08:54:16 AM][x4716c2s4b0n0][/f/d/f/p/a/Megatron-DeepSpeed][🌱 main][✓]
$ source <(curl 'https://raw.githubusercontent.com/saforem2/ezpz/refs/heads/main/src/ezpz/bin/utils.sh') && ezpz_setup_env
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 54998  100 54998    0     0  2944k      0 --:--:-- --:--:-- --:--:-- 2983k
Unable to detect PBS or SLURM working directory info...
Using /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed as working directory...
Using WORKING_DIR: /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed
No conda_prefix OR virtual_env found in environment...
Setting up conda...

Due to MODULEPATH changes, the following have been reloaded:
  1) hwloc/master-git.1793e43-level-zero     2) mpich/opt/4.3.0rc3

The following have been reloaded with a version change:
  1) oneapi/eng-compiler/2024.07.30.002 => oneapi/release/2024.2.1     2) yaksa/0.3-aw2kkvy => yaksa/0.3-euoqglg

Found conda at: /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1
No VIRTUAL_ENV found in environment!
    - Trying to setup from /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1
    - Using VENV_DIR=/lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1
    - Found existing venv, activating from /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1
[python] Using /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3

[🍋 ezpz/bin/utils.sh]
    • USER=foremans
    • MACHINE=aurora
    • HOST=x4716c2s4b0n0
    • TSTAMP=2025-03-12-085419

[ezpz_setup_host_pbs]
    • Using hostfile: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    • Found in environment:
        • HOSTFILE: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
        • Writing PBS vars to: /home/foremans/.pbsenv

[ezpz_save_pbs_env]
    • Setting:
        • HOSTFILE: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
        • JOBENV_FILE: /home/foremans/.pbsenv

[HOSTS]
    • [host:0] - x4716c2s3b0n0.hostmgmt2716.cm.aurora.alcf.anl.gov
    • [host:1] - x4716c2s4b0n0.hostmgmt2716.cm.aurora.alcf.anl.gov

[DIST INFO]
    • NGPUS=24
    • NHOSTS=2
    • NGPU_PER_HOST=12
    • HOSTFILE=/var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    • DIST_LAUNCH=mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni

[LAUNCH]:
    • To launch across all available GPUs, use: launch

      launch = mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni

took: 0h:00m:06s
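
The `launch` alias simply prefixes whatever follows it with the full `mpiexec` invocation shown above, where `-n 24` comes from 2 hosts × 12 GPUs per host. As a trivial illustration (not part of the original session):

```bash
# Runs `hostname` once per rank: 24 processes total, 12 per node
launch hostname
```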
#[🐍 aurora_nre_models_frameworks-2024.2.1_u1](👻 aurora_nre_models_frameworks-2024.2.1_u1)
#[08:54:24 AM][x4716c2s4b0n0][/f/d/f/p/a/Megatron-DeepSpeed][🌱 main][✓] [⏱️ 6s]
$ python3 -m pip install "git+https://github.com/saforem2/ezpz"
Collecting git+https://github.com/saforem2/ezpz
  Cloning https://github.com/saforem2/ezpz to /tmp/pip-req-build-5w2m90yj
  Running command git clone --filter=blob:none --quiet https://github.com/saforem2/ezpz /tmp/pip-req-build-5w2m90yj
  Resolved https://github.com/saforem2/ezpz to commit c45fb19353c9f06575e0ecb12ba7377321bb2f71
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting ambivalent@ git+https://github.com/saforem2/ambivalent
  Cloning https://github.com/saforem2/ambivalent to /tmp/pip-install-ruitfkv9/ambivalent_aa683df8399d4aeb915c2d2b0071a645
  Running command git clone --filter=blob:none --quiet https://github.com/saforem2/ambivalent /tmp/pip-install-ruitfkv9/ambivalent_aa683df8399d4aeb915c2d2b0071a645
  Resolved https://github.com/saforem2/ambivalent to commit 9063fda7d139416f141c5259f945c76bf1b85ed3
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: ml-dtypes in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.5.1)
Requirement already satisfied: sh in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (2.2.1)
Requirement already satisfied: omegaconf in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (2.3.0)
Requirement already satisfied: tensorboard in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (2.15.2)
Requirement already satisfied: hydra-core in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (1.3.2)
Requirement already satisfied: torch in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (2.3.1+cxx11.abi)
Requirement already satisfied: tqdm in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (4.67.1)
Requirement already satisfied: jax in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.5.0)
Requirement already satisfied: h5py in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (3.12.1)
Requirement already satisfied: jaxtyping in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.2.37)
Requirement already satisfied: jaxlib in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.5.0)
Requirement already satisfied: sentencepiece in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.2.0)
Requirement already satisfied: mpi4py in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (3.1.6)
Requirement already satisfied: joblib in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (1.4.2)
Requirement already satisfied: xarray in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (2025.1.2)
Requirement already satisfied: ipython in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (8.31.0)
Requirement already satisfied: seaborn in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.13.2)
Requirement already satisfied: rich in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (13.9.4)
Requirement already satisfied: plotext in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (5.3.2)
Requirement already satisfied: pyinstrument in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (5.0.1)
Requirement already satisfied: hydra-colorlog in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (1.2.0)
Requirement already satisfied: wandb in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ezpz==0.3) (0.19.6)
Requirement already satisfied: matplotlib in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (3.5.3)
Requirement already satisfied: requests in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (2.32.3)
Requirement already satisfied: colormaps in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (0.4.2)
Requirement already satisfied: numpy>=1.19.3 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from h5py->ezpz==0.3) (1.26.4)
Requirement already satisfied: colorlog in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from hydra-colorlog->ezpz==0.3) (6.9.0)
Requirement already satisfied: packaging in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from hydra-core->ezpz==0.3) (24.0)
Requirement already satisfied: antlr4-python3-runtime==4.9.* in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from hydra-core->ezpz==0.3) (4.9.3)
Requirement already satisfied: PyYAML>=5.1.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from omegaconf->ezpz==0.3) (6.0.2)
Requirement already satisfied: matplotlib-inline in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (0.1.7)
Requirement already satisfied: pygments>=2.4.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (2.19.1)
Requirement already satisfied: typing_extensions>=4.6 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (4.12.2)
Requirement already satisfied: exceptiongroup in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (1.2.2)
Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (3.0.50)
Requirement already satisfied: decorator in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (5.1.1)
Requirement already satisfied: stack_data in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (0.6.3)
Requirement already satisfied: jedi>=0.16 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (0.19.2)
Requirement already satisfied: pexpect>4.3 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (4.9.0)
Requirement already satisfied: traitlets>=5.13.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from ipython->ezpz==0.3) (5.14.3)
Requirement already satisfied: opt_einsum in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from jax->ezpz==0.3) (3.4.0)
Requirement already satisfied: scipy>=1.11.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from jax->ezpz==0.3) (1.12.0)
Requirement already satisfied: wadler-lindig>=0.1.3 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from jaxtyping->ezpz==0.3) (0.1.3)
Requirement already satisfied: markdown-it-py>=2.2.0 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from rich->ezpz==0.3) (3.0.0)
Requirement already satisfied: pandas>=1.2 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from seaborn->ezpz==0.3) (2.2.3)
Requirement already satisfied: absl-py>=0.4 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (2.1.0)
Requirement already satisfied: setuptools>=41.0.0 in ./venvs/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (65.5.0)
Requirement already satisfied: werkzeug>=1.0.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (3.1.3)
Requirement already satisfied: google-auth-oauthlib<2,>=0.5 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (1.2.1)
Requirement already satisfied: google-auth<3,>=1.6.3 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (2.37.0)
Requirement already satisfied: markdown>=2.6.8 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (3.7)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (0.7.2)
Requirement already satisfied: protobuf!=4.24.0,>=3.19.6 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (4.25.5)
Requirement already satisfied: grpcio>=1.48.2 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (1.69.0)
Requirement already satisfied: six>1.9 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from tensorboard->ezpz==0.3) (1.16.0)
Requirement already satisfied: filelock in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from torch->ezpz==0.3) (3.16.1)
Requirement already satisfied: networkx in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from torch->ezpz==0.3) (3.4.2)
Requirement already satisfied: sympy in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from torch->ezpz==0.3) (1.13.3)
Requirement already satisfied: jinja2 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from torch->ezpz==0.3) (3.1.5)
Requirement already satisfied: fsspec in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from torch->ezpz==0.3) (2024.12.0)
Requirement already satisfied: platformdirs in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (4.2.2)
Requirement already satisfied: gitpython!=3.1.29,>=1.0.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (3.1.44)
Requirement already satisfied: click!=8.0.0,>=7.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (8.1.8)
Requirement already satisfied: setproctitle in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (1.3.4)
Requirement already satisfied: psutil>=5.0.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (6.1.1)
Requirement already satisfied: docker-pycreds>=0.4.0 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (0.4.0)
Requirement already satisfied: pydantic<3,>=2.6 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (2.10.5)
Requirement already satisfied: sentry-sdk>=2.0.0 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from wandb->ezpz==0.3) (2.20.0)
Requirement already satisfied: gitdb<5,>=4.0.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from gitpython!=3.1.29,>=1.0.0->wandb->ezpz==0.3) (4.0.12)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard->ezpz==0.3) (0.4.1)
Requirement already satisfied: rsa<5,>=3.1.4 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard->ezpz==0.3) (4.9)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard->ezpz==0.3) (5.5.0)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from google-auth-oauthlib<2,>=0.5->tensorboard->ezpz==0.3) (2.0.0)
Requirement already satisfied: parso<0.9.0,>=0.8.4 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from jedi>=0.16->ipython->ezpz==0.3) (0.8.4)
Requirement already satisfied: mdurl~=0.1 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->ezpz==0.3) (0.1.2)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (1.4.8)
Requirement already satisfied: cycler>=0.10 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (0.12.1)
Requirement already satisfied: pyparsing>=2.2.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (3.2.1)
Requirement already satisfied: pillow>=6.2.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (11.1.0)
Requirement already satisfied: python-dateutil>=2.7 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (2.9.0)
Requirement already satisfied: fonttools>=4.22.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from matplotlib->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (4.55.4)
Requirement already satisfied: tzdata>=2022.7 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (from pandas>=1.2->seaborn->ezpz==0.3) (2025.1)
Requirement already satisfied: pytz>=2020.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from pandas>=1.2->seaborn->ezpz==0.3) (2024.1)
Requirement already satisfied: ptyprocess>=0.5 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from pexpect>4.3->ipython->ezpz==0.3) (0.7.0)
Requirement already satisfied: wcwidth in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython->ezpz==0.3) (0.2.13)
Requirement already satisfied: pydantic-core==2.27.2 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from pydantic<3,>=2.6->wandb->ezpz==0.3) (2.27.2)
Requirement already satisfied: annotated-types>=0.6.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from pydantic<3,>=2.6->wandb->ezpz==0.3) (0.7.0)
Requirement already satisfied: idna<4,>=2.5 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from requests->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from requests->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from requests->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (2024.12.14)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from requests->ambivalent@ git+https://github.com/saforem2/ambivalent->ezpz==0.3) (3.3.2)
Requirement already satisfied: MarkupSafe>=2.1.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from werkzeug>=1.0.1->tensorboard->ezpz==0.3) (3.0.2)
Requirement already satisfied: pure-eval in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from stack_data->ipython->ezpz==0.3) (0.2.3)
Requirement already satisfied: asttokens>=2.1.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from stack_data->ipython->ezpz==0.3) (3.0.0)
Requirement already satisfied: executing>=1.2.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from stack_data->ipython->ezpz==0.3) (2.1.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from sympy->torch->ezpz==0.3) (1.3.0)
Requirement already satisfied: smmap<6,>=3.0.1 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from gitdb<5,>=4.0.1->gitpython!=3.1.29,>=1.0.0->wandb->ezpz==0.3) (5.0.2)
Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->ezpz==0.3) (0.6.1)
Requirement already satisfied: oauthlib>=3.0.0 in /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<2,>=0.5->tensorboard->ezpz==0.3) (3.2.2)

[notice] A new release of pip is available: 23.0.1 -> 25.0.1
[notice] To update, run: pip install --upgrade pip
took: 0h:01m:18s
#[🐍 aurora_nre_models_frameworks-2024.2.1_u1](👻 aurora_nre_models_frameworks-2024.2.1_u1)
#[09:01:28 AM][x4716c2s4b0n0][/f/d/f/p/a/Megatron-DeepSpeed][🌱 main][✓]
$ PBS_O_WORKDIR=$(pwd) bash train_aGPT_7B.sh
Using WORKING_DIR: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed
Running on: aurora
Found ezpz in /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/deps/ezpz
Using WORKING_DIR: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed
Using virtual_env: /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1 on top of conda from: /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1
[python] Using /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3

[🍋 ezpz/bin/utils.sh]
    • USER=foremans
    • MACHINE=aurora
    • HOST=x4716c2s4b0n0
    • TSTAMP=2025-03-12-090157

[ezpz_setup_host_pbs]
    • Using hostfile: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    • Found in environment:
        • HOSTFILE: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
        • Writing PBS vars to: /home/foremans/.pbsenv

[ezpz_save_pbs_env]
    • Setting:
        • HOSTFILE: /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
        • JOBENV_FILE: /home/foremans/.pbsenv

[HOSTS]
    • [host:0] - x4716c2s3b0n0.hostmgmt2716.cm.aurora.alcf.anl.gov
    • [host:1] - x4716c2s4b0n0.hostmgmt2716.cm.aurora.alcf.anl.gov

[DIST INFO]
    • NGPUS=24
    • NHOSTS=2
    • NGPU_PER_HOST=12
    • HOSTFILE=/var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    • DIST_LAUNCH=mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni

[LAUNCH]:
    • To launch across all available GPUs, use: launch

      launch = mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni


[notice] A new release of pip is available: 23.0.1 -> 25.0.1
[notice] To update, run: pip install --upgrade pip
[ezpz_install] Found ezpz @ 0.3
[install_dependencies] Ensuring all dependencies from /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/requirements/requirements.txt installed...

[notice] A new release of pip is available: 23.0.1 -> 25.0.1
[notice] To update, run: pip install --upgrade pip
[install_dependencies] No 'deepspeed' command found on aurora
[install_dependencies] !! No deepspeed in /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3
[setParams] Using GRAD_ACC_STEPS: 16
TRAIN_TOKENS=2000000000000 (=2000B tokens)
TRAIN_ITERS=1271565
DS_CONFIG: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ds-configs/ds_stage1_mb1_gb384_pp1_bf16.json
ZS=1, MB=1, GB=384, PP=1, DTYPE=bf16
{
  "train_batch_size": 384,
  "train_micro_batch_size_per_gpu": 1,
  "gradient_clipping": 1,
  "steps_per_print": 1,
  "gradient_accumulation_steps": 16,
  "zero_force_ds_cpu_optimizer": false,
  "zero_allow_untested_optimizer": true,
  "wall_clock_breakdown": false,
  "zero_optimization": {
    "stage": 1
  },
  "fp16": {
    "enabled": false,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bfloat16": {
    "enabled": true,
    "loss_scale": 1
  },
  "comms_logger": {
    "enabled": false,
    "verbose": false,
    "debug": false
  },
  "flops_profiler": {
    "enabled": true,
    "profile_step": 2,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
  }
}
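
These numbers are internally consistent; a quick sanity check using only values printed above:

```bash
# train_batch_size = micro_batch x grad_acc_steps x data_parallel_size
echo $(( 1 * 16 * 24 ))                   # -> 384
# tokens consumed per iteration = global_batch x seq_length
echo $(( 384 * 4096 ))                    # -> 1572864
# TRAIN_ITERS = TRAIN_TOKENS / tokens_per_iteration (integer division)
echo $(( 2000000000000 / (384 * 4096) ))  # -> 1271565
```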
Checkpoints will be saved to: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash

 Please see logs at: logs/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/20250312-090212_24_x4716c2s4b0n0
Setting up tokenizer with Llama2Tokenizer
Using data_file_list: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
Using tokenizer: Llama2Tokenizer. Setting up data with 
Calling:  setData() with /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
--------------------
Updated environment:
DATA_FILE_LIST: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
NUM_DOCS: 2419
 WEIGHT_SUM: 1.0
DFL_STEM: dolma
DATA_CACHE_PATH: .cache/dolma/index-cache
DATA_FLAGS: 
--------------------
[setData] DATA_FLAGS: 
[setData] TOKENIZER_FLAGS: --tokenizer-type Llama2Tokenizer --tokenizer-model /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/tokenizer.model
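
The `NUM_DOCS` and `WEIGHT_SUM` values reported above are derived from that data file list. Assuming (as the output suggests) each line of `dolma.txt` starts with a per-corpus sampling weight, they can be reproduced with a one-liner:

```bash
# Hypothetical sanity check: count entries and sum the first-column weights
# (expect 2419 and 1.0, matching NUM_DOCS and WEIGHT_SUM above)
awk '{ n += 1; s += $1 } END { print n, s }' ALCF/data-lists/aurora/dolma.txt
```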
Requirement already satisfied: pybind11 in /home/foremans/.local/aurora/frameworks/2024.2.1_u1/lib/python3.10/site-packages (2.13.6)

[notice] A new release of pip is available: 23.0.1 -> 25.0.1
[notice] To update, run: pip install --upgrade pip
make: Nothing to be done for 'default'.
/flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed
++++++++++++++++++++++++++++++++++++++++++++++++++
- MPICH_DIR=/opt/aurora/24.180.3/spack/unified/0.8.0/install/linux-sles15-x86_64/oneapi-2024.2.1/mpich-4.3.0rc3-hipyfz6
- Using /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3
- WORLD_SIZE:24
- BACKEND: ccl
- MODEL_TYPE: llama-gb384-seq4096-pp1-tp1-32layers-32heads-4096hidden
- Using DATA_FILE_LIST: /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
++++++++++++++++++++++++++++++++++++++++++++++++++

Currently Loaded Modules:
  1) gcc-runtime/12.2.0-267awrk   5) gcc/12.2.0           9) oneapi/release/2024.2.1              13) yaksa/0.3-euoqglg
  2) gmp/6.2.1-yctcuid            6) libfabric/1.20.1    10) pti-gpu/d3639de                      14) mpich/opt/4.3.0rc3
  3) mpfr/4.2.1-fhgnwe7           7) cray-pals/1.4.0     11) frameworks/2024.2.1_u1
  4) mpc/1.3.1-ygprpb4            8) cray-libpals/1.4.0  12) hwloc/master-git.1793e43-level-zero

 

Saving environment to checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.env
Not currently running. Continuing!
Launching with: MPICH
 mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni --pmi=pmix --genvall /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3 -Wignore /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/pretrain_gpt_alcf.py
Using data_cache_path: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache
Training Arguments: 

--accumulate-allreduce-grads-in-fp32
--adam-beta1=0.9
--adam-beta2=0.95
--adam-eps=0.00001
--attention-dropout 0
--bf16
--blend-sample-in-corpus
--clip-grad=1.0
--data-cache-path=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache
--data-file-list=/flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
--deepspeed
--deepspeed_config=/flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ds-configs/ds_stage1_mb1_gb384_pp1_bf16.json
--disable-bias-linear
--distributed-backend=ccl
--ds-sequence-parallel-size=1
--eval-interval=100
--eval-iters=20
--ffn-hidden-size 11008
--global-batch-size=384
--hidden-dropout 0
--hidden-size=4096
--load=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
--log-interval=1
--log-optimizer-states-to-tensorboard
--log-timers-to-tensorboard
--lr 0.0002
--lr-decay-style cosine
--lr-warmup-fraction 0.05
--max-position-embeddings=4096
--micro-batch-size=1
--no-bias-dropout-fusion
--no-bias-gelu-fusion
--no-gradient-accumulation-fusion
--no-masked-softmax-fusion
--no-pipeline-parallel
--no-query-key-layer-scaling
--normalization rmsnorm
--num-attention-heads=32
--num-key-value-heads 8
--num-layers=32
--optimizer=adamw
--pipeline-model-parallel-size=1
--save=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
--save-interval=50
--seq-length=4096
--shuffle-sample-in-corpus
--split=990,10,0
--swiglu
--tensorboard-dir checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/tensorboard
--tensor-model-parallel-size=1
--timing-log-level=1
--tokenizer-type Llama2Tokenizer --tokenizer-model /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/tokenizer.model
--train-iters=1271565
--untie-embeddings-and-output-weights
--use-checkpoint-opt_param-scheduler
--use-flash-attn-builder
--use-rotary-position-embeddings
--weight-decay=0.1
--zero-stage=1
mpiexec --verbose --envall -n 24 -ppn 12 --hostfile /var/spool/pbs/aux/3211413.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --cpu-bind depth -d 8 --no-vni --pmi=pmix --genvall /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/venvs/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3 -Wignore /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/pretrain_gpt_alcf.py --use-checkpoint-opt_param-scheduler --lr 0.0002 --lr-decay-style cosine --lr-warmup-fraction 0.05 --swiglu --hidden-dropout 0 --attention-dropout 0 --normalization rmsnorm --disable-bias-linear --no-query-key-layer-scaling --use-rotary-position-embeddings --untie-embeddings-and-output-weights --num-key-value-heads 8 --ffn-hidden-size 11008 --use-flash-attn-builder --tokenizer-type Llama2Tokenizer --tokenizer-model /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/tokenizer.model --log-timers-to-tensorboard --log-optimizer-states-to-tensorboard --tensorboard-dir checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/tensorboard --deepspeed --no-pipeline-parallel --deepspeed_config=/flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ds-configs/ds_stage1_mb1_gb384_pp1_bf16.json --zero-stage=1 --bf16 --shuffle-sample-in-corpus --blend-sample-in-corpus --accumulate-allreduce-grads-in-fp32 --no-bias-gelu-fusion --no-bias-dropout-fusion --no-masked-softmax-fusion --no-gradient-accumulation-fusion --optimizer=adamw --tensor-model-parallel-size=1 --pipeline-model-parallel-size=1 --max-position-embeddings=4096 --micro-batch-size=1 --ds-sequence-parallel-size=1 --global-batch-size=384 --split=990,10,0 --timing-log-level=1 --eval-interval=100 --eval-iters=20 --save-interval=50 --log-interval=1 --save=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash --load=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash --seq-length=4096 --num-layers=32 --hidden-size=4096 --train-iters=1271565 --distributed-backend=ccl --weight-decay=0.1 --adam-beta1=0.9 --adam-beta2=0.95 --adam-eps=0.00001 --clip-grad=1.0 --num-attention-heads=32 --data-cache-path=checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache --data-file-list=/flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
[!! NOTE] View output at:
 logs/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/20250312-090212_24_x4716c2s4b0n0/output.log
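
To follow the run live, you can tail that log file from another shell (same path as above):

```bash
tail -f logs/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/20250312-090212_24_x4716c2s4b0n0/output.log
```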
Disabling local launch: multi-node application
Connected to tcp://x4716c2s3b0n0.hostmgmt2716.cm.aurora.alcf.anl.gov:7919
Launching application 9419d72d-4156-4bcd-a54e-8cd39acce81a
[2025-03-12 09:02:39,280] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,280] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,285] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,285] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,285] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,289] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,310] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,310] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,311] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,311] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,311] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,311] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,356] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,360] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,370] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,370] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,372] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,372] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,372] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,373] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,374] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,374] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,374] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:39,374] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,644] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,645] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,645] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,645] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,645] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,645] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,646] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:02:42,651] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,996] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,997] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,997] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,997] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,997] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,996] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,997] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,997] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:31,997] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:31,998] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:31,998] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,371] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,371] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,371] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,370] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2025-03-12 09:12:54,370] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-12 09:12:54,370] [INFO] [comm.py:673:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=19, local_rank=7, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=13, local_rank=1, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=14, local_rank=2, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=15, local_rank=3, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=16, local_rank=4, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=17, local_rank=5, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=10, local_rank=10, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=18, local_rank=6, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=21, local_rank=9, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=22, local_rank=10, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=23, local_rank=11, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=12, local_rank=0, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=20, local_rank=8, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:689:init_distributed] Initializing TorchBackend in DeepSpeed with backend ccl
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=2, local_rank=2, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=3, local_rank=3, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=4, local_rank=4, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=5, local_rank=5, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=6, local_rank=6, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=7, local_rank=7, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=8, local_rank=8, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=9, local_rank=9, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54,372] [INFO] [comm.py:724:mpi_discovery] Discovered MPI settings of world_rank=11, local_rank=11, world_size=24, master_addr=10.115.81.149, master_port=29500
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][10/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][20/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 5/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][15/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][22/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 1/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 7/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][11/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][14/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 9/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 4/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][23/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][19/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 6/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][21/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][12/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][13/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][16/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][17/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s4b0n0'][18/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 2/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 8/23] 
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 3/23] 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented  [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
flash_attn ............. [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
pack_bits .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/lib/python3.10/site-packages/torch']
torch version .................... 2.3.1+cxx11.abi
deepspeed install path ........... ['/lus/flare/projects/Aurora_deployment/foremans/projects/deepspeedai/Megatron-DeepSpeed/deps/DeepSpeed/deepspeed']
deepspeed info ................... 0.16.4+9f1ac32c, 9f1ac32c, saforem2/ucp-bug
deepspeed wheel compiled w. ...... torch 2.3 
shared memory (/dev/shm) size .... 503.18 GB
[2025-03-12 09:12:54][I][ezpz/configs:286] **** Git info for DeepSpeed: git_hash=8098a708 git_branch=main ****
[2025-03-12 09:12:54][I][ezpz/dist:845] Using device='xpu' with backend='deepspeed' + 'ccl' for distributed training.
[2025-03-12 09:12:54][I][ezpz/dist:895] ['x4716c2s3b0n0'][ 0/23] 
[2025-03-12 09:12:54][I][Megatron-DeepSpeed/pretrain_gpt_alcf:69:__main__] Import python modules in 603.5460715293884 seconds
[2025-03-12 09:12:54][I][Megatron-DeepSpeed/pretrain_gpt_alcf:70:__main__] ez.setup_torch time: 22.705841779708862 seconds
[2025-03-12 09:12:54][I][Megatron-DeepSpeed/pretrain_gpt_alcf:80:__main__] Setting up W&B from: 0 with AuroraGPT
[2025-03-12 09:12:54][I][ezpz/dist:1071] Setting up wandb from rank=0
[2025-03-12 09:12:54][I][ezpz/dist:1072] Using=WB PROJECT=AuroraGPT
wandb: Currently logged in as: foremans (aurora_gpt) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
2025-03-12 09:12:56.509920: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-03-12 09:12:56.509946: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-03-12 09:12:56.578118: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Tracking run with wandb version 0.19.6
wandb: Run data is saved locally in /lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/wandb/run-20250312_091254-by2ozrz3
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run misty-deluge-1402
wandb: ⭐️ View project at https://wandb.ai/aurora_gpt/AuroraGPT
wandb: 🚀 View run at https://wandb.ai/aurora_gpt/AuroraGPT/runs/by2ozrz3
2025-03-12 09:12:57.924088: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2025-03-12 09:12:58][I][ezpz/dist:1097] W&B RUN=[misty-deluge-1402](https://wandb.ai/aurora_gpt/AuroraGPT/runs/by2ozrz3)
[2025-03-12 09:12:58][I][ezpz/dist:301] Updating wandb.run: misty-deluge-1402 config with "DIST_INFO"
[2025-03-12 09:12:58][I][ezpz/dist:1142] Running on machine='Aurora'
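Roughly what ezpz does on rank 0 at this point, as a sketch using the standard wandb API (`dist_info` below is a hypothetical stand-in for the "DIST_INFO" dict referenced in the log line above):

```python
import wandb

dist_info = {"WORLD_SIZE": 24, "NGPU_PER_HOST": 12, "NHOSTS": 2}  # hypothetical subset
run = wandb.init(project="AuroraGPT")   # run name is auto-generated, e.g. "misty-deluge-1402"
run.config.update({"DIST_INFO": dist_info})
```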
using world size: 24, data-parallel-size: 24, sequence-parallel size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 
using torch.bfloat16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. True
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.95
  adam_eps ........................................ 1e-05
  add_bias_linear ................................. False
  add_position_embedding .......................... False
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  aml_data_download_path .......................... None
  apply_layernorm_1p .............................. False
  apply_query_key_layer_scaling ................... False
  apply_residual_connection_post_layernorm ........ False
  async_tensor_model_parallel_allreduce ........... False
  attention_dropout ............................... 0.0
  attention_softmax_in_fp32 ....................... False
  barrier_with_L1_time ............................ True
  bert_binary_head ................................ True
  bert_embedder_type .............................. megatron
  bert_load ....................................... None
  bf16 ............................................ True
  bias_dropout_fusion ............................. False
  bias_gelu_fusion ................................ False
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  blend_sample_in_corpus .......................... True
  block_data_path ................................. None
  checkpoint_activations .......................... False
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  classes_fraction ................................ 1.0
  clip_grad ....................................... 1.0
  compression_training ............................ False
  consumed_train_samples .......................... 0
  consumed_train_tokens ........................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  create_moe_param_group .......................... False
  curriculum_learning_legacy ...................... False
  data_cache_path ................................. checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache
  data_efficiency_curriculum_learning ............. False
  data_file_list .................................. /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
  data_impl ....................................... infer
  data_parallel_random_init ....................... False
  data_parallel_size .............................. 24
  data_path ....................................... None
  data_per_class_fraction ......................... 1.0
  data_sharding ................................... True
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_num_layers .............................. None
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. False
  deepspeed_config ................................ /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ds-configs/ds_stage1_mb1_gb384_pp1_bf16.json
  dino_bottleneck_size ............................ 256
  dino_freeze_last_layer .......................... 1
  dino_head_hidden_size ........................... 2048
  dino_local_crops_number ......................... 10
  dino_local_img_size ............................. 96
  dino_norm_last_layer ............................ False
  dino_teacher_temp ............................... 0.07
  dino_warmup_teacher_temp ........................ 0.04
  dino_warmup_teacher_temp_epochs ................. 30
  distribute_checkpointed_activations ............. False
  distribute_saved_activations .................... False
  distributed_backend ............................. ccl
  distributed_timeout_minutes ..................... 10
  ds_fused_adam ................................... False
  ds_inference .................................... False
  ds_pipeline_enabled ............................. False
  ds_sequence_parallel_size ....................... 1
  embedding_path .................................. None
  embedding_weights_in_fp32 ....................... False
  empty_unused_memory_level ....................... 0
  enable_expert_tensor_parallelism ................ False
  enable_zbh1_exact_semantics ..................... False
  enable_zbh1_pipeline ............................ False
  encoder_num_layers .............................. 32
  encoder_seq_length .............................. 4096
  end_weight_decay ................................ 0.1
  eod_mask_loss ................................... False
  eval_interval ................................... 100
  eval_iters ...................................... 20
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... None
  exit_interval ................................... None
  exit_on_missing_checkpoint ...................... False
  exit_signal_handler ............................. False
  expert_interval ................................. 2
  ffn_hidden_size ................................. 11008
  finetune ........................................ False
  force_ds_sequence_parallel ...................... False
  fp16 ............................................ False
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  fp8_amax_compute_algo ........................... most_recent
  fp8_amax_history_len ............................ 1
  fp8_e4m3 ........................................ False
  fp8_hybrid ...................................... False
  fp8_interval .................................... 1
  fp8_margin ...................................... 0
  fp8_wgrad ....................................... True
  global_batch_size ............................... 384
  gradient_accumulation_fusion .................... False
  head_lr_mult .................................... 1.0
  hidden_dropout .................................. 0.0
  hidden_size ..................................... 4096
  hidden_size_teacher ............................. None
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_h ........................................... 224
  img_w ........................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  inference ....................................... False
  inference_batch_times_seqlen_threshold .......... 512
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  iter_per_epoch .................................. 1250
  kd .............................................. False
  kd_alpha_ce ..................................... 1
  kd_beta_ce ...................................... 1
  kd_temp ......................................... 1.0
  kill_switch_file ................................ None
  kv_channels ..................................... 128
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
  load_tag ........................................ None
  load_teacher .................................... None
  local_rank ...................................... None
  log_batch_size_to_tensorboard ................... False
  log_interval .................................... 1
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_memory_to_tensorboard ....................... False
  log_num_zeros_in_grad ........................... False
  log_optimizer_states_to_tensorboard ............. True
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... False
  log_world_size_to_tensorboard ................... False
  loss_scale ...................................... None
  loss_scale_window ............................... 1000
  lr .............................................. 0.0002
  lr_decay_iters .................................. None
  lr_decay_samples ................................ None
  lr_decay_style .................................. cosine
  lr_decay_tokens ................................. None
  lr_warmup_fraction .............................. 0.05
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 0
  lr_warmup_tokens ................................ None
  make_vocab_size_divisible_by .................... 128
  mask_factor ..................................... 1.0
  mask_prob ....................................... 0.15
  mask_type ....................................... random
  masked_softmax_fusion ........................... False
  max_position_embeddings ......................... 4096
  max_tokens_to_oom ............................... 12000
  mem_efficient_ln ................................ True
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... None
  micro_batch_size ................................ 1
  min_loss_scale .................................. 1.0
  min_lr .......................................... 0.0
  mlp_type ........................................ standard
  mmap_warmup ..................................... False
  moe_eval_capacity_factor ........................ 1.0
  moe_expert_parallel_size ........................ 1
  moe_loss_coeff .................................. 0.1
  moe_min_capacity ................................ 4
  moe_token_dropping .............................. True
  moe_top2_2nd_expert_sampling .................... True
  moe_train_capacity_factor ....................... 1.0
  mos ............................................. False
  multiprocessing_context ......................... fork
  no_load_lr_state ................................ False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_persist_layer_norm ........................... False
  no_pipeline_parallel ............................ True
  no_save_optim ................................... None
  no_save_rng ..................................... None
  normalization ................................... rmsnorm
  num_attention_heads ............................. 32
  num_attention_heads_teacher ..................... None
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_experts ..................................... [1]
  num_experts_switch .............................. None
  num_experts_teacher ............................. [1]
  num_key_value_heads ............................. 8
  num_layers ...................................... 32
  num_layers_per_virtual_pipeline_stage ........... None
  num_layers_teacher .............................. None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adamw
  output_bert_embeddings .......................... False
  overlap_p2p_comm ................................ False
  override_opt_param_scheduler .................... False
  params_dtype .................................... torch.bfloat16
  partition_activations ........................... False
  patch_dim ....................................... 16
  perform_initialization .......................... True
  pipeline_model_parallel_size .................... 1
  pipeline_model_parallel_split_rank .............. None
  profile ......................................... None
  profile_backward ................................ False
  profile_ranks ................................... None
  profile_steps ................................... 2,3
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... None
  random_ltd ...................................... False
  rank ............................................ 0
  recompute_granularity ........................... None
  recompute_method ................................ None
  recompute_num_layers ............................ 1
  remote_device ................................... none
  repeated_dataloader ............................. False
  reset_attention_mask ............................ False
  reset_iteration ................................. False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  retro_add_retriever ............................. False
  retro_cyclic_train_iters ........................ None
  retro_encoder_attention_dropout ................. 0.1
  retro_encoder_hidden_dropout .................... 0.1
  retro_encoder_layers ............................ 2
  retro_num_neighbors ............................. 2
  retro_num_retrieved_chunks ...................... 2
  retro_return_doc_ids ............................ False
  retro_workdir ................................... None
  return_data_index ............................... False
  rope_theta ...................................... 10000
  rotary_percent .................................. 1.0
  sample_rate ..................................... 1.0
  save ............................................ checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
  save_interval ................................... 50
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  schedulefree_for_each ........................... False
  seed ............................................ 1234
  seq_length ...................................... 4096
  sequence_parallel ............................... False
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  shuffle_sample_in_corpus ........................ True
  skip_train ...................................... False
  sophiag_beta1 ................................... 0.9
  sophiag_beta2 ................................... 0.95
  sophiag_rho ..................................... 0.01
  split ........................................... 990,10,0
  split_transformers .............................. False
  squared_relu .................................... False
  standalone_embedding_stage ...................... False
  start_weight_decay .............................. 0.1
  swiglu .......................................... True
  swin_backbone_type .............................. tiny
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 1
  tensorboard_dir ................................. checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 1000
  test_data_path .................................. None
  tile_factor ..................................... 1
  timing_log_level ................................ 1
  timing_log_option ............................... minmax
  titles_data_path ................................ None
  tokenizer_model ................................. /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/tokenizer.model
  tokenizer_type .................................. Llama2Tokenizer
  topk ............................................ 1
  trace_dir ....................................... ./trace/
  train_data_exact_num_epochs ..................... None
  train_data_path ................................. None
  train_desc_path ................................. None
  train_doc_idx_path .............................. None
  train_idx_path .................................. None
  train_iters ..................................... 1271565
  train_iters_to_skip ............................. None
  train_range_to_skip ............................. None
  train_sample_idx_path ........................... None
  train_samples ................................... None
  train_shuffle_idx_path .......................... None
  train_tokens .................................... None
  transformer_impl ................................ local
  transformer_pipeline_model_parallel_size ........ 1
  trust_remote_code ............................... False
  universal_checkpoint ............................ False
  untie_embeddings_and_output_weights ............. True
  use_checkpoint_args ............................. False
  use_checkpoint_opt_param_scheduler .............. True
  use_contiguous_buffers_in_local_ddp ............. True
  use_cpu_initialization .......................... None
  use_dataset_only ................................ False
  use_distributed_optimizer ....................... False
  use_flash_attn .................................. True
  use_flash_attn_builder .......................... True
  use_flash_attn_triton ........................... False
  use_flash_attn_v1 ............................... False
  use_flash_attn_v2 ............................... False
  use_mics ........................................ False
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  use_ring_exchange_p2p ........................... False
  use_rotary_position_embeddings .................. True
  use_tutel ....................................... False
  valid_data_path ................................. None
  variable_seq_lengths ............................ False
  virtual_pipeline_model_parallel_size ............ None
  vision_backbone_type ............................ vit
  vision_pretraining .............................. False
  vision_pretraining_type ......................... classify
  vocab_extra_ids ................................. 0
  vocab_file ...................................... None
  vocab_size ...................................... None
  wandb_exp_name .................................. 
  wandb_project ................................... 
  wandb_save_dir .................................. 
  weight_decay .................................... 0.1
  weight_decay_incr_style ......................... constant
  world_size ...................................... 24
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 16
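The "constant 16" follows directly from the batch-size arguments above:

```python
# Consistency check for the batch-size settings in the argument dump:
micro_batch_size   = 1    # micro_batch_size
data_parallel_size = 24   # data_parallel_size (= world_size here, since TP = PP = 1)
global_batch_size  = 384  # global_batch_size

grad_accum_steps = global_batch_size // (micro_batch_size * data_parallel_size)
assert grad_accum_steps == 16  # "setting number of micro-batches to constant 16"
```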
> building Llama2Tokenizer tokenizer ...
 > padded vocab (size: 32000) with 0 dummy tokens (new size: 32000)
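The "0 dummy tokens" is Megatron's vocab padding at work; with `make_vocab_size_divisible_by=128` and TP=1, the Llama2 vocab of 32000 is already a multiple of 128 (a quick check):

```python
import math

# Megatron pads the vocab up to a multiple of make_vocab_size_divisible_by * TP:
vocab_size, divisible_by, tp = 32000, 128, 1
multiple = divisible_by * tp
padded = math.ceil(vocab_size / multiple) * multiple
assert padded == 32000  # already a multiple of 128, hence "0 dummy tokens"
```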
torch distributed is already initialized, skipping initialization ...
> initialized tensor model parallel with size 1
> initialized pipeline model parallel with size 1
> setting random seeds to 1234 ...
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
make: Entering directory '/lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/lus/flare/projects/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/megatron/data'
> compiling dataset index builder ...
>>> done with dataset index builder. Compilation time: 0.146 seconds
>fused kernel is only supported in cuda, skip loading fused kernel
[2025-03-12 09:12:58][I][megatron/training:185] time to finish initialize_megatron: 26.979411840438843 seconds
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_KVS_MODE changed to be mpi (default:pmi)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_KVS_CONNECTION_TIMEOUT changed to be 3600 (default:120)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_BCAST changed to be double_tree (default:)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_ENABLE_SYCL_KERNELS changed to be 1 (default:0)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_SYCL_ESIMD changed to be 1 (default:0)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_PROCESS_LAUNCHER changed to be pmix (default:hydra)
2025:03:12-09:12:58:(18380) |CCL_WARN| value of CCL_ZE_CACHE_OPEN_IPC_HANDLES_THRESHOLD changed to be 32768 (default:1000)
2025:03:12-09:12:58:(18380) |CCL_WARN| CCL_ALLGATHERV_MEDIUM_SIZE_THRESHOLD=0 is unknown to and unused by oneCCL code but is present in the environment, check if it is not mistyped.
2025:03:12-09:12:58:(18380) |CCL_WARN| CCL_SKIP_SCHEDULER=1 is unknown to and unused by oneCCL code but is present in the environment, check if it is not mistyped.
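The `CCL_WARN` lines are informational: they confirm that non-default oneCCL settings were picked up from the environment (and flag two variables, `CCL_ALLGATHERV_MEDIUM_SIZE_THRESHOLD` and `CCL_SKIP_SCHEDULER`, that this oneCCL build does not recognize). A sketch of the equivalent settings, normally exported in the job script before launch, with values copied from the warnings above:

```python
import os

os.environ["CCL_KVS_MODE"] = "mpi"
os.environ["CCL_KVS_CONNECTION_TIMEOUT"] = "3600"
os.environ["CCL_BCAST"] = "double_tree"
os.environ["CCL_PROCESS_LAUNCHER"] = "pmix"
os.environ["CCL_ZE_CACHE_OPEN_IPC_HANDLES_THRESHOLD"] = "32768"
```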
2025-03-12 09:13:00.634801: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /tensorflow/core/bfc_allocator_delay. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:00.658209: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /xla/service/gpu/compiled_programs_count. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:00.680012: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /jax/pjrt/pjrt_executable_executions. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:00.680028: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /jax/pjrt/pjrt_executable_execution_time_usecs. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:03.431071: I itex/core/wrapper/itex_gpu_wrapper.cc:38] Intel Extension for Tensorflow* GPU backend is loaded.
2025-03-12 09:13:03.507480: I itex/core/devices/gpu/itex_gpu_runtime.cc:130] Selected platform: Intel(R) Level-Zero
2025-03-12 09:13:03.507832: I itex/core/devices/gpu/itex_gpu_runtime.cc:155] number of sub-devices is zero, expose root device.
[... the same "number of sub-devices is zero, expose root device." message repeated for each of the remaining 11 devices ...]
> setting tensorboard ...
WARNING: WANDB writing requested but no legit wandb project or experiment name provided, therefore no WANDB logs will be written according to random generated project or experiment name.
[2025-03-12 09:13:13][I][megatron/training:193] allreduce call time: 14.624404430389404 seconds
[2025-03-12 09:13:13][I][megatron/training:195] time to initialize megatron (seconds)=41.767
[2025-03-12 09:13:13][I][megatron/training:96] [after megatron is initialized] datetime=2025-03-12 09:13:13 
[2025-03-12 09:13:13][I][Megatron-DeepSpeed/pretrain_gpt_alcf:87:__main__] building GPT model ...
[2025-03-12 09:13:13,668] [INFO] [utils.py:781:see_memory_usage] Before Building Model
[2025-03-12 09:13:13,668] [INFO] [utils.py:782:see_memory_usage] MA 0.0 GB         Max_MA 0.0 GB         CA 0.0 GB         Max_CA 0 GB 
[2025-03-12 09:13:13,668] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 38.34 GB, percent = 3.4%
>fused kernel is only supported in cuda, skip loading fused kernel
[2025-03-12 09:13:13,975] [INFO] [config.py:734:__init__] Config mesh_device None world_size = 24
[2025-03-12 09:13:13][I][Megatron-DeepSpeed/pretrain_gpt_alcf:147:__main__] --------------------------------------------------------------------------------
[2025-03-12 09:13:13][I][Megatron-DeepSpeed/pretrain_gpt_alcf:148:__main__] Number of parameters in model: 5933109248
[2025-03-12 09:13:13][I][Megatron-DeepSpeed/pretrain_gpt_alcf:149:__main__] --------------------------------------------------------------------------------
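The 5,933,109,248 figure checks out against the architecture in the argument dump (32 layers, hidden size 4096, FFN 11008, SwiGLU, RMSNorm, 8 KV heads, untied embeddings); a back-of-the-envelope count:

```python
hs, nl, ffn, vocab = 4096, 32, 11008, 32000
heads, kv_heads = 32, 8

attn  = hs*hs + 2*hs*(hs*kv_heads//heads) + hs*hs  # Q + K,V (grouped-query) + output proj
mlp   = 3*hs*ffn                                   # SwiGLU: gate, up, and down projections
layer = attn + mlp + 2*hs                          # plus two RMSNorm weight vectors
total = nl*layer + 2*vocab*hs + hs                 # untied input/output embeddings + final norm
assert total == 5_933_109_248                      # matches the log line above
```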
[2025-03-12 09:13:14,183] [INFO] [utils.py:781:see_memory_usage] After Building Model
[2025-03-12 09:13:14,184] [INFO] [utils.py:782:see_memory_usage] MA 11.05 GB         Max_MA 11.05 GB         CA 11.05 GB         Max_CA 11 GB 
[2025-03-12 09:13:14,184] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 38.41 GB, percent = 3.4%
[2025-03-12 09:13:14][I][Megatron-DeepSpeed/pretrain_gpt_alcf:157:__main__] Patching tensorboard from checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/tensorboard
>fused kernel is only supported in cuda, skip loading fused kernel
[2025-03-12 09:13:14,226] [INFO] [config.py:734:__init__] Config mesh_device None world_size = 24
[... the same pair of lines ("fused kernel is only supported in cuda" / "Config mesh_device None world_size = 24") repeated once per remaining rank, timestamps 09:13:14,426 through 09:13:14,621 ...]
2025-03-12 09:13:15.568542: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-03-12 09:13:15.568571: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-03-12 09:13:15.636421: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-12 09:13:16.987447: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-03-12 09:13:19.784258: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /tensorflow/core/bfc_allocator_delay. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:19.807894: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /xla/service/gpu/compiled_programs_count. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:19.829831: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /jax/pjrt/pjrt_executable_executions. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:19.829852: W external/local_tsl/tsl/lib/monitoring/collection_registry.cc:81] Trying to register 2 metrics with the same name: /jax/pjrt/pjrt_executable_execution_time_usecs. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
2025-03-12 09:13:22.580672: I itex/core/wrapper/itex_gpu_wrapper.cc:38] Intel Extension for Tensorflow* GPU backend is loaded.
2025-03-12 09:13:22.678320: I itex/core/devices/gpu/itex_gpu_runtime.cc:130] Selected platform: Intel(R) Level-Zero
2025-03-12 09:13:22.678675: I itex/core/devices/gpu/itex_gpu_runtime.cc:155] number of sub-devices is zero, expose root device.
[... the same "number of sub-devices is zero, expose root device." message repeated for each of the remaining 11 devices ...]
[2025-03-12 09:13:23][I][Megatron-DeepSpeed/pretrain_gpt_alcf:164:__main__] Updating WandB run.config: [misty-deluge-1402](https://wandb.ai/aurora_gpt/AuroraGPT/runs/by2ozrz3)
[2025-03-12 09:13:23][I][ezpz/dist:125] `model_provider`, {'pre_process': True, 'post_process': True}) took: dt=9.6073s
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0)=5933109248
[2025-03-12 09:13:23][I][ezpz/dist:125] `get_model`((<function model_provider at 0x152f6ca89ea0>, <ModelType.encoder_or_decoder: 1>)) took: dt=9.6094s
[2025-03-12 09:13:23][I][megatron/utils:368] > learning rate decay style: cosine
[2025-03-12 09:13:23][I][ezpz/dist:125] `get_optimizer_param_scheduler`((AdamW (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.95)
    capturable: False
    differentiable: False
    eps: 1e-05
    foreach: None
    fused: None
    lr: 0.0
    lr_mult: 1.0
    maximize: False
    name: wd_no_scale_lr
    wd_mult: 1.0
    weight_decay: 0.1

Parameter Group 1
    amsgrad: False
    betas: (0.9, 0.95)
    capturable: False
    differentiable: False
    eps: 1e-05
    foreach: None
    fused: None
    lr: 0.0
    lr_mult: 1.0
    maximize: False
    name: no_wd_no_scale_lr
    wd_mult: 0.0
    weight_decay: 0.0
),)) took: dt=0.0005s
[2025-03-12 09:13:23][I][megatron/training:692] DeepSpeed is enabled.
[2025-03-12 09:13:23][I][megatron/training:747] Did NOT catch: ('args.data_efficiency_curriculum_learning' and 'build_train_valid_test_datasets_provider is not None')
[2025-03-12 09:13:23][I][megatron/training:756] Calling 'deepspeed.initialize'...
[2025-03-12 09:13:23][I][megatron/training:757] Wrapped with: profiler=<megatron.utils.Profile object at 0x152e64716b00>
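For context, this step is a call to the public `deepspeed.initialize` API; a sketch (not runnable in isolation: `model`, `optimizer`, and `opt_param_scheduler` stand for the objects built in the preceding steps of this log):

```python
import deepspeed

model_engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model,                       # the 5.93B-parameter GPT model built above
    optimizer=optimizer,               # "Using client Optimizer as basic optimizer"
    lr_scheduler=opt_param_scheduler,  # "DeepSpeed using client LR scheduler"
    config="ds-configs/ds_stage1_mb1_gb384_pp1_bf16.json",
)
```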
[2025-03-12 09:13:23,094] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed info: version=0.16.4+9f1ac32c, git-hash=9f1ac32c, git-branch=saforem2/ucp-bug
[2025-03-12 09:13:23,095] [INFO] [config.py:734:__init__] Config mesh_device None world_size = 24
[2025-03-12 09:13:30,773] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: True
[2025-03-12 09:13:30,774] [INFO] [logging.py:128:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2025-03-12 09:13:30,775] [INFO] [logging.py:128:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2025-03-12 09:13:30,778] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2025-03-12 09:13:30,778] [INFO] [utils.py:59:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2025-03-12 09:13:30,778] [INFO] [logging.py:128:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 1 optimizer
[2025-03-12 09:13:30,778] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 500000000
[2025-03-12 09:13:30,778] [INFO] [stage_1_and_2.py:150:__init__] Allgather bucket size 500000000
[2025-03-12 09:13:30,778] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False
[2025-03-12 09:13:30,779] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: False
[2025-03-12 09:13:34,939] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2025-03-12 09:13:34,940] [INFO] [utils.py:782:see_memory_usage] MA 11.97 GB         Max_MA 11.97 GB         CA 11.97 GB         Max_CA 12 GB 
[2025-03-12 09:13:34,940] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 51.36 GB, percent = 4.5%
[2025-03-12 09:13:35,132] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2025-03-12 09:13:35,132] [INFO] [utils.py:782:see_memory_usage] MA 11.97 GB         Max_MA 12.9 GB         CA 12.9 GB         Max_CA 13 GB 
[2025-03-12 09:13:35,132] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 51.36 GB, percent = 4.5%
[2025-03-12 09:13:35,132] [INFO] [stage_1_and_2.py:550:__init__] optimizer state initialized
[2025-03-12 09:13:35,305] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2025-03-12 09:13:35,306] [INFO] [utils.py:782:see_memory_usage] MA 11.97 GB         Max_MA 11.97 GB         CA 12.9 GB         Max_CA 13 GB 
[2025-03-12 09:13:35,306] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 51.36 GB, percent = 4.5%
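A hedged accounting for the memory numbers above (ZeRO stage 1 over 24 ranks): the resident bf16 weights plus one fp32 master-weight partition per rank.

```python
params, world_size = 5_933_109_248, 24

weights_bf16 = params * 2 / 2**30               # ≈ 11.05 GiB: "After Building Model"
fp32_shard   = params * 4 / world_size / 2**30  # ≈ 0.92 GiB: fp32 master partition per rank
print(weights_bf16 + fp32_shard)                # ≈ 11.97 GiB: "Before initializing optimizer states"
# The two Adam moment tensors each add another ~0.92 GiB shard; Max_MA peaking
# at 12.9 GiB suggests they are materialized one partition at a time.
```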
[2025-03-12 09:13:35,307] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer
[2025-03-12 09:13:35,307] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2025-03-12 09:13:35,307] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed LR Scheduler = <megatron.optimizer_param_scheduler.OptimizerParamScheduler object at 0x152e64716bc0>
[2025-03-12 09:13:35,307] [INFO] [logging.py:128:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:13:35,308] [INFO] [config.py:1001:print] DeepSpeedEngine configuration:
[2025-03-12 09:13:35,308] [INFO] [config.py:1005:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2025-03-12 09:13:35,308] [INFO] [config.py:1005:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True, 'use_gds': False}
[2025-03-12 09:13:35,308] [INFO] [config.py:1005:print]   amp_enabled .................. False
[2025-03-12 09:13:35,308] [INFO] [config.py:1005:print]   amp_params ................... False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   bfloat16_enabled ............. True
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   bfloat16_immediate_grad_update  False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   checkpoint_parallel_write_pipeline  False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   checkpoint_tag_validation_enabled  True
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   checkpoint_tag_validation_fail  False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x152e64745780>
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   communication_data_type ...... None
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   curriculum_enabled_legacy .... False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   curriculum_params_legacy ..... False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   data_efficiency_enabled ...... False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   dataloader_drop_last ......... False
[2025-03-12 09:13:35,309] [INFO] [config.py:1005:print]   disable_allgather ............ False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   dump_state ................... False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   dynamic_loss_scale_args ...... None
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   eigenvalue_enabled ........... False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   eigenvalue_gas_boundary_resolution  1
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   eigenvalue_layer_num ......... 0
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   eigenvalue_max_iter .......... 100
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   eigenvalue_stability ......... 1e-06
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   eigenvalue_tol ............... 0.01
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   eigenvalue_verbose ........... False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   elasticity_enabled ........... False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   flops_profiler_config ........ {
    "enabled": true, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 2, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   fp16_auto_cast ............... None
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   fp16_enabled ................. False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   fp16_master_weights_and_gradients  False
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   global_rank .................. 0
[2025-03-12 09:13:35,310] [INFO] [config.py:1005:print]   grad_accum_dtype ............. None
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   gradient_accumulation_steps .. 16
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   gradient_clipping ............ 1.0
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   gradient_predivide_factor .... 1.0
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   graph_harvesting ............. False
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   initial_dynamic_scale ........ 1
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   load_universal_checkpoint .... False
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   loss_scale ................... 1.0
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   memory_breakdown ............. False
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   mics_hierarchial_params_gather  False
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   mics_shard_size .............. -1
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName')
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   optimizer_legacy_fusion ...... False
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   optimizer_name ............... None
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   optimizer_params ............. None
[2025-03-12 09:13:35,311] [INFO] [config.py:1005:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   pld_enabled .................. False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   pld_params ................... False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   prescale_gradients ........... False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   scheduler_name ............... None
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   scheduler_params ............. None
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   seq_parallel_communication_data_type  torch.float32
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   sparse_attention ............. None
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   sparse_gradients_enabled ..... False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   steps_per_print .............. 1
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   tensor_parallel_config ....... dtype=torch.float16 autotp_size=0 tensor_parallel=TPConfig(tp_size=1, tp_grain_size=1, mpu=None, tp_group=None) injection_policy_tuple=None keep_module_on_host=False replace_with_kernel_inject=False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   timers_config ................ enabled=True synchronized=True
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   train_batch_size ............. 384
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   train_micro_batch_size_per_gpu  1
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   use_data_before_expert_parallel_  False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   use_node_local_storage ....... False
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   wall_clock_breakdown ......... True
[2025-03-12 09:13:35,312] [INFO] [config.py:1005:print]   weight_quantization_config ... None
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print]   world_size ................... 24
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print]   zero_allow_untested_optimizer  True
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print]   zero_config .................. stage=1 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False module_granularity_threshold=0 use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False zeropp_loco_param=None mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print]   zero_enabled ................. True
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print]   zero_force_ds_cpu_optimizer .. False
[2025-03-12 09:13:35,313] [INFO] [config.py:1005:print]   zero_optimization_stage ...... 1
[2025-03-12 09:13:35,313] [INFO] [config.py:991:print_user_config]   json = {
    "train_batch_size": 384, 
    "train_micro_batch_size_per_gpu": 1, 
    "gradient_clipping": 1.0, 
    "steps_per_print": 1, 
    "gradient_accumulation_steps": 16, 
    "zero_force_ds_cpu_optimizer": false, 
    "zero_allow_untested_optimizer": true, 
    "wall_clock_breakdown": false, 
    "zero_optimization": {
        "stage": 1
    }, 
    "fp16": {
        "enabled": false, 
        "loss_scale": 0, 
        "loss_scale_window": 1000, 
        "hysteresis": 2, 
        "min_loss_scale": 1
    }, 
    "bfloat16": {
        "enabled": true, 
        "loss_scale": 1.0
    }, 
    "comms_logger": {
        "enabled": false, 
        "verbose": false, 
        "debug": false
    }, 
    "flops_profiler": {
        "enabled": true, 
        "profile_step": 2, 
        "module_depth": -1, 
        "top_modules": 1, 
        "detailed": true, 
        "output_file": null
    }
}
[2025-03-12 09:13:35][I][megatron/training:767] 'deepspeed.initialize' took: 12.21954s
[2025-03-12 09:13:35][I][megatron/checkpointing:568] Unable to load lr_state_dict from lr_state_dict_fp=PosixPath('checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/lr_state_dict_0_of_24.yaml'), but strict=False. Returning empty dictionary: lr_state_dict={}
[2025-03-12 09:13:35,320] [WARNING] [engine.py:2909:load_checkpoint] Unable to find latest file at checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
    ... [identical warning emitted once per rank; 24 copies in total] ...
[2025-03-12 09:13:35][I][megatron/utils:368] WARNING: could not find the metadata file checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
[2025-03-12 09:13:35][I][megatron/utils:368]     will not load any checkpoints and will start from random
(min, max) time across ranks (ms):
    load-checkpoint ................................: (15.33, 15.42)
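
# [aside] The load_checkpoint warnings above are expected on a fresh run:
# DeepSpeed resolves "which checkpoint is newest" by reading a one-line text
# file named `latest` in the checkpoint directory, whose contents are the
# checkpoint tag (e.g. "global_step1000"). A minimal sketch of that lookup:
from pathlib import Path

ckpt_dir = Path("checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash")
latest = ckpt_dir / "latest"
if latest.is_file():
    tag = latest.read_text().strip()        # e.g. "global_step1000"
    print(f"would resume from {ckpt_dir / tag}")
else:
    print("no 'latest' tag file -> start from random init (as logged above)")
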
[2025-03-12 09:13:44][I][ezpz/dist:125] `setup_model_and_optimizer`((<function model_provider at 0x152f6ca89ea0>, <ModelType.encoder_or_decoder: 1>), {'teacher': False, 'data_post_process': <function data_post_process at 0x152f6ca8a290>, 'build_train_valid_test_datasets_provider': <function train_valid_test_datasets_provider at 0x152f6ca8ab90>}) took: dt=31.3965s
[2025-03-12 09:13:44][I][megatron/training:96] [after model, optimizer, and learning rate scheduler are built] datetime=2025-03-12 09:13:44 
[2025-03-12 09:13:44][I][megatron/training:1510] > building train, validation, and test datasets ...
[2025-03-12 09:13:44][I][megatron/training:1493]  > datasets target sizes (minimum size):
[2025-03-12 09:13:44][I][megatron/training:1494]     train:      488280960
[2025-03-12 09:13:44][I][megatron/training:1495]     validation: 97658880
[2025-03-12 09:13:44][I][megatron/training:1496]     test:       7680
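
# [aside] Back-of-envelope check on the target sizes above: they are sample
# counts, each an exact multiple of the global batch size (384), and at
# seq_len=4096 the train target works out to roughly 2T tokens:
global_batch, seq_len = 384, 4096
train, valid, test = 488_280_960, 97_658_880, 7_680
assert all(n % global_batch == 0 for n in (train, valid, test))
print(train // global_batch)      # 1271565 train iterations
print(train * seq_len / 1e12)     # ~2.0 -> ~2T training tokens
print(test // global_batch)       # 20 evaluation iterations
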
[2025-03-12 09:13:44][I][Megatron-DeepSpeed/pretrain_gpt_alcf:465:__main__] > building train, validation, and test datasets for GPT ...
[2025-03-12 09:13:44][I][Megatron-DeepSpeed/pretrain_gpt_alcf:468:__main__] Reading datasets from /flare/datascience/foremans/projects/argonne-lcf/Megatron-DeepSpeed/ALCF/data-lists/aurora/dolma.txt
[2025-03-12 09:13:44][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       76719
     number of epochs:          3
     sequence length:           4096
     total number of samples:   1076724
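
# [aside] Each "could not find index map files" warning below means the cached
# per-corpus index arrays (document / sample / shuffle indices) are missing, so
# rank 0 builds and saves them; later runs memory-map the cached .npy files
# instead of rebuilding. Illustrative sketch of that cache pattern (the file
# naming here is made up, not Megatron's exact scheme):
import numpy as np

def load_or_build(cache_path: str, build_fn):
    try:
        return np.load(cache_path, mmap_mode="r")  # warm start: cache hit
    except FileNotFoundError:
        arr = build_fn()                           # cold start: "building on rank 0"
        np.save(cache_path, arr)
        return arr
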
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       16107
     number of epochs:          3
     sequence length:           4096
     total number of samples:   230638
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13889
     number of epochs:          3
     sequence length:           4096
     total number of samples:   202946
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       12255
     number of epochs:          3
     sequence length:           4096
     total number of samples:   183947
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13559
     number of epochs:          3
     sequence length:           4096
     total number of samples:   191776
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1535650
     number of epochs:          1
     sequence length:           4096
     total number of samples:   232658
[2025-03-12 09:13:45][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1536175
     number of epochs:          1
     sequence length:           4096
     total number of samples:   232428
[2025-03-12 09:13:46][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1485173
     number of epochs:          1
     sequence length:           4096
     total number of samples:   226616
[2025-03-12 09:13:46][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1452918
     number of epochs:          1
     sequence length:           4096
     total number of samples:   221729
[2025-03-12 09:13:46][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1427747
     number of epochs:          1
     sequence length:           4096
     total number of samples:   218369
[2025-03-12 09:13:46][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1418426
     number of epochs:          1
     sequence length:           4096
     total number of samples:   216980
[2025-03-12 09:13:46][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1394724
     number of epochs:          1
     sequence length:           4096
     total number of samples:   214265
[2025-03-12 09:13:47][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1377335
     number of epochs:          1
     sequence length:           4096
     total number of samples:   211248
[2025-03-12 09:13:47][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1950844
     number of epochs:          1
     sequence length:           4096
     total number of samples:   429672
[2025-03-12 09:13:47][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1386132
     number of epochs:          1
     sequence length:           4096
     total number of samples:   300551
[2025-03-12 09:13:47][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1452549
     number of epochs:          1
     sequence length:           4096
     total number of samples:   297764
[2025-03-12 09:13:48][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1202980
     number of epochs:          1
     sequence length:           4096
     total number of samples:   243814
[2025-03-12 09:13:48][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       2283343
     number of epochs:          1
     sequence length:           4096
     total number of samples:   475304
[2025-03-12 09:13:48][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1524141
     number of epochs:          1
     sequence length:           4096
     total number of samples:   296513
[2025-03-12 09:13:48][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1567022
     number of epochs:          1
     sequence length:           4096
     total number of samples:   324782
[2025-03-12 09:13:49][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1284147
     number of epochs:          1
     sequence length:           4096
     total number of samples:   254471
[2025-03-12 09:13:49][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1923644
     number of epochs:          1
     sequence length:           4096
     total number of samples:   396586
[2025-03-12 09:13:49][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1673241
     number of epochs:          1
     sequence length:           4096
     total number of samples:   336782
[2025-03-12 09:13:49][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       2103473
     number of epochs:          1
     sequence length:           4096
     total number of samples:   429258
[2025-03-12 09:13:50][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1987726
     number of epochs:          1
     sequence length:           4096
     total number of samples:   437762
[2025-03-12 09:13:50][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1940226
     number of epochs:          1
     sequence length:           4096
     total number of samples:   419521
[2025-03-12 09:13:50][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1810202
     number of epochs:          1
     sequence length:           4096
     total number of samples:   387484
[2025-03-12 09:13:51][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1834404
     number of epochs:          1
     sequence length:           4096
     total number of samples:   405380
[2025-03-12 09:13:51][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1529006
     number of epochs:          1
     sequence length:           4096
     total number of samples:   291698
[2025-03-12 09:13:51][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1439716
     number of epochs:          1
     sequence length:           4096
     total number of samples:   283551
[2025-03-12 09:13:51][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1952651
     number of epochs:          1
     sequence length:           4096
     total number of samples:   361801
[2025-03-12 09:13:52][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1938814
     number of epochs:          1
     sequence length:           4096
     total number of samples:   371649
[2025-03-12 09:13:52][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1225643
     number of epochs:          1
     sequence length:           4096
     total number of samples:   263051
[2025-03-12 09:13:52][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1189447
     number of epochs:          1
     sequence length:           4096
     total number of samples:   253377
[2025-03-12 09:13:52][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1216006
     number of epochs:          1
     sequence length:           4096
     total number of samples:   241829
[2025-03-12 09:13:52][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1500532
     number of epochs:          1
     sequence length:           4096
     total number of samples:   296694
[2025-03-12 09:13:53][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1485856
     number of epochs:          1
     sequence length:           4096
     total number of samples:   290219
[2025-03-12 09:13:53][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1738037
     number of epochs:          1
     sequence length:           4096
     total number of samples:   328277
[2025-03-12 09:13:53][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1584844
     number of epochs:          1
     sequence length:           4096
     total number of samples:   308502
[2025-03-12 09:13:53][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1504078
     number of epochs:          1
     sequence length:           4096
     total number of samples:   304730
[2025-03-12 09:13:54][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1932012
     number of epochs:          1
     sequence length:           4096
     total number of samples:   278333
[2025-03-12 09:13:54][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1326699
     number of epochs:          1
     sequence length:           4096
     total number of samples:   188164
[2025-03-12 09:13:54][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1478427
     number of epochs:          1
     sequence length:           4096
     total number of samples:   216844
[2025-03-12 09:13:54][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1870780
     number of epochs:          1
     sequence length:           4096
     total number of samples:   292973
[2025-03-12 09:13:55][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1346694
     number of epochs:          1
     sequence length:           4096
     total number of samples:   197333
[2025-03-12 09:13:55][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1867538
     number of epochs:          1
     sequence length:           4096
     total number of samples:   285221
[2025-03-12 09:13:55][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       2180198
     number of epochs:          1
     sequence length:           4096
     total number of samples:   344117
[2025-03-12 09:13:55][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1878140
     number of epochs:          1
     sequence length:           4096
     total number of samples:   319221
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1083117
     number of epochs:          1
     sequence length:           4096
     total number of samples:   181387
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1047206
     number of epochs:          1
     sequence length:           4096
     total number of samples:   200809
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       907049
     number of epochs:          1
     sequence length:           4096
     total number of samples:   188732
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       992535
     number of epochs:          1
     sequence length:           4096
     total number of samples:   181178
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1351495
     number of epochs:          1
     sequence length:           4096
     total number of samples:   223210
[2025-03-12 09:13:56][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1639484
     number of epochs:          1
     sequence length:           4096
     total number of samples:   296497
[2025-03-12 09:13:57][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1701336
     number of epochs:          1
     sequence length:           4096
     total number of samples:   274972
[2025-03-12 09:13:57][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1080033
     number of epochs:          1
     sequence length:           4096
     total number of samples:   175149
[2025-03-12 09:13:57][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1892876
     number of epochs:          1
     sequence length:           4096
     total number of samples:   331007
[2025-03-12 09:13:57][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1898716
     number of epochs:          1
     sequence length:           4096
     total number of samples:   328440
[2025-03-12 09:13:58][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       2323171
     number of epochs:          3
     sequence length:           4096
     total number of samples:   1234953
[2025-03-12 09:13:58][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1338962
     number of epochs:          1
     sequence length:           4096
     total number of samples:   251413
[2025-03-12 09:13:58][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1344419
     number of epochs:          1
     sequence length:           4096
     total number of samples:   255855
[2025-03-12 09:13:59][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1340310
     number of epochs:          1
     sequence length:           4096
     total number of samples:   254109
[2025-03-12 09:13:59][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1335427
     number of epochs:          1
     sequence length:           4096
     total number of samples:   249938
[2025-03-12 09:13:59][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1341637
     number of epochs:          1
     sequence length:           4096
     total number of samples:   254494
[2025-03-12 09:13:59][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1318060
     number of epochs:          1
     sequence length:           4096
     total number of samples:   249232
[2025-03-12 09:14:00][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1325614
     number of epochs:          1
     sequence length:           4096
     total number of samples:   252013
[2025-03-12 09:14:00][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1319093
     number of epochs:          1
     sequence length:           4096
     total number of samples:   253530
[2025-03-12 09:14:00][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1314383
     number of epochs:          1
     sequence length:           4096
     total number of samples:   250341
[2025-03-12 09:14:00][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1323808
     number of epochs:          1
     sequence length:           4096
     total number of samples:   253309
[2025-03-12 09:14:00][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1299771
     number of epochs:          1
     sequence length:           4096
     total number of samples:   246119
[2025-03-12 09:14:01][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1302356
     number of epochs:          1
     sequence length:           4096
     total number of samples:   252107
[2025-03-12 09:14:01][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1306591
     number of epochs:          1
     sequence length:           4096
     total number of samples:   250995
[2025-03-12 09:14:01][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1303414
     number of epochs:          1
     sequence length:           4096
     total number of samples:   248234
[2025-03-12 09:14:01][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1298642
     number of epochs:          1
     sequence length:           4096
     total number of samples:   250193
[2025-03-12 09:14:01][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1309217
     number of epochs:          1
     sequence length:           4096
     total number of samples:   250386
[2025-03-12 09:14:02][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1284962
     number of epochs:          1
     sequence length:           4096
     total number of samples:   247510
[2025-03-12 09:14:02][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1290348
     number of epochs:          1
     sequence length:           4096
     total number of samples:   247609
[2025-03-12 09:14:02][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1298179
     number of epochs:          1
     sequence length:           4096
     total number of samples:   251943
[2025-03-12 09:14:02][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1291877
     number of epochs:          1
     sequence length:           4096
     total number of samples:   248300
[2025-03-12 09:14:02][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1299821
     number of epochs:          1
     sequence length:           4096
     total number of samples:   256596
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       6151
     number of epochs:          3
     sequence length:           4096
     total number of samples:   6638
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       6321
     number of epochs:          3
     sequence length:           4096
     total number of samples:   7220
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       23226
     number of epochs:          3
     sequence length:           4096
     total number of samples:   29261
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       26873
     number of epochs:          3
     sequence length:           4096
     total number of samples:   35083
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       12268
     number of epochs:          3
     sequence length:           4096
     total number of samples:   15612
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       8603
     number of epochs:          3
     sequence length:           4096
     total number of samples:   9954
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       4452
     number of epochs:          3
     sequence length:           4096
     total number of samples:   5007
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       9868
     number of epochs:          3
     sequence length:           4096
     total number of samples:   10321
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       7576
     number of epochs:          3
     sequence length:           4096
     total number of samples:   8769
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       3397
     number of epochs:          3
     sequence length:           4096
     total number of samples:   3502
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       493755
     number of epochs:          3
     sequence length:           4096
     total number of samples:   841284
[2025-03-12 09:14:03][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       488661
     number of epochs:          3
     sequence length:           4096
     total number of samples:   2550288
[2025-03-12 09:14:04][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       4974886
     number of epochs:          2
     sequence length:           4096
     total number of samples:   549391
[2025-03-12 09:14:05][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       4992910
     number of epochs:          2
     sequence length:           4096
     total number of samples:   551776
[2025-03-12 09:14:05][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       4937663
     number of epochs:          2
     sequence length:           4096
     total number of samples:   542815
[2025-03-12 09:14:06][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       967341
     number of epochs:          3
     sequence length:           4096
     total number of samples:   521329
[2025-03-12 09:14:06][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       3264225
     number of epochs:          2
     sequence length:           4096
     total number of samples:   3815008
[2025-03-12 09:14:07][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       5060412
     number of epochs:          2
     sequence length:           4096
     total number of samples:   2823789
[2025-03-12 09:14:08][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       494774
     number of epochs:          3
     sequence length:           4096
     total number of samples:   207587
[2025-03-12 09:14:08][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       1390327
     number of epochs:          3
     sequence length:           4096
     total number of samples:   177457
[2025-03-12 09:14:08][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       198000
     number of epochs:          3
     sequence length:           4096
     total number of samples:   156655
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.108499, achieved: 0.108499
   dataset 1, input: 0.103053, achieved: 0.103053
   dataset 2, input: 0.085475, achieved: 0.085475
   dataset 3, input: 0.0433843, achieved: 0.0433843
   dataset 4, input: 0.0113768, achieved: 0.0113768
   dataset 5, input: 0.0527751, achieved: 0.0527751
   dataset 6, input: 0.00885526, achieved: 0.00885526
   dataset 7, input: 0.0852543, achieved: 0.0852543
   dataset 8, input: 0.0730516, achieved: 0.0730516
   dataset 9, input: 0.0799137, achieved: 0.0799137
   dataset 10, input: 0.0413844, achieved: 0.0413844
   dataset 11, input: 0.0496325, achieved: 0.0496325
   dataset 12, input: 0.011625, achieved: 0.011625
   dataset 13, input: 0.032061, achieved: 0.032061
   dataset 14, input: 0.106373, achieved: 0.106373
   dataset 15, input: 0.107286, achieved: 0.107286
[2025-03-12 09:14:09][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 8376602 samples
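
# [aside] In the "sample ratios" listings, "input" is the requested mixture
# weight for each component corpus and "achieved" is the realized fraction
# after index building. Megatron assigns each blended sample greedily to the
# component whose realized count lags its target the most (done in a compiled
# helper); an illustrative Python equivalent, not the actual implementation:
import numpy as np

def build_blending_indices(weights, size):
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()                  # normalize the requested weights
    counts = np.zeros(len(w))
    out = np.empty(size, dtype=np.int64)
    for i in range(size):
        lag = w * (i + 1) - counts   # how far each dataset trails its target
        d = int(lag.argmax())
        out[i] = d
        counts[d] += 1
    return out

idx = build_blending_indices([0.108499, 0.103053, 0.085475], 100_000)
print(np.bincount(idx) / len(idx))   # tracks the normalized input weights
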
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.00859629, achieved: 0.00859629
   dataset 1, input: 0.00880482, achieved: 0.00880482
   dataset 2, input: 0.0105313, achieved: 0.0105313
   dataset 3, input: 0.0097168, achieved: 0.0097168
   dataset 4, input: 0.00944725, achieved: 0.00944725
   dataset 5, input: 0.0101287, achieved: 0.0101287
   dataset 6, input: 0.0105225, achieved: 0.0105225
   dataset 7, input: 0.0102593, achieved: 0.0102593
   dataset 8, input: 0.0105674, achieved: 0.0105674
   dataset 9, input: 0.00843697, achieved: 0.00843697
   dataset 10, input: 0.0102051, achieved: 0.0102051
   dataset 11, input: 0.00864062, achieved: 0.00864062
   dataset 12, input: 0.012604, achieved: 0.012604
   dataset 13, input: 0.0093038, achieved: 0.0093038
   dataset 14, input: 0.0111696, achieved: 0.0111696
   dataset 15, input: 0.0101548, achieved: 0.0101548
   dataset 16, input: 0.0107164, achieved: 0.0107164
   dataset 17, input: 0.0110761, achieved: 0.0110761
   dataset 18, input: 0.0103199, achieved: 0.0103199
   dataset 19, input: 0.0107663, achieved: 0.0107663
   dataset 20, input: 0.0116292, achieved: 0.0116292
   dataset 21, input: 0.00938725, achieved: 0.00938725
   dataset 22, input: 0.0101135, achieved: 0.0101135
   dataset 23, input: 0.00983307, achieved: 0.00983307
   dataset 24, input: 0.00962867, achieved: 0.00962867
   dataset 25, input: 0.00957125, achieved: 0.00957125
   dataset 26, input: 0.0097747, achieved: 0.0097747
   dataset 27, input: 0.00901967, achieved: 0.00901967
   dataset 28, input: 0.0103566, achieved: 0.0103566
   dataset 29, input: 0.00999056, achieved: 0.00999056
   dataset 30, input: 0.0124184, achieved: 0.0124184
   dataset 31, input: 0.00891079, achieved: 0.00891079
   dataset 32, input: 0.00931397, achieved: 0.00931397
   dataset 33, input: 0.0114225, achieved: 0.0114225
   dataset 34, input: 0.0119184, achieved: 0.0119184
   dataset 35, input: 0.0103449, achieved: 0.0103449
   dataset 36, input: 0.00920292, achieved: 0.00920292
   dataset 37, input: 0.0100794, achieved: 0.0100794
   dataset 38, input: 0.00899384, achieved: 0.00899384
   dataset 39, input: 0.0100108, achieved: 0.0100108
   dataset 40, input: 0.0094962, achieved: 0.0094962
   dataset 41, input: 0.00916875, achieved: 0.00916875
   dataset 42, input: 0.0105867, achieved: 0.0105867
   dataset 43, input: 0.0110166, achieved: 0.0110166
   dataset 44, input: 0.00956528, achieved: 0.00956528
   dataset 45, input: 0.0100959, achieved: 0.0100959
   dataset 46, input: 0.0111119, achieved: 0.0111119
   dataset 47, input: 0.00861405, achieved: 0.00861405
   dataset 48, input: 0.00969287, achieved: 0.00969287
   dataset 49, input: 0.00888462, achieved: 0.00888462
   dataset 50, input: 0.0106551, achieved: 0.0106551
   dataset 51, input: 0.0107086, achieved: 0.0107086
   dataset 52, input: 0.0105182, achieved: 0.0105182
   dataset 53, input: 0.0105936, achieved: 0.0105936
   dataset 54, input: 0.0101075, achieved: 0.0101075
   dataset 55, input: 0.0106141, achieved: 0.0106141
   dataset 56, input: 0.00844348, achieved: 0.00844348
   dataset 57, input: 0.0100399, achieved: 0.0100399
   dataset 58, input: 0.00954325, achieved: 0.00954325
   dataset 59, input: 0.0104015, achieved: 0.0104015
   dataset 60, input: 0.011547, achieved: 0.011547
   dataset 61, input: 0.00886638, achieved: 0.00886638
   dataset 62, input: 0.0115073, achieved: 0.0115073
   dataset 63, input: 0.00804098, achieved: 0.00804098
   dataset 64, input: 0.0102777, achieved: 0.0102777
   dataset 65, input: 0.00969355, achieved: 0.00969355
   dataset 66, input: 0.00880428, achieved: 0.00880428
   dataset 67, input: 0.0101621, achieved: 0.0101621
   dataset 68, input: 0.0106685, achieved: 0.0106685
   dataset 69, input: 0.010303, achieved: 0.010303
   dataset 70, input: 0.00776017, achieved: 0.00776017
   dataset 71, input: 0.0101559, achieved: 0.0101559
   dataset 72, input: 0.0117694, achieved: 0.0117694
   dataset 73, input: 0.00965538, achieved: 0.00965538
   dataset 74, input: 0.00980263, achieved: 0.00980263
   dataset 75, input: 0.00957104, achieved: 0.00957104
   dataset 76, input: 0.0102657, achieved: 0.0102657
   dataset 77, input: 0.0101735, achieved: 0.0101735
   dataset 78, input: 0.00952515, achieved: 0.00952515
   dataset 79, input: 0.0095482, achieved: 0.0095482
   dataset 80, input: 0.00878903, achieved: 0.00878903
   dataset 81, input: 0.00989416, achieved: 0.00989416
   dataset 82, input: 0.0107253, achieved: 0.0107253
   dataset 83, input: 0.0105408, achieved: 0.0105408
   dataset 84, input: 0.0103657, achieved: 0.0103657
   dataset 85, input: 0.0113504, achieved: 0.0113504
   dataset 86, input: 0.00890882, achieved: 0.00890882
   dataset 87, input: 0.0101038, achieved: 0.0101038
   dataset 88, input: 0.00923078, achieved: 0.00923078
   dataset 89, input: 0.00903187, achieved: 0.00903187
   dataset 90, input: 0.00932956, achieved: 0.00932956
   dataset 91, input: 0.0107644, achieved: 0.0107644
   dataset 92, input: 0.010269, achieved: 0.010269
   dataset 93, input: 0.0113082, achieved: 0.0113082
   dataset 94, input: 0.010295, achieved: 0.010295
   dataset 95, input: 0.00908536, achieved: 0.00908536
   dataset 96, input: 0.00956054, achieved: 0.00956054
   dataset 97, input: 0.0103095, achieved: 0.0103095
   dataset 98, input: 0.00869723, achieved: 0.00869723
   dataset 99, input: 0.00959599, achieved: 0.00959599
[2025-03-12 09:14:11][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 14750323 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.430653, achieved: 0.430653
   dataset 1, input: 0.430584, achieved: 0.430584
   dataset 2, input: 0.138763, achieved: 0.138763
[2025-03-12 09:14:11][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 3535268 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.00616816, achieved: 0.00616816
   dataset 1, input: 0.00616445, achieved: 0.00616445
   dataset 2, input: 0.00616806, achieved: 0.00616806
   dataset 3, input: 0.00616984, achieved: 0.00616984
   dataset 4, input: 0.00616639, achieved: 0.00616639
   dataset 5, input: 0.00616795, achieved: 0.00616795
   dataset 6, input: 0.00617955, achieved: 0.00617955
   dataset 7, input: 0.00615845, achieved: 0.00615845
   dataset 8, input: 0.0061702, achieved: 0.0061702
   dataset 9, input: 0.00617057, achieved: 0.00617057
   dataset 10, input: 0.00615918, achieved: 0.00615918
   dataset 11, input: 0.00617542, achieved: 0.00617542
   dataset 12, input: 0.00617307, achieved: 0.00617307
   dataset 13, input: 0.00617412, achieved: 0.00617412
   dataset 14, input: 0.00617736, achieved: 0.00617736
   dataset 15, input: 0.00616968, achieved: 0.00616968
   dataset 16, input: 0.00618812, achieved: 0.00618812
   dataset 17, input: 0.00618927, achieved: 0.00618927
   dataset 18, input: 0.00615944, achieved: 0.00615944
   dataset 19, input: 0.00615667, achieved: 0.00615667
   dataset 20, input: 0.00614784, achieved: 0.00614784
   dataset 21, input: 0.0061538, achieved: 0.0061538
   dataset 22, input: 0.0061561, achieved: 0.0061561
   dataset 23, input: 0.00617475, achieved: 0.00617475
   dataset 24, input: 0.00617266, achieved: 0.00617266
   dataset 25, input: 0.00615751, achieved: 0.00615751
   dataset 26, input: 0.00617198, achieved: 0.00617198
   dataset 27, input: 0.00617448, achieved: 0.00617448
   dataset 28, input: 0.00617276, achieved: 0.00617276
   dataset 29, input: 0.00616289, achieved: 0.00616289
   dataset 30, input: 0.00618148, achieved: 0.00618148
   dataset 31, input: 0.00605089, achieved: 0.00605089
   dataset 32, input: 0.00601652, achieved: 0.00601652
   dataset 33, input: 0.00600649, achieved: 0.00600649
   dataset 34, input: 0.00600017, achieved: 0.00600017
   dataset 35, input: 0.0060207, achieved: 0.0060207
   dataset 36, input: 0.00600299, achieved: 0.00600299
   dataset 37, input: 0.00600388, achieved: 0.00600388
   dataset 38, input: 0.00600984, achieved: 0.00600984
   dataset 39, input: 0.00599234, achieved: 0.00599234
   dataset 40, input: 0.00601438, achieved: 0.00601438
   dataset 41, input: 0.00599558, achieved: 0.00599558
   dataset 42, input: 0.00599923, achieved: 0.00599923
   dataset 43, input: 0.0059997, achieved: 0.0059997
   dataset 44, input: 0.00598523, achieved: 0.00598523
   dataset 45, input: 0.00599343, achieved: 0.00599343
   dataset 46, input: 0.00599537, achieved: 0.00599537
   dataset 47, input: 0.00598878, achieved: 0.00598878
   dataset 48, input: 0.00600367, achieved: 0.00600367
   dataset 49, input: 0.00600351, achieved: 0.00600351
   dataset 50, input: 0.00598993, achieved: 0.00598993
   dataset 51, input: 0.00598414, achieved: 0.00598414
   dataset 52, input: 0.00599113, achieved: 0.00599113
   dataset 53, input: 0.00599808, achieved: 0.00599808
   dataset 54, input: 0.00598748, achieved: 0.00598748
   dataset 55, input: 0.00598225, achieved: 0.00598225
   dataset 56, input: 0.00599443, achieved: 0.00599443
   dataset 57, input: 0.00597301, achieved: 0.00597301
   dataset 58, input: 0.0059926, achieved: 0.0059926
   dataset 59, input: 0.0059787, achieved: 0.0059787
   dataset 60, input: 0.00597416, achieved: 0.00597416
   dataset 61, input: 0.00598325, achieved: 0.00598325
   dataset 62, input: 0.00594673, achieved: 0.00594673
   dataset 63, input: 0.00590181, achieved: 0.00590181
   dataset 64, input: 0.00589204, achieved: 0.00589204
   dataset 65, input: 0.00587679, achieved: 0.00587679
   dataset 66, input: 0.00587689, achieved: 0.00587689
   dataset 67, input: 0.00587595, achieved: 0.00587595
   dataset 68, input: 0.00586963, achieved: 0.00586963
   dataset 69, input: 0.00587642, achieved: 0.00587642
   dataset 70, input: 0.00586509, achieved: 0.00586509
   dataset 71, input: 0.00586128, achieved: 0.00586128
   dataset 72, input: 0.00587972, achieved: 0.00587972
   dataset 73, input: 0.00587454, achieved: 0.00587454
   dataset 74, input: 0.00587433, achieved: 0.00587433
   dataset 75, input: 0.00587214, achieved: 0.00587214
   dataset 76, input: 0.00588196, achieved: 0.00588196
   dataset 77, input: 0.00587125, achieved: 0.00587125
   dataset 78, input: 0.00588123, achieved: 0.00588123
   dataset 79, input: 0.00588619, achieved: 0.00588619
   dataset 80, input: 0.00585851, achieved: 0.00585851
   dataset 81, input: 0.00587601, achieved: 0.00587601
   dataset 82, input: 0.00585788, achieved: 0.00585788
   dataset 83, input: 0.00585673, achieved: 0.00585673
   dataset 84, input: 0.00586911, achieved: 0.00586911
   dataset 85, input: 0.00585354, achieved: 0.00585354
   dataset 86, input: 0.00586791, achieved: 0.00586791
   dataset 87, input: 0.00584618, achieved: 0.00584618
   dataset 88, input: 0.00585119, achieved: 0.00585119
   dataset 89, input: 0.00587183, achieved: 0.00587183
   dataset 90, input: 0.00586404, achieved: 0.00586404
   dataset 91, input: 0.0058513, achieved: 0.0058513
   dataset 92, input: 0.00586222, achieved: 0.00586222
   dataset 93, input: 0.00584843, achieved: 0.00584843
   dataset 94, input: 0.00579258, achieved: 0.00579258
   dataset 95, input: 0.00578355, achieved: 0.00578355
   dataset 96, input: 0.00579081, achieved: 0.00579081
   dataset 97, input: 0.00578491, achieved: 0.00578491
   dataset 98, input: 0.00578632, achieved: 0.00578632
   dataset 99, input: 0.00576976, achieved: 0.00576976
   dataset 100, input: 0.00578412, achieved: 0.00578412
   dataset 101, input: 0.00578376, achieved: 0.00578376
   dataset 102, input: 0.00576871, achieved: 0.00576871
   dataset 103, input: 0.00577383, achieved: 0.00577383
   dataset 104, input: 0.00577571, achieved: 0.00577571
   dataset 105, input: 0.00575341, achieved: 0.00575341
   dataset 106, input: 0.00575743, achieved: 0.00575743
   dataset 107, input: 0.00575581, achieved: 0.00575581
   dataset 108, input: 0.00575414, achieved: 0.00575414
   dataset 109, input: 0.00576798, achieved: 0.00576798
   dataset 110, input: 0.00575571, achieved: 0.00575571
   dataset 111, input: 0.00576359, achieved: 0.00576359
   dataset 112, input: 0.00575879, achieved: 0.00575879
   dataset 113, input: 0.00575555, achieved: 0.00575555
   dataset 114, input: 0.00576056, achieved: 0.00576056
   dataset 115, input: 0.00575879, achieved: 0.00575879
   dataset 116, input: 0.00574986, achieved: 0.00574986
   dataset 117, input: 0.0057614, achieved: 0.0057614
   dataset 118, input: 0.00575926, achieved: 0.00575926
   dataset 119, input: 0.00573267, achieved: 0.00573267
   dataset 120, input: 0.00575701, achieved: 0.00575701
   dataset 121, input: 0.00574986, achieved: 0.00574986
   dataset 122, input: 0.00575999, achieved: 0.00575999
   dataset 123, input: 0.0057555, achieved: 0.0057555
   dataset 124, input: 0.00575644, achieved: 0.00575644
   dataset 125, input: 0.00571449, achieved: 0.00571449
   dataset 126, input: 0.00570007, achieved: 0.00570007
   dataset 127, input: 0.00568456, achieved: 0.00568456
   dataset 128, input: 0.0057041, achieved: 0.0057041
   dataset 129, input: 0.00567411, achieved: 0.00567411
   dataset 130, input: 0.00567672, achieved: 0.00567672
   dataset 131, input: 0.00568346, achieved: 0.00568346
   dataset 132, input: 0.00567636, achieved: 0.00567636
   dataset 133, input: 0.00566508, achieved: 0.00566508
   dataset 134, input: 0.00567949, achieved: 0.00567949
   dataset 135, input: 0.00567396, achieved: 0.00567396
   dataset 136, input: 0.00568007, achieved: 0.00568007
   dataset 137, input: 0.00567605, achieved: 0.00567605
   dataset 138, input: 0.0056704, achieved: 0.0056704
   dataset 139, input: 0.00566962, achieved: 0.00566962
   dataset 140, input: 0.00565917, achieved: 0.00565917
   dataset 141, input: 0.0056633, achieved: 0.0056633
   dataset 142, input: 0.00566278, achieved: 0.00566278
   dataset 143, input: 0.00565437, achieved: 0.00565437
   dataset 144, input: 0.0056667, achieved: 0.0056667
   dataset 145, input: 0.00567589, achieved: 0.00567589
   dataset 146, input: 0.0056609, achieved: 0.0056609
   dataset 147, input: 0.00565562, achieved: 0.00565562
   dataset 148, input: 0.00565609, achieved: 0.00565609
   dataset 149, input: 0.00565985, achieved: 0.00565985
   dataset 150, input: 0.00566283, achieved: 0.00566283
   dataset 151, input: 0.00566377, achieved: 0.00566377
   dataset 152, input: 0.00566027, achieved: 0.00566027
   dataset 153, input: 0.00566137, achieved: 0.00566137
   dataset 154, input: 0.00565834, achieved: 0.00565834
   dataset 155, input: 0.00565202, achieved: 0.00565202
   dataset 156, input: 0.00566074, achieved: 0.00566074
   dataset 157, input: 0.00563488, achieved: 0.00563488
   dataset 158, input: 0.00561373, achieved: 0.00561373
   dataset 159, input: 0.00561127, achieved: 0.00561127
   dataset 160, input: 0.00560751, achieved: 0.00560751
   dataset 161, input: 0.0056038, achieved: 0.0056038
   dataset 162, input: 0.00560271, achieved: 0.00560271
   dataset 163, input: 0.00561707, achieved: 0.00561707
   dataset 164, input: 0.00561383, achieved: 0.00561383
   dataset 165, input: 0.00560652, achieved: 0.00560652
   dataset 166, input: 0.00558949, achieved: 0.00558949
   dataset 167, input: 0.00560913, achieved: 0.00560913
   dataset 168, input: 0.0056013, achieved: 0.0056013
   dataset 169, input: 0.00559644, achieved: 0.00559644
   dataset 170, input: 0.00186901, achieved: 0.00186901
[2025-03-12 09:14:16][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 19143785 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.00105726, achieved: 0.00105726
   dataset 1, input: 0.0010793, achieved: 0.0010793
   dataset 2, input: 0.00109673, achieved: 0.00109673
   dataset 3, input: 0.00109395, achieved: 0.00109395
   dataset 4, input: 0.00109939, achieved: 0.00109939
   dataset 5, input: 0.00107491, achieved: 0.00107491
   dataset 6, input: 0.00108898, achieved: 0.00108898
   dataset 7, input: 0.000683736, achieved: 0.000683736
   dataset 8, input: 0.00110611, achieved: 0.00110611
   dataset 9, input: 0.0011095, achieved: 0.0011095
   dataset 10, input: 0.00108457, achieved: 0.00108457
   dataset 11, input: 0.0011302, achieved: 0.0011302
   dataset 12, input: 0.00113902, achieved: 0.00113902
   dataset 13, input: 0.00108227, achieved: 0.00108227
   dataset 14, input: 0.00110647, achieved: 0.00110647
   dataset 15, input: 0.00112896, achieved: 0.00112896
   dataset 16, input: 0.00111692, achieved: 0.00111692
   dataset 17, input: 0.00118773, achieved: 0.00118773
   dataset 18, input: 0.00117348, achieved: 0.00117348
   dataset 19, input: 0.00110704, achieved: 0.00110704
   dataset 20, input: 0.00110023, achieved: 0.00110023
   dataset 21, input: 0.0011074, achieved: 0.0011074
   dataset 22, input: 0.00125367, achieved: 0.00125367
   dataset 23, input: 0.0012452, achieved: 0.0012452
   dataset 24, input: 0.000702815, achieved: 0.000702815
   dataset 25, input: 0.00111163, achieved: 0.00111163
   dataset 26, input: 0.00116457, achieved: 0.00116457
   dataset 27, input: 0.00114034, achieved: 0.00114034
   dataset 28, input: 0.00108085, achieved: 0.00108085
   dataset 29, input: 0.00114094, achieved: 0.00114094
   dataset 30, input: 0.000691019, achieved: 0.000691019
   dataset 31, input: 0.000760015, achieved: 0.000760015
   dataset 32, input: 0.0010096, achieved: 0.0010096
   dataset 33, input: 0.00129976, achieved: 0.00129976
   dataset 34, input: 0.00073602, achieved: 0.00073602
   dataset 35, input: 0.00109682, achieved: 0.00109682
   dataset 36, input: 0.000696862, achieved: 0.000696862
   dataset 37, input: 0.001137, achieved: 0.001137
   dataset 38, input: 0.00115319, achieved: 0.00115319
   dataset 39, input: 0.000760245, achieved: 0.000760245
   dataset 40, input: 0.000760406, achieved: 0.000760406
   dataset 41, input: 0.00118783, achieved: 0.00118783
   dataset 42, input: 0.00119724, achieved: 0.00119724
   dataset 43, input: 0.00127187, achieved: 0.00127187
   dataset 44, input: 0.000751944, achieved: 0.000751944
   dataset 45, input: 0.000882526, achieved: 0.000882526
   dataset 46, input: 0.000878318, achieved: 0.000878318
   dataset 47, input: 0.00129593, achieved: 0.00129593
   dataset 48, input: 0.000647071, achieved: 0.000647071
   dataset 49, input: 0.00107732, achieved: 0.00107732
   dataset 50, input: 0.00110707, achieved: 0.00110707
   dataset 51, input: 0.00127477, achieved: 0.00127477
   dataset 52, input: 0.00135993, achieved: 0.00135993
   dataset 53, input: 0.00111424, achieved: 0.00111424
   dataset 54, input: 0.00112836, achieved: 0.00112836
   dataset 55, input: 0.00107217, achieved: 0.00107217
   dataset 56, input: 0.00111444, achieved: 0.00111444
   dataset 57, input: 0.00109946, achieved: 0.00109946
   dataset 58, input: 0.00109523, achieved: 0.00109523
   dataset 59, input: 0.00110735, achieved: 0.00110735
   dataset 60, input: 0.00110198, achieved: 0.00110198
   dataset 61, input: 0.00112875, achieved: 0.00112875
   dataset 62, input: 0.00110829, achieved: 0.00110829
   dataset 63, input: 0.00111008, achieved: 0.00111008
   dataset 64, input: 0.00110857, achieved: 0.00110857
   dataset 65, input: 0.00109657, achieved: 0.00109657
   dataset 66, input: 0.00110117, achieved: 0.00110117
   dataset 67, input: 0.0010552, achieved: 0.0010552
   dataset 68, input: 0.00102554, achieved: 0.00102554
   dataset 69, input: 0.000947303, achieved: 0.000947303
   dataset 70, input: 0.000869004, achieved: 0.000869004
   dataset 71, input: 0.00111486, achieved: 0.00111486
   dataset 72, input: 0.000803387, achieved: 0.000803387
   dataset 73, input: 0.000821849, achieved: 0.000821849
   dataset 74, input: 0.000815505, achieved: 0.000815505
   dataset 75, input: 0.000801925, achieved: 0.000801925
   dataset 76, input: 0.000820047, achieved: 0.000820047
   dataset 77, input: 0.000797037, achieved: 0.000797037
   dataset 78, input: 0.000825752, achieved: 0.000825752
   dataset 79, input: 0.000809921, achieved: 0.000809921
   dataset 80, input: 0.000809449, achieved: 0.000809449
   dataset 81, input: 0.000808309, achieved: 0.000808309
   dataset 82, input: 0.000798545, achieved: 0.000798545
   dataset 83, input: 0.000805062, achieved: 0.000805062
   dataset 84, input: 0.000795799, achieved: 0.000795799
   dataset 85, input: 0.000741719, achieved: 0.000741719
   dataset 86, input: 0.000721202, achieved: 0.000721202
   dataset 87, input: 0.00073423, achieved: 0.00073423
   dataset 88, input: 0.000713056, achieved: 0.000713056
   dataset 89, input: 0.000717155, achieved: 0.000717155
   dataset 90, input: 0.000723608, achieved: 0.000723608
   dataset 91, input: 0.000740332, achieved: 0.000740332
   dataset 92, input: 0.000739256, achieved: 0.000739256
   dataset 93, input: 0.000734875, achieved: 0.000734875
   dataset 94, input: 0.000711933, achieved: 0.000711933
   dataset 95, input: 0.000912468, achieved: 0.000912468
   dataset 96, input: 0.00105344, achieved: 0.00105344
   dataset 97, input: 0.00107572, achieved: 0.00107572
   dataset 98, input: 0.00102162, achieved: 0.00102162
   dataset 99, input: 0.00103541, achieved: 0.00103541
   dataset 100, input: 0.00104079, achieved: 0.00104079
   dataset 101, input: 0.001014, achieved: 0.001014
   dataset 102, input: 0.00102785, achieved: 0.00102785
   dataset 103, input: 0.00101656, achieved: 0.00101656
   dataset 104, input: 0.00102919, achieved: 0.00102919
   dataset 105, input: 0.00103638, achieved: 0.00103638
   dataset 106, input: 0.00102483, achieved: 0.00102483
   dataset 107, input: 0.000989293, achieved: 0.000989293
   dataset 108, input: 0.000995545, achieved: 0.000995545
   dataset 109, input: 0.00100035, achieved: 0.00100035
   dataset 110, input: 0.00102496, achieved: 0.00102496
   dataset 111, input: 0.00106011, achieved: 0.00106011
   dataset 112, input: 0.00107074, achieved: 0.00107074
   dataset 113, input: 0.00106728, achieved: 0.00106728
   dataset 114, input: 0.0010624, achieved: 0.0010624
   dataset 115, input: 0.00096181, achieved: 0.00096181
   dataset 116, input: 0.000943233, achieved: 0.000943233
   dataset 117, input: 0.000940579, achieved: 0.000940579
   dataset 118, input: 0.00138766, achieved: 0.00138766
   dataset 119, input: 0.00090583, achieved: 0.00090583
   dataset 120, input: 0.000908363, achieved: 0.000908363
   dataset 121, input: 0.000912756, achieved: 0.000912756
   dataset 122, input: 0.000880874, achieved: 0.000880874
   dataset 123, input: 0.000877063, achieved: 0.000877063
   dataset 124, input: 0.000879999, achieved: 0.000879999
   dataset 125, input: 0.000859182, achieved: 0.000859182
   dataset 126, input: 0.000854525, achieved: 0.000854525
   dataset 127, input: 0.000853143, achieved: 0.000853143
   dataset 128, input: 0.000825839, achieved: 0.000825839
   dataset 129, input: 0.000809956, achieved: 0.000809956
   dataset 130, input: 0.000803542, achieved: 0.000803542
   dataset 131, input: 0.000804066, achieved: 0.000804066
   dataset 132, input: 0.000789185, achieved: 0.000789185
   dataset 133, input: 0.000771419, achieved: 0.000771419
   dataset 134, input: 0.000765501, achieved: 0.000765501
   dataset 135, input: 0.000777711, achieved: 0.000777711
   dataset 136, input: 0.00121595, achieved: 0.00121595
   dataset 137, input: 0.00134176, achieved: 0.00134176
   dataset 138, input: 0.00134909, achieved: 0.00134909
   dataset 139, input: 0.00132973, achieved: 0.00132973
   dataset 140, input: 0.00131878, achieved: 0.00131878
   dataset 141, input: 0.00130425, achieved: 0.00130425
   dataset 142, input: 0.000865716, achieved: 0.000865716
   dataset 143, input: 0.000821941, achieved: 0.000821941
   dataset 144, input: 0.00077044, achieved: 0.00077044
   dataset 145, input: 0.00115856, achieved: 0.00115856
   dataset 146, input: 0.00105343, achieved: 0.00105343
   dataset 147, input: 0.00103246, achieved: 0.00103246
   dataset 148, input: 0.00103677, achieved: 0.00103677
   dataset 149, input: 0.00104975, achieved: 0.00104975
   dataset 150, input: 0.00101242, achieved: 0.00101242
   dataset 151, input: 0.00100947, achieved: 0.00100947
   dataset 152, input: 0.00100396, achieved: 0.00100396
   dataset 153, input: 0.0013901, achieved: 0.0013901
   dataset 154, input: 0.00128076, achieved: 0.00128076
   dataset 155, input: 0.00127316, achieved: 0.00127316
   dataset 156, input: 0.00125422, achieved: 0.00125422
   dataset 157, input: 0.00122036, achieved: 0.00122036
   dataset 158, input: 0.00121491, achieved: 0.00121491
   dataset 159, input: 0.00118219, achieved: 0.00118219
   dataset 160, input: 0.00121341, achieved: 0.00121341
   dataset 161, input: 0.00122664, achieved: 0.00122664
   dataset 162, input: 0.000977779, achieved: 0.000977779
   dataset 163, input: 0.000962933, achieved: 0.000962933
   dataset 164, input: 0.000937136, achieved: 0.000937136
   dataset 165, input: 0.000958408, achieved: 0.000958408
   dataset 166, input: 0.000948598, achieved: 0.000948598
   dataset 167, input: 0.000971585, achieved: 0.000971585
   dataset 168, input: 0.000976306, achieved: 0.000976306
   dataset 169, input: 0.000953802, achieved: 0.000953802
   dataset 170, input: 0.000938339, achieved: 0.000938339
   dataset 171, input: 0.000944223, achieved: 0.000944223
   dataset 172, input: 0.00140216, achieved: 0.00140216
   dataset 173, input: 0.00141163, achieved: 0.00141163
   dataset 174, input: 0.00141267, achieved: 0.00141267
   dataset 175, input: 0.0014137, achieved: 0.0014137
   dataset 176, input: 0.000784476, achieved: 0.000784476
   dataset 177, input: 0.000802195, achieved: 0.000802195
   dataset 178, input: 0.00078637, achieved: 0.00078637
   dataset 179, input: 0.000774079, achieved: 0.000774079
   dataset 180, input: 0.000788701, achieved: 0.000788701
   dataset 181, input: 0.000790123, achieved: 0.000790123
   dataset 182, input: 0.000754212, achieved: 0.000754212
   dataset 183, input: 0.000732871, achieved: 0.000732871
   dataset 184, input: 0.00106774, achieved: 0.00106774
   dataset 185, input: 0.00118187, achieved: 0.00118187
   dataset 186, input: 0.00123703, achieved: 0.00123703
   dataset 187, input: 0.000771724, achieved: 0.000771724
   dataset 188, input: 0.000780573, achieved: 0.000780573
   dataset 189, input: 0.00076416, achieved: 0.00076416
   dataset 190, input: 0.000742882, achieved: 0.000742882
   dataset 191, input: 0.000734708, achieved: 0.000734708
   dataset 192, input: 0.000724944, achieved: 0.000724944
   dataset 193, input: 0.000728709, achieved: 0.000728709
   dataset 194, input: 0.00071461, achieved: 0.00071461
   dataset 195, input: 0.00107015, achieved: 0.00107015
   dataset 196, input: 0.00137927, achieved: 0.00137927
   dataset 197, input: 0.000925288, achieved: 0.000925288
   dataset 198, input: 0.00135697, achieved: 0.00135697
   dataset 199, input: 0.00131986, achieved: 0.00131986
   dataset 200, input: 0.00122967, achieved: 0.00122967
   dataset 201, input: 0.00124034, achieved: 0.00124034
   dataset 202, input: 0.00137788, achieved: 0.00137788
   dataset 203, input: 0.00136676, achieved: 0.00136676
   dataset 204, input: 0.00135008, achieved: 0.00135008
   dataset 205, input: 0.00130673, achieved: 0.00130673
   dataset 206, input: 0.00127487, achieved: 0.00127487
   dataset 207, input: 0.00127236, achieved: 0.00127236
   dataset 208, input: 0.00125718, achieved: 0.00125718
   dataset 209, input: 0.00126082, achieved: 0.00126082
   dataset 210, input: 0.00125219, achieved: 0.00125219
   dataset 211, input: 0.00120455, achieved: 0.00120455
   dataset 212, input: 0.00119146, achieved: 0.00119146
   dataset 213, input: 0.00117272, achieved: 0.00117272
   dataset 214, input: 0.0011579, achieved: 0.0011579
   dataset 215, input: 0.000945489, achieved: 0.000945489
   dataset 216, input: 0.000947257, achieved: 0.000947257
   dataset 217, input: 0.0013656, achieved: 0.0013656
   dataset 218, input: 0.00133327, achieved: 0.00133327
   dataset 219, input: 0.00131316, achieved: 0.00131316
   dataset 220, input: 0.00128887, achieved: 0.00128887
   dataset 221, input: 0.00139364, achieved: 0.00139364
   dataset 222, input: 0.000997284, achieved: 0.000997284
   dataset 223, input: 0.000999172, achieved: 0.000999172
   dataset 224, input: 0.00137653, achieved: 0.00137653
   dataset 225, input: 0.00136431, achieved: 0.00136431
   dataset 226, input: 0.00135423, achieved: 0.00135423
   dataset 227, input: 0.00135096, achieved: 0.00135096
   dataset 228, input: 0.00131663, achieved: 0.00131663
   dataset 229, input: 0.00111499, achieved: 0.00111499
   dataset 230, input: 0.00110642, achieved: 0.00110642
   dataset 231, input: 0.00110372, achieved: 0.00110372
   dataset 232, input: 0.00107562, achieved: 0.00107562
   dataset 233, input: 0.00104146, achieved: 0.00104146
   dataset 234, input: 0.00101229, achieved: 0.00101229
   dataset 235, input: 0.001015, achieved: 0.001015
   dataset 236, input: 0.000999558, achieved: 0.000999558
   dataset 237, input: 0.00101254, achieved: 0.00101254
   dataset 238, input: 0.000983254, achieved: 0.000983254
   dataset 239, input: 0.000964383, achieved: 0.000964383
   dataset 240, input: 0.000960549, achieved: 0.000960549
   dataset 241, input: 0.000944424, achieved: 0.000944424
   dataset 242, input: 0.00131418, achieved: 0.00131418
   dataset 243, input: 0.000830243, achieved: 0.000830243
   dataset 244, input: 0.000810105, achieved: 0.000810105
   dataset 245, input: 0.000771126, achieved: 0.000771126
   dataset 246, input: 0.000749013, achieved: 0.000749013
   dataset 247, input: 0.000757131, achieved: 0.000757131
   dataset 248, input: 0.000729739, achieved: 0.000729739
   dataset 249, input: 0.000752784, achieved: 0.000752784
   dataset 250, input: 0.000713528, achieved: 0.000713528
   dataset 251, input: 0.000729751, achieved: 0.000729751
   dataset 252, input: 0.00120029, achieved: 0.00120029
   dataset 253, input: 0.00139873, achieved: 0.00139873
   dataset 254, input: 0.00135716, achieved: 0.00135716
   dataset 255, input: 0.00131714, achieved: 0.00131714
   dataset 256, input: 0.00128543, achieved: 0.00128543
   dataset 257, input: 0.00125699, achieved: 0.00125699
   dataset 258, input: 0.000819005, achieved: 0.000819005
   dataset 259, input: 0.00123535, achieved: 0.00123535
   dataset 260, input: 0.00127962, achieved: 0.00127962
   dataset 261, input: 0.00127487, achieved: 0.00127487
   dataset 262, input: 0.00125334, achieved: 0.00125334
   dataset 263, input: 0.00124844, achieved: 0.00124844
   dataset 264, input: 0.00122773, achieved: 0.00122773
   dataset 265, input: 0.000823599, achieved: 0.000823599
   dataset 266, input: 0.00121828, achieved: 0.00121828
   dataset 267, input: 0.000811965, achieved: 0.000811965
   dataset 268, input: 0.00132362, achieved: 0.00132362
   dataset 269, input: 0.00139815, achieved: 0.00139815
   dataset 270, input: 0.001256, achieved: 0.001256
   dataset 271, input: 0.00108371, achieved: 0.00108371
   dataset 272, input: 0.00107062, achieved: 0.00107062
   dataset 273, input: 0.00105916, achieved: 0.00105916
   dataset 274, input: 0.00102508, achieved: 0.00102508
   dataset 275, input: 0.00111499, achieved: 0.00111499
   dataset 276, input: 0.00109822, achieved: 0.00109822
   dataset 277, input: 0.00117478, achieved: 0.00117478
   dataset 278, input: 0.0011967, achieved: 0.0011967
   dataset 279, input: 0.00114093, achieved: 0.00114093
   dataset 280, input: 0.000779571, achieved: 0.000779571
   dataset 281, input: 0.00123281, achieved: 0.00123281
   dataset 282, input: 0.00062679, achieved: 0.00062679
   dataset 283, input: 0.00125363, achieved: 0.00125363
   dataset 284, input: 0.00109893, achieved: 0.00109893
   dataset 285, input: 0.0012276, achieved: 0.0012276
   dataset 286, input: 0.00127764, achieved: 0.00127764
   dataset 287, input: 0.00117289, achieved: 0.00117289
   dataset 288, input: 0.000738565, achieved: 0.000738565
   dataset 289, input: 0.00106061, achieved: 0.00106061
   dataset 290, input: 0.00123911, achieved: 0.00123911
   dataset 291, input: 0.00130963, achieved: 0.00130963
   dataset 292, input: 0.00122002, achieved: 0.00122002
   dataset 293, input: 0.000671676, achieved: 0.000671676
   dataset 294, input: 0.000733752, achieved: 0.000733752
   dataset 295, input: 0.00113394, achieved: 0.00113394
   dataset 296, input: 0.00123383, achieved: 0.00123383
   dataset 297, input: 0.00115412, achieved: 0.00115412
   dataset 298, input: 0.000686229, achieved: 0.000686229
   dataset 299, input: 0.00125179, achieved: 0.00125179
   dataset 300, input: 0.00123965, achieved: 0.00123965
   dataset 301, input: 0.00107752, achieved: 0.00107752
   dataset 302, input: 0.00115829, achieved: 0.00115829
   dataset 303, input: 0.00119977, achieved: 0.00119977
   dataset 304, input: 0.00117928, achieved: 0.00117928
   dataset 305, input: 0.000645114, achieved: 0.000645114
   dataset 306, input: 0.00123741, achieved: 0.00123741
   dataset 307, input: 0.0012657, achieved: 0.0012657
   dataset 308, input: 0.00114569, achieved: 0.00114569
   dataset 309, input: 0.00119626, achieved: 0.00119626
   dataset 310, input: 0.0012244, achieved: 0.0012244
   dataset 311, input: 0.000677064, achieved: 0.000677064
   dataset 312, input: 0.000732152, achieved: 0.000732152
   dataset 313, input: 0.00120647, achieved: 0.00120647
   dataset 314, input: 0.0012265, achieved: 0.0012265
   dataset 315, input: 0.0011615, achieved: 0.0011615
   dataset 316, input: 0.00121459, achieved: 0.00121459
   dataset 317, input: 0.00119835, achieved: 0.00119835
   dataset 318, input: 0.00127203, achieved: 0.00127203
   dataset 319, input: 0.00110161, achieved: 0.00110161
   dataset 320, input: 0.00109044, achieved: 0.00109044
   dataset 321, input: 0.00119994, achieved: 0.00119994
   dataset 322, input: 0.00109323, achieved: 0.00109323
   dataset 323, input: 0.00118551, achieved: 0.00118551
   dataset 324, input: 0.00115721, achieved: 0.00115721
   dataset 325, input: 0.00123548, achieved: 0.00123548
   dataset 326, input: 0.00118111, achieved: 0.00118111
   dataset 327, input: 0.00118876, achieved: 0.00118876
   dataset 328, input: 0.00107531, achieved: 0.00107531
   dataset 329, input: 0.00107846, achieved: 0.00107846
   dataset 330, input: 0.00124869, achieved: 0.00124869
   dataset 331, input: 0.00110692, achieved: 0.00110692
   dataset 332, input: 0.00102709, achieved: 0.00102709
   dataset 333, input: 0.00117422, achieved: 0.00117422
   dataset 334, input: 0.0011315, achieved: 0.0011315
   dataset 335, input: 0.00111281, achieved: 0.00111281
   dataset 336, input: 0.00110364, achieved: 0.00110364
   dataset 337, input: 0.00121196, achieved: 0.00121196
   dataset 338, input: 0.00119802, achieved: 0.00119802
   dataset 339, input: 0.00115191, achieved: 0.00115191
   dataset 340, input: 0.0011559, achieved: 0.0011559
   dataset 341, input: 0.00119496, achieved: 0.00119496
   dataset 342, input: 0.00104568, achieved: 0.00104568
   dataset 343, input: 0.00107559, achieved: 0.00107559
   dataset 344, input: 0.00109649, achieved: 0.00109649
   dataset 345, input: 0.00113205, achieved: 0.00113205
   dataset 346, input: 0.00101803, achieved: 0.00101803
   dataset 347, input: 0.00109609, achieved: 0.00109609
   dataset 348, input: 0.00106151, achieved: 0.00106151
   dataset 349, input: 0.00119758, achieved: 0.00119758
   dataset 350, input: 0.00130122, achieved: 0.00130122
   dataset 351, input: 0.00127431, achieved: 0.00127431
   dataset 352, input: 0.00124074, achieved: 0.00124074
   dataset 353, input: 0.00125926, achieved: 0.00125926
   dataset 354, input: 0.00121513, achieved: 0.00121513
   dataset 355, input: 0.0012617, achieved: 0.0012617
   dataset 356, input: 0.00125399, achieved: 0.00125399
   dataset 357, input: 0.0012555, achieved: 0.0012555
   dataset 358, input: 0.00118395, achieved: 0.00118395
   dataset 359, input: 0.00124139, achieved: 0.00124139
   dataset 360, input: 0.000609317, achieved: 0.000609317
   dataset 361, input: 0.00107773, achieved: 0.00107773
   dataset 362, input: 0.000908939, achieved: 0.000908939
   dataset 363, input: 0.00089609, achieved: 0.00089609
   dataset 364, input: 0.000916273, achieved: 0.000916273
   dataset 365, input: 0.00115259, achieved: 0.00115259
   dataset 366, input: 0.000930827, achieved: 0.000930827
   dataset 367, input: 0.00108648, achieved: 0.00108648
   dataset 368, input: 0.00108346, achieved: 0.00108346
   dataset 369, input: 0.0010692, achieved: 0.0010692
   dataset 370, input: 0.00108187, achieved: 0.00108187
   dataset 371, input: 0.00107058, achieved: 0.00107058
   dataset 372, input: 0.0010628, achieved: 0.0010628
   dataset 373, input: 0.00105714, achieved: 0.00105714
   dataset 374, input: 0.000961896, achieved: 0.000961896
   dataset 375, input: 0.000869631, achieved: 0.000869631
   dataset 376, input: 0.000964861, achieved: 0.000964861
   dataset 377, input: 0.000934764, achieved: 0.000934764
   dataset 378, input: 0.000975379, achieved: 0.000975379
   dataset 379, input: 0.000934948, achieved: 0.000934948
   dataset 380, input: 0.000880368, achieved: 0.000880368
   dataset 381, input: 0.00091663, achieved: 0.00091663
   dataset 382, input: 0.000851975, achieved: 0.000851975
   dataset 383, input: 0.000893062, achieved: 0.000893062
   dataset 384, input: 0.000926192, achieved: 0.000926192
   dataset 385, input: 0.000934505, achieved: 0.000934505
   dataset 386, input: 0.000911892, achieved: 0.000911892
   dataset 387, input: 0.000905853, achieved: 0.000905853
   dataset 388, input: 0.00111213, achieved: 0.00111213
   dataset 389, input: 0.000974665, achieved: 0.000974665
   dataset 390, input: 0.000943497, achieved: 0.000943497
   dataset 391, input: 0.000927827, achieved: 0.000927827
   dataset 392, input: 0.000950699, achieved: 0.000950699
   dataset 393, input: 0.000920343, achieved: 0.000920343
   dataset 394, input: 0.000930562, achieved: 0.000930562
   dataset 395, input: 0.000935184, achieved: 0.000935184
   dataset 396, input: 0.00091355, achieved: 0.00091355
   dataset 397, input: 0.000896176, achieved: 0.000896176
   dataset 398, input: 0.0008929, achieved: 0.0008929
   dataset 399, input: 0.000873811, achieved: 0.000873811
   dataset 400, input: 0.000873828, achieved: 0.000873828
   dataset 401, input: 0.000937004, achieved: 0.000937004
   dataset 402, input: 0.000879475, achieved: 0.000879475
   dataset 403, input: 0.000877075, achieved: 0.000877075
   dataset 404, input: 0.000863592, achieved: 0.000863592
   dataset 405, input: 0.000869487, achieved: 0.000869487
   dataset 406, input: 0.000827825, achieved: 0.000827825
   dataset 407, input: 0.000860455, achieved: 0.000860455
   dataset 408, input: 0.000857703, achieved: 0.000857703
   dataset 409, input: 0.000894017, achieved: 0.000894017
   dataset 410, input: 0.000883989, achieved: 0.000883989
   dataset 411, input: 0.000877466, achieved: 0.000877466
   dataset 412, input: 0.000880897, achieved: 0.000880897
   dataset 413, input: 0.000841256, achieved: 0.000841256
   dataset 414, input: 0.000850179, achieved: 0.000850179
   dataset 415, input: 0.000808251, achieved: 0.000808251
   dataset 416, input: 0.000844209, achieved: 0.000844209
   dataset 417, input: 0.00080657, achieved: 0.00080657
   dataset 418, input: 0.000799593, achieved: 0.000799593
   dataset 419, input: 0.000804711, achieved: 0.000804711
   dataset 420, input: 0.000806956, achieved: 0.000806956
   dataset 421, input: 0.00077527, achieved: 0.00077527
   dataset 422, input: 0.000757436, achieved: 0.000757436
   dataset 423, input: 0.000966577, achieved: 0.000966577
   dataset 424, input: 0.00113167, achieved: 0.00113167
   dataset 425, input: 0.00111027, achieved: 0.00111027
   dataset 426, input: 0.00109251, achieved: 0.00109251
   dataset 427, input: 0.00107578, achieved: 0.00107578
   dataset 428, input: 0.00107768, achieved: 0.00107768
   dataset 429, input: 0.00107422, achieved: 0.00107422
   dataset 430, input: 0.001056, achieved: 0.001056
   dataset 431, input: 0.000535653, achieved: 0.000535653
   dataset 432, input: 0.00104446, achieved: 0.00104446
   dataset 433, input: 0.00103606, achieved: 0.00103606
   dataset 434, input: 0.00102965, achieved: 0.00102965
   dataset 435, input: 0.0010364, achieved: 0.0010364
   dataset 436, input: 0.00101462, achieved: 0.00101462
   dataset 437, input: 0.00102401, achieved: 0.00102401
   dataset 438, input: 0.000797452, achieved: 0.000797452
   dataset 439, input: 0.000865037, achieved: 0.000865037
   dataset 440, input: 0.000831262, achieved: 0.000831262
   dataset 441, input: 0.000854249, achieved: 0.000854249
   dataset 442, input: 0.000833657, achieved: 0.000833657
   dataset 443, input: 0.00082121, achieved: 0.00082121
   dataset 444, input: 0.000825459, achieved: 0.000825459
   dataset 445, input: 0.000801148, achieved: 0.000801148
   dataset 446, input: 0.000794734, achieved: 0.000794734
   dataset 447, input: 0.000775265, achieved: 0.000775265
   dataset 448, input: 0.000776266, achieved: 0.000776266
   dataset 449, input: 0.000776203, achieved: 0.000776203
   dataset 450, input: 0.000776859, achieved: 0.000776859
   dataset 451, input: 0.000766163, achieved: 0.000766163
   dataset 452, input: 0.0007391, achieved: 0.0007391
   dataset 453, input: 0.000756261, achieved: 0.000756261
   dataset 454, input: 0.0010847, achieved: 0.0010847
   dataset 455, input: 0.00109135, achieved: 0.00109135
   dataset 456, input: 0.0010651, achieved: 0.0010651
   dataset 457, input: 0.00102641, achieved: 0.00102641
   dataset 458, input: 0.00101347, achieved: 0.00101347
   dataset 459, input: 0.000990928, achieved: 0.000990928
   dataset 460, input: 0.000954194, achieved: 0.000954194
   dataset 461, input: 0.000954268, achieved: 0.000954268
   dataset 462, input: 0.000930792, achieved: 0.000930792
   dataset 463, input: 0.000939197, achieved: 0.000939197
   dataset 464, input: 0.000878036, achieved: 0.000878036
   dataset 465, input: 0.000840985, achieved: 0.000840985
   dataset 466, input: 0.000837658, achieved: 0.000837658
   dataset 467, input: 0.000828821, achieved: 0.000828821
   dataset 468, input: 0.00080147, achieved: 0.00080147
   dataset 469, input: 0.000810347, achieved: 0.000810347
   dataset 470, input: 0.000888525, achieved: 0.000888525
   dataset 471, input: 0.00100767, achieved: 0.00100767
   dataset 472, input: 0.000980341, achieved: 0.000980341
   dataset 473, input: 0.000864531, achieved: 0.000864531
   dataset 474, input: 0.000748864, achieved: 0.000748864
   dataset 475, input: 0.000746826, achieved: 0.000746826
   dataset 476, input: 0.000845677, achieved: 0.000845677
   dataset 477, input: 0.000897949, achieved: 0.000897949
   dataset 478, input: 0.000767988, achieved: 0.000767988
   dataset 479, input: 0.000885641, achieved: 0.000885641
   dataset 480, input: 0.000896942, achieved: 0.000896942
   dataset 481, input: 0.00107101, achieved: 0.00107101
   dataset 482, input: 0.00105722, achieved: 0.00105722
   dataset 483, input: 0.00104686, achieved: 0.00104686
   dataset 484, input: 0.000883384, achieved: 0.000883384
   dataset 485, input: 0.000876839, achieved: 0.000876839
   dataset 486, input: 0.00103013, achieved: 0.00103013
   dataset 487, input: 0.00100839, achieved: 0.00100839
   dataset 488, input: 0.00100041, achieved: 0.00100041
   dataset 489, input: 0.000997669, achieved: 0.000997669
   dataset 490, input: 0.00100923, achieved: 0.00100923
   dataset 491, input: 0.000992995, achieved: 0.000992995
   dataset 492, input: 0.000994918, achieved: 0.000994918
   dataset 493, input: 0.00097615, achieved: 0.00097615
   dataset 494, input: 0.000985948, achieved: 0.000985948
   dataset 495, input: 0.000979789, achieved: 0.000979789
   dataset 496, input: 0.000979104, achieved: 0.000979104
   dataset 497, input: 0.00100037, achieved: 0.00100037
   dataset 498, input: 0.000985344, achieved: 0.000985344
   dataset 499, input: 0.000982834, achieved: 0.000982834
   dataset 500, input: 0.000964498, achieved: 0.000964498
   dataset 501, input: 0.00097326, achieved: 0.00097326
   dataset 502, input: 0.000739831, achieved: 0.000739831
   dataset 503, input: 0.000756791, achieved: 0.000756791
   dataset 504, input: 0.000775489, achieved: 0.000775489
   dataset 505, input: 0.000751817, achieved: 0.000751817
   dataset 506, input: 0.000758144, achieved: 0.000758144
   dataset 507, input: 0.00073891, achieved: 0.00073891
   dataset 508, input: 0.000737131, achieved: 0.000737131
   dataset 509, input: 0.000746981, achieved: 0.000746981
   dataset 510, input: 0.0010393, achieved: 0.0010393
   dataset 511, input: 0.0010687, achieved: 0.0010687
   dataset 512, input: 0.00107037, achieved: 0.00107037
   dataset 513, input: 0.00102187, achieved: 0.00102187
   dataset 514, input: 0.000717932, achieved: 0.000717932
   dataset 515, input: 0.000765, achieved: 0.000765
   dataset 516, input: 0.000764068, achieved: 0.000764068
   dataset 517, input: 0.000749324, achieved: 0.000749324
   dataset 518, input: 0.000758766, achieved: 0.000758766
   dataset 519, input: 0.000727022, achieved: 0.000727022
   dataset 520, input: 0.0007318, achieved: 0.0007318
   dataset 521, input: 0.000715865, achieved: 0.000715865
   dataset 522, input: 0.00072765, achieved: 0.00072765
   dataset 523, input: 0.000723706, achieved: 0.000723706
   dataset 524, input: 0.000725168, achieved: 0.000725168
   dataset 525, input: 0.000718652, achieved: 0.000718652
   dataset 526, input: 0.000706188, achieved: 0.000706188
   dataset 527, input: 0.000834537, achieved: 0.000834537
   dataset 528, input: 0.000985672, achieved: 0.000985672
   dataset 529, input: 0.000981216, achieved: 0.000981216
   dataset 530, input: 0.000931402, achieved: 0.000931402
   dataset 531, input: 0.000933538, achieved: 0.000933538
   dataset 532, input: 0.000914684, achieved: 0.000914684
   dataset 533, input: 0.000895554, achieved: 0.000895554
   dataset 534, input: 0.000864565, achieved: 0.000864565
   dataset 535, input: 0.000850898, achieved: 0.000850898
   dataset 536, input: 0.000842159, achieved: 0.000842159
   dataset 537, input: 0.000825511, achieved: 0.000825511
   dataset 538, input: 0.000985655, achieved: 0.000985655
   dataset 539, input: 0.000962852, achieved: 0.000962852
   dataset 540, input: 0.00096352, achieved: 0.00096352
   dataset 541, input: 0.000948794, achieved: 0.000948794
   dataset 542, input: 0.000944591, achieved: 0.000944591
   dataset 543, input: 0.000926906, achieved: 0.000926906
   dataset 544, input: 0.000911397, achieved: 0.000911397
   dataset 545, input: 0.00089347, achieved: 0.00089347
   dataset 546, input: 0.00089225, achieved: 0.00089225
   dataset 547, input: 0.000896596, achieved: 0.000896596
   dataset 548, input: 0.000883045, achieved: 0.000883045
   dataset 549, input: 0.000850363, achieved: 0.000850363
   dataset 550, input: 0.000866114, achieved: 0.000866114
   dataset 551, input: 0.000871318, achieved: 0.000871318
   dataset 552, input: 0.000874812, achieved: 0.000874812
   dataset 553, input: 0.000835568, achieved: 0.000835568
   dataset 554, input: 0.000844802, achieved: 0.000844802
   dataset 555, input: 0.0008349, achieved: 0.0008349
   dataset 556, input: 0.000810583, achieved: 0.000810583
   dataset 557, input: 0.000825891, achieved: 0.000825891
   dataset 558, input: 0.000817474, achieved: 0.000817474
   dataset 559, input: 0.000806801, achieved: 0.000806801
   dataset 560, input: 0.000790112, achieved: 0.000790112
   dataset 561, input: 0.000794055, achieved: 0.000794055
   dataset 562, input: 0.00078519, achieved: 0.00078519
   dataset 563, input: 0.000982667, achieved: 0.000982667
   dataset 564, input: 0.000985471, achieved: 0.000985471
   dataset 565, input: 0.000982846, achieved: 0.000982846
   dataset 566, input: 0.000977215, achieved: 0.000977215
   dataset 567, input: 0.000954913, achieved: 0.000954913
   dataset 568, input: 0.0009296, achieved: 0.0009296
   dataset 569, input: 0.00096196, achieved: 0.00096196
   dataset 570, input: 0.000917315, achieved: 0.000917315
   dataset 571, input: 0.000894956, achieved: 0.000894956
   dataset 572, input: 0.000898772, achieved: 0.000898772
   dataset 573, input: 0.000891709, achieved: 0.000891709
   dataset 574, input: 0.000847087, achieved: 0.000847087
   dataset 575, input: 0.000820721, achieved: 0.000820721
   dataset 576, input: 0.00118844, achieved: 0.00118844
   dataset 577, input: 0.000583665, achieved: 0.000583665
   dataset 578, input: 0.000977946, achieved: 0.000977946
   dataset 579, input: 0.000581754, achieved: 0.000581754
   dataset 580, input: 0.000965414, achieved: 0.000965414
   dataset 581, input: 0.00114483, achieved: 0.00114483
   dataset 582, input: 0.00115279, achieved: 0.00115279
   dataset 583, input: 0.00112374, achieved: 0.00112374
   dataset 584, input: 0.00110307, achieved: 0.00110307
   dataset 585, input: 0.00110299, achieved: 0.00110299
   dataset 586, input: 0.00109774, achieved: 0.00109774
   dataset 587, input: 0.000557592, achieved: 0.000557592
   dataset 588, input: 0.00106251, achieved: 0.00106251
   dataset 589, input: 0.00105417, achieved: 0.00105417
   dataset 590, input: 0.00105698, achieved: 0.00105698
   dataset 591, input: 0.00103661, achieved: 0.00103661
   dataset 592, input: 0.000535267, achieved: 0.000535267
   dataset 593, input: 0.00104656, achieved: 0.00104656
   dataset 594, input: 0.00101346, achieved: 0.00101346
   dataset 595, input: 0.000978136, achieved: 0.000978136
   dataset 596, input: 0.000989408, achieved: 0.000989408
   dataset 597, input: 0.000980071, achieved: 0.000980071
   dataset 598, input: 0.000958649, achieved: 0.000958649
   dataset 599, input: 0.000949352, achieved: 0.000949352
   dataset 600, input: 0.000947706, achieved: 0.000947706
   dataset 601, input: 0.000930498, achieved: 0.000930498
   dataset 602, input: 0.000927868, achieved: 0.000927868
   dataset 603, input: 0.000799121, achieved: 0.000799121
   dataset 604, input: 0.000789657, achieved: 0.000789657
   dataset 605, input: 0.000792391, achieved: 0.000792391
   dataset 606, input: 0.000782346, achieved: 0.000782346
   dataset 607, input: 0.000770412, achieved: 0.000770412
   dataset 608, input: 0.000770268, achieved: 0.000770268
   dataset 609, input: 0.000753256, achieved: 0.000753256
   dataset 610, input: 0.000736429, achieved: 0.000736429
   dataset 611, input: 0.000745162, achieved: 0.000745162
   dataset 612, input: 0.000732543, achieved: 0.000732543
   dataset 613, input: 0.000734477, achieved: 0.000734477
   dataset 614, input: 0.000687916, achieved: 0.000687916
   dataset 615, input: 0.00069228, achieved: 0.00069228
   dataset 616, input: 0.0006846, achieved: 0.0006846
   dataset 617, input: 0.000829074, achieved: 0.000829074
   dataset 618, input: 0.000939076, achieved: 0.000939076
   dataset 619, input: 0.000930521, achieved: 0.000930521
   dataset 620, input: 0.000926008, achieved: 0.000926008
   dataset 621, input: 0.000909221, achieved: 0.000909221
   dataset 622, input: 0.000900448, achieved: 0.000900448
   dataset 623, input: 0.000894213, achieved: 0.000894213
   dataset 624, input: 0.000884432, achieved: 0.000884432
   dataset 625, input: 0.000879136, achieved: 0.000879136
   dataset 626, input: 0.000876395, achieved: 0.000876395
   dataset 627, input: 0.000872642, achieved: 0.000872642
   dataset 628, input: 0.000833783, achieved: 0.000833783
   dataset 629, input: 0.000841089, achieved: 0.000841089
   dataset 630, input: 0.000863345, achieved: 0.000863345
   dataset 631, input: 0.000903764, achieved: 0.000903764
   dataset 632, input: 0.000872659, achieved: 0.000872659
   dataset 633, input: 0.00087841, achieved: 0.00087841
   dataset 634, input: 0.00087677, achieved: 0.00087677
   dataset 635, input: 0.000868462, achieved: 0.000868462
   dataset 636, input: 0.000844658, achieved: 0.000844658
   dataset 637, input: 0.000835349, achieved: 0.000835349
   dataset 638, input: 0.000840162, achieved: 0.000840162
   dataset 639, input: 0.000828855, achieved: 0.000828855
   dataset 640, input: 0.00081185, achieved: 0.00081185
   dataset 641, input: 0.000798148, achieved: 0.000798148
   dataset 642, input: 0.000792564, achieved: 0.000792564
   dataset 643, input: 0.00105035, achieved: 0.00105035
   dataset 644, input: 0.000988677, achieved: 0.000988677
   dataset 645, input: 0.000968522, achieved: 0.000968522
   dataset 646, input: 0.000888888, achieved: 0.000888888
   dataset 647, input: 0.000937597, achieved: 0.000937597
   dataset 648, input: 0.000921195, achieved: 0.000921195
   dataset 649, input: 0.0010635, achieved: 0.0010635
   dataset 650, input: 0.00101841, achieved: 0.00101841
   dataset 651, input: 0.00103273, achieved: 0.00103273
   dataset 652, input: 0.00101029, achieved: 0.00101029
   dataset 653, input: 0.00108704, achieved: 0.00108704
   dataset 654, input: 0.000783543, achieved: 0.000783543
   dataset 655, input: 0.000794947, achieved: 0.000794947
   dataset 656, input: 0.000558715, achieved: 0.000558715
   dataset 657, input: 0.000806749, achieved: 0.000806749
   dataset 658, input: 0.000806363, achieved: 0.000806363
   dataset 659, input: 0.000549026, achieved: 0.000549026
   dataset 660, input: 0.000786911, achieved: 0.000786911
   dataset 661, input: 0.000804302, achieved: 0.000804302
   dataset 662, input: 0.000810002, achieved: 0.000810002
   dataset 663, input: 0.000784867, achieved: 0.000784867
   dataset 664, input: 0.000803306, achieved: 0.000803306
   dataset 665, input: 0.000817773, achieved: 0.000817773
   dataset 666, input: 0.000810462, achieved: 0.000810462
   dataset 667, input: 0.000796237, achieved: 0.000796237
   dataset 668, input: 0.00081022, achieved: 0.00081022
   dataset 669, input: 0.000544075, achieved: 0.000544075
   dataset 670, input: 0.000812494, achieved: 0.000812494
   dataset 671, input: 0.000796784, achieved: 0.000796784
   dataset 672, input: 0.000799766, achieved: 0.000799766
   dataset 673, input: 0.00081931, achieved: 0.00081931
   dataset 674, input: 0.000784775, achieved: 0.000784775
   dataset 675, input: 0.000563591, achieved: 0.000563591
   dataset 676, input: 0.000821325, achieved: 0.000821325
   dataset 677, input: 0.000785777, achieved: 0.000785777
   dataset 678, input: 0.000550114, achieved: 0.000550114
   dataset 679, input: 0.000578357, achieved: 0.000578357
   dataset 680, input: 0.000810796, achieved: 0.000810796
   dataset 681, input: 0.000559705, achieved: 0.000559705
   dataset 682, input: 0.000795966, achieved: 0.000795966
   dataset 683, input: 0.000786197, achieved: 0.000786197
   dataset 684, input: 0.000799184, achieved: 0.000799184
   dataset 685, input: 0.000789743, achieved: 0.000789743
   dataset 686, input: 0.000782046, achieved: 0.000782046
   dataset 687, input: 0.000817969, achieved: 0.000817969
   dataset 688, input: 0.000797555, achieved: 0.000797555
   dataset 689, input: 0.00079584, achieved: 0.00079584
   dataset 690, input: 0.000785581, achieved: 0.000785581
   dataset 691, input: 0.000807636, achieved: 0.000807636
   dataset 692, input: 0.000794182, achieved: 0.000794182
   dataset 693, input: 0.00058298, achieved: 0.00058298
   dataset 694, input: 0.000549394, achieved: 0.000549394
   dataset 695, input: 0.000805367, achieved: 0.000805367
   dataset 696, input: 0.000786951, achieved: 0.000786951
   dataset 697, input: 0.000796905, achieved: 0.000796905
   dataset 698, input: 0.000817675, achieved: 0.000817675
   dataset 699, input: 0.000756083, achieved: 0.000756083
   dataset 700, input: 0.000672539, achieved: 0.000672539
   dataset 701, input: 0.000692878, achieved: 0.000692878
   dataset 702, input: 0.000674531, achieved: 0.000674531
   dataset 703, input: 0.000660484, achieved: 0.000660484
   dataset 704, input: 0.000682953, achieved: 0.000682953
   dataset 705, input: 0.00066068, achieved: 0.00066068
   dataset 706, input: 0.000671561, achieved: 0.000671561
   dataset 707, input: 0.000639236, achieved: 0.000639236
   dataset 708, input: 0.000680311, achieved: 0.000680311
   dataset 709, input: 0.00066102, achieved: 0.00066102
   dataset 710, input: 0.00064946, achieved: 0.00064946
   dataset 711, input: 0.000676552, achieved: 0.000676552
   dataset 712, input: 0.000658683, achieved: 0.000658683
   dataset 713, input: 0.000660358, achieved: 0.000660358
   dataset 714, input: 0.000675844, achieved: 0.000675844
   dataset 715, input: 0.000676673, achieved: 0.000676673
   dataset 716, input: 0.000657514, achieved: 0.000657514
   dataset 717, input: 0.000652097, achieved: 0.000652097
   dataset 718, input: 0.000632949, achieved: 0.000632949
   dataset 719, input: 0.000644077, achieved: 0.000644077
   dataset 720, input: 0.000624821, achieved: 0.000624821
   dataset 721, input: 0.000614925, achieved: 0.000614925
   dataset 722, input: 0.000621021, achieved: 0.000621021
   dataset 723, input: 0.000620705, achieved: 0.000620705
   dataset 724, input: 0.000637515, achieved: 0.000637515
   dataset 725, input: 0.000616416, achieved: 0.000616416
   dataset 726, input: 0.000616744, achieved: 0.000616744
   dataset 727, input: 0.000610244, achieved: 0.000610244
   dataset 728, input: 0.0006293, achieved: 0.0006293
   dataset 729, input: 0.000617204, achieved: 0.000617204
   dataset 730, input: 0.000633824, achieved: 0.000633824
   dataset 731, input: 0.000612714, achieved: 0.000612714
   dataset 732, input: 0.000616692, achieved: 0.000616692
   dataset 733, input: 0.000578432, achieved: 0.000578432
   dataset 734, input: 0.000607326, achieved: 0.000607326
   dataset 735, input: 0.000609617, achieved: 0.000609617
   dataset 736, input: 0.000615345, achieved: 0.000615345
   dataset 737, input: 0.000613687, achieved: 0.000613687
   dataset 738, input: 0.000614948, achieved: 0.000614948
   dataset 739, input: 0.000984855, achieved: 0.000984855
   dataset 740, input: 0.000985448, achieved: 0.000985448
   dataset 741, input: 0.000887086, achieved: 0.000887086
   dataset 742, input: 0.000855233, achieved: 0.000855233
   dataset 743, input: 0.000844566, achieved: 0.000844566
   dataset 744, input: 0.000841031, achieved: 0.000841031
   dataset 745, input: 0.000844733, achieved: 0.000844733
   dataset 746, input: 0.000837727, achieved: 0.000837727
   dataset 747, input: 0.000836903, achieved: 0.000836903
   dataset 748, input: 0.000836932, achieved: 0.000836932
   dataset 749, input: 0.000839937, achieved: 0.000839937
   dataset 750, input: 0.000819592, achieved: 0.000819592
   dataset 751, input: 0.000815016, achieved: 0.000815016
   dataset 752, input: 0.000610244, achieved: 0.000610244
   dataset 753, input: 0.000602288, achieved: 0.000602288
   dataset 754, input: 0.000608322, achieved: 0.000608322
   dataset 755, input: 0.000524525, achieved: 0.000524525
   dataset 756, input: 0.000585824, achieved: 0.000585824
   dataset 757, input: 0.000589232, achieved: 0.000589232
   dataset 758, input: 0.000566659, achieved: 0.000566659
   dataset 759, input: 0.000586077, achieved: 0.000586077
   dataset 760, input: 0.000583429, achieved: 0.000583429
   dataset 761, input: 0.000583193, achieved: 0.000583193
   dataset 762, input: 0.000566153, achieved: 0.000566153
   dataset 763, input: 0.000581903, achieved: 0.000581903
   dataset 764, input: 0.000556303, achieved: 0.000556303
   dataset 765, input: 0.00055731, achieved: 0.00055731
   dataset 766, input: 0.000575899, achieved: 0.000575899
   dataset 767, input: 0.000571932, achieved: 0.000571932
   dataset 768, input: 0.000563602, achieved: 0.000563602
   dataset 769, input: 0.000588656, achieved: 0.000588656
   dataset 770, input: 0.000575513, achieved: 0.000575513
   dataset 771, input: 0.000551945, achieved: 0.000551945
   dataset 772, input: 0.000562606, achieved: 0.000562606
   dataset 773, input: 0.000580199, achieved: 0.000580199
   dataset 774, input: 0.000539078, achieved: 0.000539078
   dataset 775, input: 0.000576227, achieved: 0.000576227
   dataset 776, input: 0.000571069, achieved: 0.000571069
   dataset 777, input: 0.000553971, achieved: 0.000553971
   dataset 778, input: 0.000556343, achieved: 0.000556343
   dataset 779, input: 0.000565208, achieved: 0.000565208
   dataset 780, input: 0.000578202, achieved: 0.000578202
   dataset 781, input: 0.00055328, achieved: 0.00055328
   dataset 782, input: 0.000540569, achieved: 0.000540569
   dataset 783, input: 0.000950008, achieved: 0.000950008
   dataset 784, input: 0.000960895, achieved: 0.000960895
   dataset 785, input: 0.000938356, achieved: 0.000938356
   dataset 786, input: 0.000923251, achieved: 0.000923251
   dataset 787, input: 0.000889642, achieved: 0.000889642
   dataset 788, input: 0.000852568, achieved: 0.000852568
   dataset 789, input: 0.000825833, achieved: 0.000825833
   dataset 790, input: 0.00083254, achieved: 0.00083254
   dataset 791, input: 0.000845838, achieved: 0.000845838
   dataset 792, input: 0.000820208, achieved: 0.000820208
   dataset 793, input: 0.0008303, achieved: 0.0008303
   dataset 794, input: 0.000823841, achieved: 0.000823841
   dataset 795, input: 0.000831118, achieved: 0.000831118
   dataset 796, input: 0.000809506, achieved: 0.000809506
   dataset 797, input: 0.000598662, achieved: 0.000598662
   dataset 798, input: 0.000603699, achieved: 0.000603699
   dataset 799, input: 0.000610561, achieved: 0.000610561
   dataset 800, input: 0.00062835, achieved: 0.00062835
   dataset 801, input: 0.0004126, achieved: 0.0004126
   dataset 802, input: 0.000623502, achieved: 0.000623502
   dataset 803, input: 0.00061428, achieved: 0.00061428
   dataset 804, input: 0.000602461, achieved: 0.000602461
   dataset 805, input: 0.00060013, achieved: 0.00060013
   dataset 806, input: 0.000587954, achieved: 0.000587954
   dataset 807, input: 0.00059572, achieved: 0.00059572
   dataset 808, input: 0.000584413, achieved: 0.000584413
   dataset 809, input: 0.000590314, achieved: 0.000590314
   dataset 810, input: 0.000585968, achieved: 0.000585968
   dataset 811, input: 0.000584528, achieved: 0.000584528
   dataset 812, input: 0.000583216, achieved: 0.000583216
   dataset 813, input: 0.000574414, achieved: 0.000574414
   dataset 814, input: 0.00100597, achieved: 0.00100597
   dataset 815, input: 0.000929882, achieved: 0.000929882
   dataset 816, input: 0.000955161, achieved: 0.000955161
   dataset 817, input: 0.00095287, achieved: 0.00095287
   dataset 818, input: 0.000947809, achieved: 0.000947809
   dataset 819, input: 0.000965408, achieved: 0.000965408
   dataset 820, input: 0.000936514, achieved: 0.000936514
   dataset 821, input: 0.000923797, achieved: 0.000923797
   dataset 822, input: 0.000920545, achieved: 0.000920545
   dataset 823, input: 0.000888013, achieved: 0.000888013
   dataset 824, input: 0.00089427, achieved: 0.00089427
   dataset 825, input: 0.000868382, achieved: 0.000868382
   dataset 826, input: 0.000871191, achieved: 0.000871191
   dataset 827, input: 0.000862648, achieved: 0.000862648
   dataset 828, input: 0.000899383, achieved: 0.000899383
   dataset 829, input: 0.000890776, achieved: 0.000890776
   dataset 830, input: 0.000872118, achieved: 0.000872118
   dataset 831, input: 0.000858716, achieved: 0.000858716
   dataset 832, input: 0.000942375, achieved: 0.000942375
   dataset 833, input: 0.00097664, achieved: 0.00097664
   dataset 834, input: 0.000976415, achieved: 0.000976415
   dataset 835, input: 0.000943877, achieved: 0.000943877
   dataset 836, input: 0.000972915, achieved: 0.000972915
   dataset 837, input: 0.000950475, achieved: 0.000950475
   dataset 838, input: 0.000932116, achieved: 0.000932116
   dataset 839, input: 0.000936877, achieved: 0.000936877
   dataset 840, input: 0.000922053, achieved: 0.000922053
   dataset 841, input: 0.000921414, achieved: 0.000921414
   dataset 842, input: 0.000905502, achieved: 0.000905502
   dataset 843, input: 0.000902296, achieved: 0.000902296
   dataset 844, input: 0.000892515, achieved: 0.000892515
   dataset 845, input: 0.000902002, achieved: 0.000902002
   dataset 846, input: 0.000838193, achieved: 0.000838193
   dataset 847, input: 0.000579629, achieved: 0.000579629
   dataset 848, input: 0.000568634, achieved: 0.000568634
   dataset 849, input: 0.000555479, achieved: 0.000555479
   dataset 850, input: 0.000559682, achieved: 0.000559682
   dataset 851, input: 0.000556631, achieved: 0.000556631
   dataset 852, input: 0.000539009, achieved: 0.000539009
   dataset 853, input: 0.0005394, achieved: 0.0005394
   dataset 854, input: 0.000531801, achieved: 0.000531801
   dataset 855, input: 0.000561173, achieved: 0.000561173
   dataset 856, input: 0.000531629, achieved: 0.000531629
   dataset 857, input: 0.000533246, achieved: 0.000533246
   dataset 858, input: 0.000515049, achieved: 0.000515049
   dataset 859, input: 0.000518376, achieved: 0.000518376
   dataset 860, input: 0.000511019, achieved: 0.000511019
   dataset 861, input: 0.000516874, achieved: 0.000516874
   dataset 862, input: 0.000529982, achieved: 0.000529982
   dataset 863, input: 0.000929583, achieved: 0.000929583
   dataset 864, input: 0.000910326, achieved: 0.000910326
   dataset 865, input: 0.000883459, achieved: 0.000883459
   dataset 866, input: 0.000893321, achieved: 0.000893321
   dataset 867, input: 0.000880621, achieved: 0.000880621
   dataset 868, input: 0.000853667, achieved: 0.000853667
   dataset 869, input: 0.000866125, achieved: 0.000866125
   dataset 870, input: 0.000823737, achieved: 0.000823737
   dataset 871, input: 0.00083064, achieved: 0.00083064
   dataset 872, input: 0.000821176, achieved: 0.000821176
   dataset 873, input: 0.00080596, achieved: 0.00080596
   dataset 874, input: 0.000800526, achieved: 0.000800526
   dataset 875, input: 0.000789501, achieved: 0.000789501
   dataset 876, input: 0.000612507, achieved: 0.000612507
   dataset 877, input: 0.000419422, achieved: 0.000419422
   dataset 878, input: 0.000620112, achieved: 0.000620112
   dataset 879, input: 0.000616053, achieved: 0.000616053
   dataset 880, input: 0.00061641, achieved: 0.00061641
   dataset 881, input: 0.00061097, achieved: 0.00061097
   dataset 882, input: 0.00041374, achieved: 0.00041374
   dataset 883, input: 0.000605483, achieved: 0.000605483
   dataset 884, input: 0.000603071, achieved: 0.000603071
   dataset 885, input: 0.000604476, achieved: 0.000604476
   dataset 886, input: 0.000586445, achieved: 0.000586445
   dataset 887, input: 0.000584419, achieved: 0.000584419
   dataset 888, input: 0.000582807, achieved: 0.000582807
   dataset 889, input: 0.000590763, achieved: 0.000590763
   dataset 890, input: 0.000586969, achieved: 0.000586969
   dataset 891, input: 0.000585047, achieved: 0.000585047
   dataset 892, input: 0.000585812, achieved: 0.000585812
   dataset 893, input: 0.000566763, achieved: 0.000566763
   dataset 894, input: 0.000584592, achieved: 0.000584592
   dataset 895, input: 0.000597522, achieved: 0.000597522
   dataset 896, input: 0.000563038, achieved: 0.000563038
   dataset 897, input: 0.000609916, achieved: 0.000609916
   dataset 898, input: 0.000603687, achieved: 0.000603687
   dataset 899, input: 0.00060656, achieved: 0.00060656
   dataset 900, input: 0.000595449, achieved: 0.000595449
   dataset 901, input: 0.0005934, achieved: 0.0005934
   dataset 902, input: 0.000598495, achieved: 0.000598495
   dataset 903, input: 0.000585409, achieved: 0.000585409
   dataset 904, input: 0.000573487, achieved: 0.000573487
   dataset 905, input: 0.000561046, achieved: 0.000561046
   dataset 906, input: 0.000564195, achieved: 0.000564195
   dataset 907, input: 0.00055229, achieved: 0.00055229
   dataset 908, input: 0.000555117, achieved: 0.000555117
   dataset 909, input: 0.000554253, achieved: 0.000554253
   dataset 910, input: 0.000550684, achieved: 0.000550684
   dataset 911, input: 0.000550862, achieved: 0.000550862
   dataset 912, input: 0.000543902, achieved: 0.000543902
   dataset 913, input: 0.00054921, achieved: 0.00054921
   dataset 914, input: 0.000529614, achieved: 0.000529614
   dataset 915, input: 0.000530984, achieved: 0.000530984
   dataset 916, input: 0.000537909, achieved: 0.000537909
   dataset 917, input: 0.000528825, achieved: 0.000528825
   dataset 918, input: 0.000568196, achieved: 0.000568196
   dataset 919, input: 0.000579606, achieved: 0.000579606
   dataset 920, input: 0.000583786, achieved: 0.000583786
   dataset 921, input: 0.000594424, achieved: 0.000594424
   dataset 922, input: 0.00057625, achieved: 0.00057625
   dataset 923, input: 0.000576302, achieved: 0.000576302
   dataset 924, input: 0.000581754, achieved: 0.000581754
   dataset 925, input: 0.000562393, achieved: 0.000562393
   dataset 926, input: 0.000557368, achieved: 0.000557368
   dataset 927, input: 0.000563827, achieved: 0.000563827
   dataset 928, input: 0.000560781, achieved: 0.000560781
   dataset 929, input: 0.000570775, achieved: 0.000570775
   dataset 930, input: 0.000565093, achieved: 0.000565093
   dataset 931, input: 0.000560223, achieved: 0.000560223
   dataset 932, input: 0.000555491, achieved: 0.000555491
   dataset 933, input: 0.000551185, achieved: 0.000551185
   dataset 934, input: 0.000529211, achieved: 0.000529211
   dataset 935, input: 0.00054556, achieved: 0.00054556
   dataset 936, input: 0.000978931, achieved: 0.000978931
   dataset 937, input: 0.000794878, achieved: 0.000794878
   dataset 938, input: 0.000782467, achieved: 0.000782467
   dataset 939, input: 0.000778414, achieved: 0.000778414
   dataset 940, input: 0.000757557, achieved: 0.000757557
   dataset 941, input: 0.000761224, achieved: 0.000761224
   dataset 942, input: 0.000739296, achieved: 0.000739296
   dataset 943, input: 0.000746855, achieved: 0.000746855
   dataset 944, input: 0.000745041, achieved: 0.000745041
   dataset 945, input: 0.000748403, achieved: 0.000748403
   dataset 946, input: 0.000720511, achieved: 0.000720511
   dataset 947, input: 0.00071898, achieved: 0.00071898
   dataset 948, input: 0.000719446, achieved: 0.000719446
   dataset 949, input: 0.000721668, achieved: 0.000721668
   dataset 950, input: 0.000742105, achieved: 0.000742105
   dataset 951, input: 0.000708859, achieved: 0.000708859
   dataset 952, input: 0.000731443, achieved: 0.000731443
   dataset 953, input: 0.000681008, achieved: 0.000681008
   dataset 954, input: 0.000701537, achieved: 0.000701537
   dataset 955, input: 0.000674882, achieved: 0.000674882
   dataset 956, input: 0.000678187, achieved: 0.000678187
   dataset 957, input: 0.000671376, achieved: 0.000671376
   dataset 958, input: 0.000651492, achieved: 0.000651492
   dataset 959, input: 0.000704962, achieved: 0.000704962
   dataset 960, input: 0.000504583, achieved: 0.000504583
   dataset 961, input: 0.000503161, achieved: 0.000503161
   dataset 962, input: 0.000497197, achieved: 0.000497197
   dataset 963, input: 0.000493328, achieved: 0.000493328
   dataset 964, input: 0.000821043, achieved: 0.000821043
   dataset 965, input: 0.000953791, achieved: 0.000953791
   dataset 966, input: 0.000485079, achieved: 0.000485079
   dataset 967, input: 0.000487566, achieved: 0.000487566
   dataset 968, input: 0.000795356, achieved: 0.000795356
   dataset 969, input: 0.000846615, achieved: 0.000846615
   dataset 970, input: 0.00077272, achieved: 0.00077272
   dataset 971, input: 0.000839523, achieved: 0.000839523
   dataset 972, input: 0.000831319, achieved: 0.000831319
   dataset 973, input: 0.000938535, achieved: 0.000938535
   dataset 974, input: 0.000954499, achieved: 0.000954499
   dataset 975, input: 0.000517185, achieved: 0.000517185
   dataset 976, input: 0.000959271, achieved: 0.000959271
   dataset 977, input: 0.000892843, achieved: 0.000892843
   dataset 978, input: 0.000813767, achieved: 0.000813767
   dataset 979, input: 0.000743331, achieved: 0.000743331
   dataset 980, input: 0.000970877, achieved: 0.000970877
   dataset 981, input: 0.000809443, achieved: 0.000809443
   dataset 982, input: 0.00086548, achieved: 0.00086548
   dataset 983, input: 0.000802667, achieved: 0.000802667
   dataset 984, input: 0.000792725, achieved: 0.000792725
   dataset 985, input: 0.000919405, achieved: 0.000919405
   dataset 986, input: 0.000747833, achieved: 0.000747833
   dataset 987, input: 0.000890626, achieved: 0.000890626
   dataset 988, input: 0.000862942, achieved: 0.000862942
   dataset 989, input: 0.000880938, achieved: 0.000880938
   dataset 990, input: 0.000513448, achieved: 0.000513448
   dataset 991, input: 0.000500029, achieved: 0.000500029
   dataset 992, input: 0.00051479, achieved: 0.00051479
   dataset 993, input: 0.000518336, achieved: 0.000518336
   dataset 994, input: 0.000511537, achieved: 0.000511537
   dataset 995, input: 0.000871421, achieved: 0.000871421
   dataset 996, input: 0.00083969, achieved: 0.00083969
   dataset 997, input: 0.000492499, achieved: 0.000492499
   dataset 998, input: 0.000926595, achieved: 0.000926595
   dataset 999, input: 0.000917632, achieved: 0.000917632
   dataset 1000, input: 0.000902411, achieved: 0.000902411
   dataset 1001, input: 0.00083665, achieved: 0.00083665
   dataset 1002, input: 0.000817319, achieved: 0.000817319
   dataset 1003, input: 0.00080493, achieved: 0.00080493
   dataset 1004, input: 0.000801856, achieved: 0.000801856
   dataset 1005, input: 0.000793059, achieved: 0.000793059
   dataset 1006, input: 0.000802984, achieved: 0.000802984
   dataset 1007, input: 0.000785046, achieved: 0.000785046
   dataset 1008, input: 0.000782996, achieved: 0.000782996
   dataset 1009, input: 0.000772167, achieved: 0.000772167
   dataset 1010, input: 0.000744748, achieved: 0.000744748
   dataset 1011, input: 0.000744771, achieved: 0.000744771
   dataset 1012, input: 0.000755432, achieved: 0.000755432
   dataset 1013, input: 0.000747724, achieved: 0.000747724
   dataset 1014, input: 0.000742577, achieved: 0.000742577
   dataset 1015, input: 0.000747154, achieved: 0.000747154
   dataset 1016, input: 0.000741242, achieved: 0.000741242
   dataset 1017, input: 0.000737016, achieved: 0.000737016
   dataset 1018, input: 0.000725203, achieved: 0.000725203
   dataset 1019, input: 0.000532728, achieved: 0.000532728
   dataset 1020, input: 0.000537788, achieved: 0.000537788
   dataset 1021, input: 0.000518359, achieved: 0.000518359
   dataset 1022, input: 0.0005224, achieved: 0.0005224
   dataset 1023, input: 0.00053407, achieved: 0.00053407
   dataset 1024, input: 0.000529867, achieved: 0.000529867
   dataset 1025, input: 0.000519608, achieved: 0.000519608
   dataset 1026, input: 0.000520368, achieved: 0.000520368
   dataset 1027, input: 0.000528278, achieved: 0.000528278
   dataset 1028, input: 0.00051631, achieved: 0.00051631
   dataset 1029, input: 0.000499977, achieved: 0.000499977
   dataset 1030, input: 0.000511462, achieved: 0.000511462
   dataset 1031, input: 0.000501676, achieved: 0.000501676
   dataset 1032, input: 0.000483968, achieved: 0.000483968
   dataset 1033, input: 0.000497047, achieved: 0.000497047
   dataset 1034, input: 0.000965241, achieved: 0.000965241
   dataset 1035, input: 0.000893631, achieved: 0.000893631
   dataset 1036, input: 0.000792276, achieved: 0.000792276
   dataset 1037, input: 0.000812149, achieved: 0.000812149
   dataset 1038, input: 0.000779801, achieved: 0.000779801
   dataset 1039, input: 0.000767389, achieved: 0.000767389
   dataset 1040, input: 0.000780768, achieved: 0.000780768
   dataset 1041, input: 0.000748691, achieved: 0.000748691
   dataset 1042, input: 0.000746446, achieved: 0.000746446
   dataset 1043, input: 0.000758898, achieved: 0.000758898
   dataset 1044, input: 0.000737569, achieved: 0.000737569
   dataset 1045, input: 0.000737258, achieved: 0.000737258
   dataset 1046, input: 0.000734339, achieved: 0.000734339
   dataset 1047, input: 0.000829788, achieved: 0.000829788
   dataset 1048, input: 0.000944067, achieved: 0.000944067
   dataset 1049, input: 0.000740344, achieved: 0.000740344
   dataset 1050, input: 0.0008554, achieved: 0.0008554
   dataset 1051, input: 0.000902428, achieved: 0.000902428
   dataset 1052, input: 0.000537483, achieved: 0.000537483
   dataset 1053, input: 0.000527743, achieved: 0.000527743
   dataset 1054, input: 0.000966605, achieved: 0.000966605
   dataset 1055, input: 0.00052597, achieved: 0.00052597
   dataset 1056, input: 0.000525624, achieved: 0.000525624
   dataset 1057, input: 0.000511917, achieved: 0.000511917
   dataset 1058, input: 0.000517409, achieved: 0.000517409
   dataset 1059, input: 0.000518751, achieved: 0.000518751
   dataset 1060, input: 0.000502862, achieved: 0.000502862
   dataset 1061, input: 0.000940636, achieved: 0.000940636
   dataset 1062, input: 0.000509183, achieved: 0.000509183
   dataset 1063, input: 0.000489868, achieved: 0.000489868
   dataset 1064, input: 0.000501607, achieved: 0.000501607
   dataset 1065, input: 0.000520357, achieved: 0.000520357
   dataset 1066, input: 0.000504819, achieved: 0.000504819
   dataset 1067, input: 0.000495124, achieved: 0.000495124
   dataset 1068, input: 0.00050433, achieved: 0.00050433
   dataset 1069, input: 0.000496259, achieved: 0.000496259
   dataset 1070, input: 0.000496921, achieved: 0.000496921
   dataset 1071, input: 0.000503748, achieved: 0.000503748
   dataset 1072, input: 0.000503909, achieved: 0.000503909
   dataset 1073, input: 0.00051418, achieved: 0.00051418
   dataset 1074, input: 0.000507133, achieved: 0.000507133
   dataset 1075, input: 0.00096447, achieved: 0.00096447
   dataset 1076, input: 0.000485263, achieved: 0.000485263
   dataset 1077, input: 0.000972345, achieved: 0.000972345
   dataset 1078, input: 0.000958966, achieved: 0.000958966
   dataset 1079, input: 0.000914431, achieved: 0.000914431
   dataset 1080, input: 0.00046318, achieved: 0.00046318
   dataset 1081, input: 0.000540638, achieved: 0.000540638
   dataset 1082, input: 0.000524755, achieved: 0.000524755
   dataset 1083, input: 0.000541979, achieved: 0.000541979
   dataset 1084, input: 0.000535797, achieved: 0.000535797
   dataset 1085, input: 0.000507628, achieved: 0.000507628
   dataset 1086, input: 0.000522061, achieved: 0.000522061
   dataset 1087, input: 0.00051236, achieved: 0.00051236
   dataset 1088, input: 0.000476426, achieved: 0.000476426
   dataset 1089, input: 0.000523085, achieved: 0.000523085
   dataset 1090, input: 0.000883776, achieved: 0.000883776
   dataset 1091, input: 0.000917741, achieved: 0.000917741
   dataset 1092, input: 0.000892238, achieved: 0.000892238
   dataset 1093, input: 0.000882492, achieved: 0.000882492
   dataset 1094, input: 0.000867708, achieved: 0.000867708
   dataset 1095, input: 0.000831797, achieved: 0.000831797
   dataset 1096, input: 0.000831331, achieved: 0.000831331
   dataset 1097, input: 0.000664307, achieved: 0.000664307
   dataset 1098, input: 0.00586935, achieved: 0.00586935
   dataset 1099, input: 0.00614957, achieved: 0.00614957
   dataset 1100, input: 0.00595225, achieved: 0.00595225
   dataset 1101, input: 0.00596338, achieved: 0.00596338
   dataset 1102, input: 0.00481169, achieved: 0.00481169
   dataset 1103, input: 0.00538109, achieved: 0.00538109
   dataset 1104, input: 0.00548131, achieved: 0.00548131
   dataset 1105, input: 0.00176982, achieved: 0.00176982
   dataset 1106, input: 0.00505266, achieved: 0.00505266
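
(Aside: the repeated `dataset N, input: W, achieved: A` blocks above are printed while building the blendable-dataset index — each dataset's requested blending weight next to the fraction of samples it actually received. As far as I understand, the real work happens in a compiled C++ helper (`helpers.build_blending_indices`); the sketch below is a rough, simplified Python rendition of that greedy assignment, not the actual implementation — it skips the wrap-around/sizing details the real helper handles.)

```python
import numpy as np

def build_blending_indices(weights, size):
    """Rough sketch of the greedy weighted interleave: at each global
    sample i, pick the dataset whose achieved count lags furthest
    behind its target share weights[d] * (i + 1)."""
    weights = np.asarray(weights, dtype=np.float64)  # normalized, sums to 1
    num_datasets = weights.shape[0]
    dataset_index = np.zeros(size, dtype=np.int64)         # which dataset sample i comes from
    dataset_sample_index = np.zeros(size, dtype=np.int64)  # position within that dataset
    current = np.zeros(num_datasets, dtype=np.int64)       # samples assigned so far per dataset
    for i in range(size):
        error = weights * (i + 1) - current  # how far below target each dataset sits
        d = int(np.argmax(error))
        dataset_index[i] = d
        dataset_sample_index[i] = current[d]
        current[d] += 1
    for d in range(num_datasets):
        # mirrors the "dataset d, input: ..., achieved: ..." lines in the log
        print(f"   dataset {d}, input: {weights[d]:g}, achieved: {current[d] / size:g}")
    return dataset_index, dataset_sample_index
```

Note that `achieved` matches `input` to the printed precision in every block above, which is expected: at these sample counts the greedy assignment hits the target weights essentially exactly, so the index build looks healthy.
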
[2025-03-12 09:18:56][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 173705836 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.00201113, achieved: 0.00201113
   dataset 1, input: 0.00203688, achieved: 0.00203688
   dataset 2, input: 0.00202405, achieved: 0.00202405
   dataset 3, input: 0.00200025, achieved: 0.00200025
   dataset 4, input: 0.00203933, achieved: 0.00203933
   dataset 5, input: 0.00203023, achieved: 0.00203023
   dataset 6, input: 0.00201149, achieved: 0.00201149
   dataset 7, input: 0.00206305, achieved: 0.00206305
   dataset 8, input: 0.00200524, achieved: 0.00200524
   dataset 9, input: 0.00212208, achieved: 0.00212208
   dataset 10, input: 0.00200116, achieved: 0.00200116
   dataset 11, input: 0.00201295, achieved: 0.00201295
   dataset 12, input: 0.00205124, achieved: 0.00205124
   dataset 13, input: 0.00199443, achieved: 0.00199443
   dataset 14, input: 0.00200746, achieved: 0.00200746
   dataset 15, input: 0.00205857, achieved: 0.00205857
   dataset 16, input: 0.00202633, achieved: 0.00202633
   dataset 17, input: 0.0020443, achieved: 0.0020443
   dataset 18, input: 0.00205557, achieved: 0.00205557
   dataset 19, input: 0.00200902, achieved: 0.00200902
   dataset 20, input: 0.00203939, achieved: 0.00203939
   dataset 21, input: 0.00203243, achieved: 0.00203243
   dataset 22, input: 0.00204173, achieved: 0.00204173
   dataset 23, input: 0.00198551, achieved: 0.00198551
   dataset 24, input: 0.00204998, achieved: 0.00204998
   dataset 25, input: 0.00205231, achieved: 0.00205231
   dataset 26, input: 0.00201839, achieved: 0.00201839
   dataset 27, input: 0.00201409, achieved: 0.00201409
   dataset 28, input: 0.00203525, achieved: 0.00203525
   dataset 29, input: 0.00196943, achieved: 0.00196943
   dataset 30, input: 0.00201016, achieved: 0.00201016
   dataset 31, input: 0.0019708, achieved: 0.0019708
   dataset 32, input: 0.00202278, achieved: 0.00202278
   dataset 33, input: 0.00201717, achieved: 0.00201717
   dataset 34, input: 0.00199786, achieved: 0.00199786
   dataset 35, input: 0.00199099, achieved: 0.00199099
   dataset 36, input: 0.00203597, achieved: 0.00203597
   dataset 37, input: 0.00199189, achieved: 0.00199189
   dataset 38, input: 0.00203205, achieved: 0.00203205
   dataset 39, input: 0.00199151, achieved: 0.00199151
   dataset 40, input: 0.00204107, achieved: 0.00204107
   dataset 41, input: 0.00202455, achieved: 0.00202455
   dataset 42, input: 0.00202748, achieved: 0.00202748
   dataset 43, input: 0.00203717, achieved: 0.00203717
   dataset 44, input: 0.00199452, achieved: 0.00199452
   dataset 45, input: 0.00200821, achieved: 0.00200821
   dataset 46, input: 0.00203652, achieved: 0.00203652
   dataset 47, input: 0.00201588, achieved: 0.00201588
   dataset 48, input: 0.00200612, achieved: 0.00200612
   dataset 49, input: 0.00201153, achieved: 0.00201153
   dataset 50, input: 0.00197638, achieved: 0.00197638
   dataset 51, input: 0.00199369, achieved: 0.00199369
   dataset 52, input: 0.00197681, achieved: 0.00197681
   dataset 53, input: 0.00201665, achieved: 0.00201665
   dataset 54, input: 0.00201861, achieved: 0.00201861
   dataset 55, input: 0.00200126, achieved: 0.00200126
   dataset 56, input: 0.00202826, achieved: 0.00202826
   dataset 57, input: 0.00201311, achieved: 0.00201311
   dataset 58, input: 0.00197356, achieved: 0.00197356
   dataset 59, input: 0.00199114, achieved: 0.00199114
   dataset 60, input: 0.00197452, achieved: 0.00197452
   dataset 61, input: 0.00202484, achieved: 0.00202484
   dataset 62, input: 0.00199182, achieved: 0.00199182
   dataset 63, input: 0.00201434, achieved: 0.00201434
   dataset 64, input: 0.00199537, achieved: 0.00199537
   dataset 65, input: 0.00199465, achieved: 0.00199465
   dataset 66, input: 0.00203359, achieved: 0.00203359
   dataset 67, input: 0.0020232, achieved: 0.0020232
   dataset 68, input: 0.00207336, achieved: 0.00207336
   dataset 69, input: 0.00200639, achieved: 0.00200639
   dataset 70, input: 0.00205115, achieved: 0.00205115
   dataset 71, input: 0.00199221, achieved: 0.00199221
   dataset 72, input: 0.00204669, achieved: 0.00204669
   dataset 73, input: 0.00198504, achieved: 0.00198504
   dataset 74, input: 0.00201612, achieved: 0.00201612
   dataset 75, input: 0.00198663, achieved: 0.00198663
   dataset 76, input: 0.00203672, achieved: 0.00203672
   dataset 77, input: 0.00198562, achieved: 0.00198562
   dataset 78, input: 0.00200258, achieved: 0.00200258
   dataset 79, input: 0.00203129, achieved: 0.00203129
   dataset 80, input: 0.00202148, achieved: 0.00202148
   dataset 81, input: 0.00196622, achieved: 0.00196622
   dataset 82, input: 0.0020394, achieved: 0.0020394
   dataset 83, input: 0.00199123, achieved: 0.00199123
   dataset 84, input: 0.00203435, achieved: 0.00203435
   dataset 85, input: 0.00199754, achieved: 0.00199754
   dataset 86, input: 0.00199452, achieved: 0.00199452
   dataset 87, input: 0.00203307, achieved: 0.00203307
   dataset 88, input: 0.00195825, achieved: 0.00195825
   dataset 89, input: 0.00200366, achieved: 0.00200366
   dataset 90, input: 0.00200647, achieved: 0.00200647
   dataset 91, input: 0.00199224, achieved: 0.00199224
   dataset 92, input: 0.00200645, achieved: 0.00200645
   dataset 93, input: 0.00199574, achieved: 0.00199574
   dataset 94, input: 0.00197874, achieved: 0.00197874
   dataset 95, input: 0.00200092, achieved: 0.00200092
   dataset 96, input: 0.00197647, achieved: 0.00197647
   dataset 97, input: 0.0020373, achieved: 0.0020373
   dataset 98, input: 0.00200963, achieved: 0.00200963
   dataset 99, input: 0.00200034, achieved: 0.00200034
   dataset 100, input: 0.00201418, achieved: 0.00201418
   dataset 101, input: 0.00206569, achieved: 0.00206569
   dataset 102, input: 0.00199721, achieved: 0.00199721
   dataset 103, input: 0.00203849, achieved: 0.00203849
   dataset 104, input: 0.00200207, achieved: 0.00200207
   dataset 105, input: 0.00201067, achieved: 0.00201067
   dataset 106, input: 0.00202678, achieved: 0.00202678
   dataset 107, input: 0.00199882, achieved: 0.00199882
   dataset 108, input: 0.00205433, achieved: 0.00205433
   dataset 109, input: 0.00197736, achieved: 0.00197736
   dataset 110, input: 0.00203965, achieved: 0.00203965
   dataset 111, input: 0.00201698, achieved: 0.00201698
   dataset 112, input: 0.00200679, achieved: 0.00200679
   dataset 113, input: 0.00204826, achieved: 0.00204826
   dataset 114, input: 0.0020257, achieved: 0.0020257
   dataset 115, input: 0.00202766, achieved: 0.00202766
   dataset 116, input: 0.00199577, achieved: 0.00199577
   dataset 117, input: 0.00204043, achieved: 0.00204043
   dataset 118, input: 0.00200747, achieved: 0.00200747
   dataset 119, input: 0.00206065, achieved: 0.00206065
   dataset 120, input: 0.00200509, achieved: 0.00200509
   dataset 121, input: 0.00204367, achieved: 0.00204367
   dataset 122, input: 0.00199742, achieved: 0.00199742
   dataset 123, input: 0.00204939, achieved: 0.00204939
   dataset 124, input: 0.00203831, achieved: 0.00203831
   dataset 125, input: 0.00203946, achieved: 0.00203946
   dataset 126, input: 0.0020197, achieved: 0.0020197
   dataset 127, input: 0.00203092, achieved: 0.00203092
   dataset 128, input: 0.00198265, achieved: 0.00198265
   dataset 129, input: 0.00201218, achieved: 0.00201218
   dataset 130, input: 0.00200084, achieved: 0.00200084
   dataset 131, input: 0.00196281, achieved: 0.00196281
   dataset 132, input: 0.00201619, achieved: 0.00201619
   dataset 133, input: 0.00200941, achieved: 0.00200941
   dataset 134, input: 0.00202944, achieved: 0.00202944
   dataset 135, input: 0.00205041, achieved: 0.00205041
   dataset 136, input: 0.00198587, achieved: 0.00198587
   dataset 137, input: 0.00199012, achieved: 0.00199012
   dataset 138, input: 0.00199975, achieved: 0.00199975
   dataset 139, input: 0.00198554, achieved: 0.00198554
   dataset 140, input: 0.0020279, achieved: 0.0020279
   dataset 141, input: 0.00195539, achieved: 0.00195539
   dataset 142, input: 0.00203125, achieved: 0.00203125
   dataset 143, input: 0.00197089, achieved: 0.00197089
   dataset 144, input: 0.00200413, achieved: 0.00200413
   dataset 145, input: 0.00202022, achieved: 0.00202022
   dataset 146, input: 0.00205043, achieved: 0.00205043
   dataset 147, input: 0.00200325, achieved: 0.00200325
   dataset 148, input: 0.00201247, achieved: 0.00201247
   dataset 149, input: 0.00200451, achieved: 0.00200451
   dataset 150, input: 0.00200506, achieved: 0.00200506
   dataset 151, input: 0.00198113, achieved: 0.00198113
   dataset 152, input: 0.00198442, achieved: 0.00198442
   dataset 153, input: 0.00201764, achieved: 0.00201764
   dataset 154, input: 0.00199903, achieved: 0.00199903
   dataset 155, input: 0.00199982, achieved: 0.00199982
   dataset 156, input: 0.00202104, achieved: 0.00202104
   dataset 157, input: 0.0020341, achieved: 0.0020341
   dataset 158, input: 0.00201057, achieved: 0.00201057
   dataset 159, input: 0.00196761, achieved: 0.00196761
   dataset 160, input: 0.00197657, achieved: 0.00197657
   dataset 161, input: 0.00198405, achieved: 0.00198405
   dataset 162, input: 0.00199077, achieved: 0.00199077
   dataset 163, input: 0.00200681, achieved: 0.00200681
   dataset 164, input: 0.00203225, achieved: 0.00203225
   dataset 165, input: 0.00200364, achieved: 0.00200364
   dataset 166, input: 0.00201747, achieved: 0.00201747
   dataset 167, input: 0.00197556, achieved: 0.00197556
   dataset 168, input: 0.00200294, achieved: 0.00200294
   dataset 169, input: 0.00201973, achieved: 0.00201973
   dataset 170, input: 0.00197594, achieved: 0.00197594
   dataset 171, input: 0.00203594, achieved: 0.00203594
   dataset 172, input: 0.00197428, achieved: 0.00197428
   dataset 173, input: 0.00201685, achieved: 0.00201685
   dataset 174, input: 0.00197956, achieved: 0.00197956
   dataset 175, input: 0.00198333, achieved: 0.00198333
   dataset 176, input: 0.00200983, achieved: 0.00200983
   dataset 177, input: 0.00196253, achieved: 0.00196253
   dataset 178, input: 0.00204462, achieved: 0.00204462
   dataset 179, input: 0.00201332, achieved: 0.00201332
   dataset 180, input: 0.00199941, achieved: 0.00199941
   dataset 181, input: 0.00201077, achieved: 0.00201077
   dataset 182, input: 0.00198192, achieved: 0.00198192
   dataset 183, input: 0.00200514, achieved: 0.00200514
   dataset 184, input: 0.00197811, achieved: 0.00197811
   dataset 185, input: 0.00198718, achieved: 0.00198718
   dataset 186, input: 0.00198823, achieved: 0.00198823
   dataset 187, input: 0.00201967, achieved: 0.00201967
   dataset 188, input: 0.00201973, achieved: 0.00201973
   dataset 189, input: 0.00197839, achieved: 0.00197839
   dataset 190, input: 0.00202711, achieved: 0.00202711
   dataset 191, input: 0.00198607, achieved: 0.00198607
   dataset 192, input: 0.00200322, achieved: 0.00200322
   dataset 193, input: 0.00195696, achieved: 0.00195696
   dataset 194, input: 0.00201389, achieved: 0.00201389
   dataset 195, input: 0.00197174, achieved: 0.00197174
   dataset 196, input: 0.00197988, achieved: 0.00197988
   dataset 197, input: 0.00198332, achieved: 0.00198332
   dataset 198, input: 0.00193868, achieved: 0.00193868
   dataset 199, input: 0.00200076, achieved: 0.00200076
   dataset 200, input: 0.00196373, achieved: 0.00196373
   dataset 201, input: 0.00199055, achieved: 0.00199055
   dataset 202, input: 0.00197423, achieved: 0.00197423
   dataset 203, input: 0.00198089, achieved: 0.00198089
   dataset 204, input: 0.00196067, achieved: 0.00196067
   dataset 205, input: 0.00200831, achieved: 0.00200831
   dataset 206, input: 0.00197001, achieved: 0.00197001
   dataset 207, input: 0.00203996, achieved: 0.00203996
   dataset 208, input: 0.00198582, achieved: 0.00198582
   dataset 209, input: 0.00203606, achieved: 0.00203606
   dataset 210, input: 0.00202745, achieved: 0.00202745
   dataset 211, input: 0.00199511, achieved: 0.00199511
   dataset 212, input: 0.00201206, achieved: 0.00201206
   dataset 213, input: 0.00202258, achieved: 0.00202258
   dataset 214, input: 0.0019911, achieved: 0.0019911
   dataset 215, input: 0.00203567, achieved: 0.00203567
   dataset 216, input: 0.00197059, achieved: 0.00197059
   dataset 217, input: 0.00199777, achieved: 0.00199777
   dataset 218, input: 0.0020007, achieved: 0.0020007
   dataset 219, input: 0.00199421, achieved: 0.00199421
   dataset 220, input: 0.00201738, achieved: 0.00201738
   dataset 221, input: 0.00197962, achieved: 0.00197962
   dataset 222, input: 0.00196012, achieved: 0.00196012
   dataset 223, input: 0.00201847, achieved: 0.00201847
   dataset 224, input: 0.00200071, achieved: 0.00200071
   dataset 225, input: 0.00199779, achieved: 0.00199779
   dataset 226, input: 0.00194927, achieved: 0.00194927
   dataset 227, input: 0.00203959, achieved: 0.00203959
   dataset 228, input: 0.00195352, achieved: 0.00195352
   dataset 229, input: 0.00201395, achieved: 0.00201395
   dataset 230, input: 0.00197575, achieved: 0.00197575
   dataset 231, input: 0.00198012, achieved: 0.00198012
   dataset 232, input: 0.00202959, achieved: 0.00202959
   dataset 233, input: 0.00198276, achieved: 0.00198276
   dataset 234, input: 0.00202782, achieved: 0.00202782
   dataset 235, input: 0.00201818, achieved: 0.00201818
   dataset 236, input: 0.00198894, achieved: 0.00198894
   dataset 237, input: 0.00202542, achieved: 0.00202542
   dataset 238, input: 0.00201675, achieved: 0.00201675
   dataset 239, input: 0.00198354, achieved: 0.00198354
   dataset 240, input: 0.00204488, achieved: 0.00204488
   dataset 241, input: 0.00195691, achieved: 0.00195691
   dataset 242, input: 0.00203593, achieved: 0.00203593
   dataset 243, input: 0.0019985, achieved: 0.0019985
   dataset 244, input: 0.00200537, achieved: 0.00200537
   dataset 245, input: 0.00198656, achieved: 0.00198656
   dataset 246, input: 0.00198817, achieved: 0.00198817
   dataset 247, input: 0.0019854, achieved: 0.0019854
   dataset 248, input: 0.00200875, achieved: 0.00200875
   dataset 249, input: 0.00199226, achieved: 0.00199226
   dataset 250, input: 0.00200942, achieved: 0.00200942
   dataset 251, input: 0.00194812, achieved: 0.00194812
   dataset 252, input: 0.00199182, achieved: 0.00199182
   dataset 253, input: 0.00198928, achieved: 0.00198928
   dataset 254, input: 0.00194932, achieved: 0.00194932
   dataset 255, input: 0.00198438, achieved: 0.00198438
   dataset 256, input: 0.00193166, achieved: 0.00193166
   dataset 257, input: 0.00203037, achieved: 0.00203037
   dataset 258, input: 0.0019644, achieved: 0.0019644
   dataset 259, input: 0.00196128, achieved: 0.00196128
   dataset 260, input: 0.00195087, achieved: 0.00195087
   dataset 261, input: 0.00199522, achieved: 0.00199522
   dataset 262, input: 0.00194635, achieved: 0.00194635
   dataset 263, input: 0.00200943, achieved: 0.00200943
   dataset 264, input: 0.00198645, achieved: 0.00198645
   dataset 265, input: 0.00197595, achieved: 0.00197595
   dataset 266, input: 0.00200411, achieved: 0.00200411
   dataset 267, input: 0.001968, achieved: 0.001968
   dataset 268, input: 0.00201966, achieved: 0.00201966
   dataset 269, input: 0.00197707, achieved: 0.00197707
   dataset 270, input: 0.0019676, achieved: 0.0019676
   dataset 271, input: 0.00200136, achieved: 0.00200136
   dataset 272, input: 0.00199096, achieved: 0.00199096
   dataset 273, input: 0.00199364, achieved: 0.00199364
   dataset 274, input: 0.00199713, achieved: 0.00199713
   dataset 275, input: 0.00199779, achieved: 0.00199779
   dataset 276, input: 0.00199867, achieved: 0.00199867
   dataset 277, input: 0.0019876, achieved: 0.0019876
   dataset 278, input: 0.00200159, achieved: 0.00200159
   dataset 279, input: 0.00198123, achieved: 0.00198123
   dataset 280, input: 0.00200744, achieved: 0.00200744
   dataset 281, input: 0.00200768, achieved: 0.00200768
   dataset 282, input: 0.00200034, achieved: 0.00200034
   dataset 283, input: 0.00200793, achieved: 0.00200793
   dataset 284, input: 0.00198041, achieved: 0.00198041
   dataset 285, input: 0.00199651, achieved: 0.00199651
   dataset 286, input: 0.00198473, achieved: 0.00198473
   dataset 287, input: 0.00198241, achieved: 0.00198241
   dataset 288, input: 0.00197559, achieved: 0.00197559
   dataset 289, input: 0.0019839, achieved: 0.0019839
   dataset 290, input: 0.00202364, achieved: 0.00202364
   dataset 291, input: 0.00195941, achieved: 0.00195941
   dataset 292, input: 0.00201392, achieved: 0.00201392
   dataset 293, input: 0.00198147, achieved: 0.00198147
   dataset 294, input: 0.00198221, achieved: 0.00198221
   dataset 295, input: 0.00196622, achieved: 0.00196622
   dataset 296, input: 0.00198548, achieved: 0.00198548
   dataset 297, input: 0.00201581, achieved: 0.00201581
   dataset 298, input: 0.0019925, achieved: 0.0019925
   dataset 299, input: 0.00201974, achieved: 0.00201974
   dataset 300, input: 0.00198622, achieved: 0.00198622
   dataset 301, input: 0.0019734, achieved: 0.0019734
   dataset 302, input: 0.00205455, achieved: 0.00205455
   dataset 303, input: 0.00199679, achieved: 0.00199679
   dataset 304, input: 0.00200021, achieved: 0.00200021
   dataset 305, input: 0.00198209, achieved: 0.00198209
   dataset 306, input: 0.00199429, achieved: 0.00199429
   dataset 307, input: 0.00199805, achieved: 0.00199805
   dataset 308, input: 0.00198826, achieved: 0.00198826
   dataset 309, input: 0.00205141, achieved: 0.00205141
   dataset 310, input: 0.00198975, achieved: 0.00198975
   dataset 311, input: 0.00199753, achieved: 0.00199753
   dataset 312, input: 0.00200083, achieved: 0.00200083
   dataset 313, input: 0.00197186, achieved: 0.00197186
   dataset 314, input: 0.0019857, achieved: 0.0019857
   dataset 315, input: 0.00199205, achieved: 0.00199205
   dataset 316, input: 0.00197297, achieved: 0.00197297
   dataset 317, input: 0.00202313, achieved: 0.00202313
   dataset 318, input: 0.00197722, achieved: 0.00197722
   dataset 319, input: 0.00199539, achieved: 0.00199539
   dataset 320, input: 0.00197916, achieved: 0.00197916
   dataset 321, input: 0.00199966, achieved: 0.00199966
   dataset 322, input: 0.00199809, achieved: 0.00199809
   dataset 323, input: 0.00198669, achieved: 0.00198669
   dataset 324, input: 0.00196921, achieved: 0.00196921
   dataset 325, input: 0.00198212, achieved: 0.00198212
   dataset 326, input: 0.00198042, achieved: 0.00198042
   dataset 327, input: 0.00197724, achieved: 0.00197724
   dataset 328, input: 0.00199837, achieved: 0.00199837
   dataset 329, input: 0.00197848, achieved: 0.00197848
   dataset 330, input: 0.00201437, achieved: 0.00201437
   dataset 331, input: 0.00197589, achieved: 0.00197589
   dataset 332, input: 0.00198677, achieved: 0.00198677
   dataset 333, input: 0.00200253, achieved: 0.00200253
   dataset 334, input: 0.0019858, achieved: 0.0019858
   dataset 335, input: 0.00203128, achieved: 0.00203128
   dataset 336, input: 0.00198225, achieved: 0.00198225
   dataset 337, input: 0.00202889, achieved: 0.00202889
   dataset 338, input: 0.0019937, achieved: 0.0019937
   dataset 339, input: 0.00204612, achieved: 0.00204612
   dataset 340, input: 0.00198548, achieved: 0.00198548
   dataset 341, input: 0.00202937, achieved: 0.00202937
   dataset 342, input: 0.00202249, achieved: 0.00202249
   dataset 343, input: 0.00204788, achieved: 0.00204788
   dataset 344, input: 0.00201989, achieved: 0.00201989
   dataset 345, input: 0.00201566, achieved: 0.00201566
   dataset 346, input: 0.00198901, achieved: 0.00198901
   dataset 347, input: 0.00203753, achieved: 0.00203753
   dataset 348, input: 0.00201961, achieved: 0.00201961
   dataset 349, input: 0.0020532, achieved: 0.0020532
   dataset 350, input: 0.00200715, achieved: 0.00200715
   dataset 351, input: 0.00203764, achieved: 0.00203764
   dataset 352, input: 0.00202134, achieved: 0.00202134
   dataset 353, input: 0.00201916, achieved: 0.00201916
   dataset 354, input: 0.00202358, achieved: 0.00202358
   dataset 355, input: 0.00199338, achieved: 0.00199338
   dataset 356, input: 0.00198666, achieved: 0.00198666
   dataset 357, input: 0.00201884, achieved: 0.00201884
   dataset 358, input: 0.00201062, achieved: 0.00201062
   dataset 359, input: 0.00193955, achieved: 0.00193955
   dataset 360, input: 0.00201229, achieved: 0.00201229
   dataset 361, input: 0.00197716, achieved: 0.00197716
   dataset 362, input: 0.0019915, achieved: 0.0019915
   dataset 363, input: 0.00195477, achieved: 0.00195477
   dataset 364, input: 0.00196181, achieved: 0.00196181
   dataset 365, input: 0.00197722, achieved: 0.00197722
   dataset 366, input: 0.00195339, achieved: 0.00195339
   dataset 367, input: 0.00199259, achieved: 0.00199259
   dataset 368, input: 0.00202141, achieved: 0.00202141
   dataset 369, input: 0.00201304, achieved: 0.00201304
   dataset 370, input: 0.00197201, achieved: 0.00197201
   dataset 371, input: 0.00196272, achieved: 0.00196272
   dataset 372, input: 0.00199954, achieved: 0.00199954
   dataset 373, input: 0.00197394, achieved: 0.00197394
   dataset 374, input: 0.00197392, achieved: 0.00197392
   dataset 375, input: 0.00199316, achieved: 0.00199316
   dataset 376, input: 0.00197107, achieved: 0.00197107
   dataset 377, input: 0.00195359, achieved: 0.00195359
   dataset 378, input: 0.00197163, achieved: 0.00197163
   dataset 379, input: 0.00199371, achieved: 0.00199371
   dataset 380, input: 0.00195553, achieved: 0.00195553
   dataset 381, input: 0.00197494, achieved: 0.00197494
   dataset 382, input: 0.00196467, achieved: 0.00196467
   dataset 383, input: 0.00197625, achieved: 0.00197625
   dataset 384, input: 0.00197162, achieved: 0.00197162
   dataset 385, input: 0.00198161, achieved: 0.00198161
   dataset 386, input: 0.00197809, achieved: 0.00197809
   dataset 387, input: 0.00197793, achieved: 0.00197793
   dataset 388, input: 0.00196932, achieved: 0.00196932
   dataset 389, input: 0.00196275, achieved: 0.00196275
   dataset 390, input: 0.00207994, achieved: 0.00207994
   dataset 391, input: 0.00196643, achieved: 0.00196643
   dataset 392, input: 0.00199967, achieved: 0.00199967
   dataset 393, input: 0.00196605, achieved: 0.00196605
   dataset 394, input: 0.001958, achieved: 0.001958
   dataset 395, input: 0.00200748, achieved: 0.00200748
   dataset 396, input: 0.00195977, achieved: 0.00195977
   dataset 397, input: 0.00199792, achieved: 0.00199792
   dataset 398, input: 0.00195095, achieved: 0.00195095
   dataset 399, input: 0.00199475, achieved: 0.00199475
   dataset 400, input: 0.00198847, achieved: 0.00198847
   dataset 401, input: 0.0020001, achieved: 0.0020001
   dataset 402, input: 0.00194736, achieved: 0.00194736
   dataset 403, input: 0.00197202, achieved: 0.00197202
   dataset 404, input: 0.00194803, achieved: 0.00194803
   dataset 405, input: 0.00197877, achieved: 0.00197877
   dataset 406, input: 0.00194879, achieved: 0.00194879
   dataset 407, input: 0.00197684, achieved: 0.00197684
   dataset 408, input: 0.00195959, achieved: 0.00195959
   dataset 409, input: 0.00196377, achieved: 0.00196377
   dataset 410, input: 0.0019844, achieved: 0.0019844
   dataset 411, input: 0.00196261, achieved: 0.00196261
   dataset 412, input: 0.00205223, achieved: 0.00205223
   dataset 413, input: 0.00197666, achieved: 0.00197666
   dataset 414, input: 0.00194823, achieved: 0.00194823
   dataset 415, input: 0.0020134, achieved: 0.0020134
   dataset 416, input: 0.001984, achieved: 0.001984
   dataset 417, input: 0.00197547, achieved: 0.00197547
   dataset 418, input: 0.00198856, achieved: 0.00198856
   dataset 419, input: 0.00200158, achieved: 0.00200158
   dataset 420, input: 0.00198087, achieved: 0.00198087
   dataset 421, input: 0.00196932, achieved: 0.00196932
   dataset 422, input: 0.00200459, achieved: 0.00200459
   dataset 423, input: 0.00201814, achieved: 0.00201814
   dataset 424, input: 0.00198034, achieved: 0.00198034
   dataset 425, input: 0.00200004, achieved: 0.00200004
   dataset 426, input: 0.00199841, achieved: 0.00199841
   dataset 427, input: 0.00197246, achieved: 0.00197246
   dataset 428, input: 0.00200993, achieved: 0.00200993
   dataset 429, input: 0.00196069, achieved: 0.00196069
   dataset 430, input: 0.00199288, achieved: 0.00199288
   dataset 431, input: 0.00196902, achieved: 0.00196902
   dataset 432, input: 0.00200046, achieved: 0.00200046
   dataset 433, input: 0.00196048, achieved: 0.00196048
   dataset 434, input: 0.00202402, achieved: 0.00202402
   dataset 435, input: 0.00198936, achieved: 0.00198936
   dataset 436, input: 0.00199778, achieved: 0.00199778
   dataset 437, input: 0.00196553, achieved: 0.00196553
   dataset 438, input: 0.00197938, achieved: 0.00197938
   dataset 439, input: 0.00198927, achieved: 0.00198927
   dataset 440, input: 0.00197504, achieved: 0.00197504
   dataset 441, input: 0.00197566, achieved: 0.00197566
   dataset 442, input: 0.0019837, achieved: 0.0019837
   dataset 443, input: 0.00197988, achieved: 0.00197988
   dataset 444, input: 0.00200077, achieved: 0.00200077
   dataset 445, input: 0.00202142, achieved: 0.00202142
   dataset 446, input: 0.00206867, achieved: 0.00206867
   dataset 447, input: 0.00201977, achieved: 0.00201977
   dataset 448, input: 0.00201878, achieved: 0.00201878
   dataset 449, input: 0.00200204, achieved: 0.00200204
   dataset 450, input: 0.00200432, achieved: 0.00200432
   dataset 451, input: 0.00200904, achieved: 0.00200904
   dataset 452, input: 0.00199522, achieved: 0.00199522
   dataset 453, input: 0.00202661, achieved: 0.00202661
   dataset 454, input: 0.00199262, achieved: 0.00199262
   dataset 455, input: 0.0019765, achieved: 0.0019765
   dataset 456, input: 0.00195362, achieved: 0.00195362
   dataset 457, input: 0.00198928, achieved: 0.00198928
   dataset 458, input: 0.00200292, achieved: 0.00200292
   dataset 459, input: 0.00197836, achieved: 0.00197836
   dataset 460, input: 0.00199555, achieved: 0.00199555
   dataset 461, input: 0.00201214, achieved: 0.00201214
   dataset 462, input: 0.00198803, achieved: 0.00198803
   dataset 463, input: 0.002008, achieved: 0.002008
   dataset 464, input: 0.00199764, achieved: 0.00199764
   dataset 465, input: 0.0020095, achieved: 0.0020095
   dataset 466, input: 0.00198134, achieved: 0.00198134
   dataset 467, input: 0.00200822, achieved: 0.00200822
   dataset 468, input: 0.00199996, achieved: 0.00199996
   dataset 469, input: 0.00200902, achieved: 0.00200902
   dataset 470, input: 0.00198222, achieved: 0.00198222
   dataset 471, input: 0.00201403, achieved: 0.00201403
   dataset 472, input: 0.00198319, achieved: 0.00198319
   dataset 473, input: 0.00199601, achieved: 0.00199601
   dataset 474, input: 0.00199925, achieved: 0.00199925
   dataset 475, input: 0.00197859, achieved: 0.00197859
   dataset 476, input: 0.00203123, achieved: 0.00203123
   dataset 477, input: 0.00195045, achieved: 0.00195045
   dataset 478, input: 0.00197523, achieved: 0.00197523
   dataset 479, input: 0.00201481, achieved: 0.00201481
   dataset 480, input: 0.00200153, achieved: 0.00200153
   dataset 481, input: 0.0019841, achieved: 0.0019841
   dataset 482, input: 0.00198675, achieved: 0.00198675
   dataset 483, input: 0.00199473, achieved: 0.00199473
   dataset 484, input: 0.00199132, achieved: 0.00199132
   dataset 485, input: 0.00205589, achieved: 0.00205589
   dataset 486, input: 0.00198399, achieved: 0.00198399
   dataset 487, input: 0.002051, achieved: 0.002051
   dataset 488, input: 0.00200399, achieved: 0.00200399
   dataset 489, input: 0.00197661, achieved: 0.00197661
   dataset 490, input: 0.00205123, achieved: 0.00205123
   dataset 491, input: 0.00198621, achieved: 0.00198621
   dataset 492, input: 0.00198749, achieved: 0.00198749
   dataset 493, input: 0.00198305, achieved: 0.00198305
   dataset 494, input: 0.0020149, achieved: 0.0020149
   dataset 495, input: 0.0019891, achieved: 0.0019891
   dataset 496, input: 0.00198587, achieved: 0.00198587
   dataset 497, input: 0.00200655, achieved: 0.00200655
   dataset 498, input: 0.00201202, achieved: 0.00201202
   dataset 499, input: 0.00204447, achieved: 0.00204447
[2025-03-12 09:19:59][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 86572112 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.000965894, achieved: 0.000965894
   dataset 1, input: 0.00373222, achieved: 0.00373222
   dataset 2, input: 0.000860371, achieved: 0.000860371
   dataset 3, input: 0.00369798, achieved: 0.00369798
   dataset 4, input: 0.00355292, achieved: 0.00355292
   dataset 5, input: 0.00366778, achieved: 0.00366778
   dataset 6, input: 0.000820216, achieved: 0.000820216
   dataset 7, input: 0.0020784, achieved: 0.0020784
   dataset 8, input: 0.00199747, achieved: 0.00199747
   dataset 9, input: 0.00204509, achieved: 0.00204509
   dataset 10, input: 0.00190938, achieved: 0.00190938
   dataset 11, input: 0.00214439, achieved: 0.00214439
   dataset 12, input: 0.00208307, achieved: 0.00208307
   dataset 13, input: 0.00187015, achieved: 0.00187015
   dataset 14, input: 0.00189412, achieved: 0.00189412
   dataset 15, input: 0.00198159, achieved: 0.00198159
   dataset 16, input: 0.00213069, achieved: 0.00213069
   dataset 17, input: 0.00200805, achieved: 0.00200805
   dataset 18, input: 0.00178642, achieved: 0.00178642
   dataset 19, input: 0.00208929, achieved: 0.00208929
   dataset 20, input: 0.00212976, achieved: 0.00212976
   dataset 21, input: 0.00217272, achieved: 0.00217272
   dataset 22, input: 0.00201677, achieved: 0.00201677
   dataset 23, input: 0.00174471, achieved: 0.00174471
   dataset 24, input: 0.00196914, achieved: 0.00196914
   dataset 25, input: 0.00208525, achieved: 0.00208525
   dataset 26, input: 0.00189443, achieved: 0.00189443
   dataset 27, input: 0.00176557, achieved: 0.00176557
   dataset 28, input: 0.00187825, achieved: 0.00187825
   dataset 29, input: 0.00185428, achieved: 0.00185428
   dataset 30, input: 0.00203171, achieved: 0.00203171
   dataset 31, input: 0.00190377, achieved: 0.00190377
   dataset 32, input: 0.00224058, achieved: 0.00224058
   dataset 33, input: 0.00181444, achieved: 0.00181444
   dataset 34, input: 0.00197661, achieved: 0.00197661
   dataset 35, input: 0.00211731, achieved: 0.00211731
   dataset 36, input: 0.00201334, achieved: 0.00201334
   dataset 37, input: 0.00173164, achieved: 0.00173164
   dataset 38, input: 0.00196914, achieved: 0.00196914
   dataset 39, input: 0.00205786, achieved: 0.00205786
   dataset 40, input: 0.00186704, achieved: 0.00186704
   dataset 41, input: 0.00192992, achieved: 0.00192992
   dataset 42, input: 0.00195949, achieved: 0.00195949
   dataset 43, input: 0.00208089, achieved: 0.00208089
   dataset 44, input: 0.00189661, achieved: 0.00189661
   dataset 45, input: 0.00201241, achieved: 0.00201241
   dataset 46, input: 0.0019791, achieved: 0.0019791
   dataset 47, input: 0.00189848, achieved: 0.00189848
   dataset 48, input: 0.00189692, achieved: 0.00189692
   dataset 49, input: 0.00185366, achieved: 0.00185366
   dataset 50, input: 0.00199965, achieved: 0.00199965
   dataset 51, input: 0.00194206, achieved: 0.00194206
   dataset 52, input: 0.00851593, achieved: 0.00851593
   dataset 53, input: 0.00882534, achieved: 0.00882534
   dataset 54, input: 0.00817197, achieved: 0.00817197
   dataset 55, input: 0.00866908, achieved: 0.00866908
   dataset 56, input: 0.00837492, achieved: 0.00837492
   dataset 57, input: 0.00773774, achieved: 0.00773774
   dataset 58, input: 0.00822551, achieved: 0.00822551
   dataset 59, input: 0.00744233, achieved: 0.00744233
   dataset 60, input: 0.00692406, achieved: 0.00692406
   dataset 61, input: 0.00874627, achieved: 0.00874627
   dataset 62, input: 0.00748716, achieved: 0.00748716
   dataset 63, input: 0.00874192, achieved: 0.00874192
   dataset 64, input: 0.00939404, achieved: 0.00939404
   dataset 65, input: 0.00797026, achieved: 0.00797026
   dataset 66, input: 0.00762287, achieved: 0.00762287
   dataset 67, input: 0.00896074, achieved: 0.00896074
   dataset 68, input: 0.00764404, achieved: 0.00764404
   dataset 69, input: 0.00732311, achieved: 0.00732311
   dataset 70, input: 0.00830239, achieved: 0.00830239
   dataset 71, input: 0.0076926, achieved: 0.0076926
   dataset 72, input: 0.00858846, achieved: 0.00858846
   dataset 73, input: 0.00768171, achieved: 0.00768171
   dataset 74, input: 0.00854861, achieved: 0.00854861
   dataset 75, input: 0.00920759, achieved: 0.00920759
   dataset 76, input: 0.00845616, achieved: 0.00845616
   dataset 77, input: 0.00914066, achieved: 0.00914066
   dataset 78, input: 0.00782427, achieved: 0.00782427
   dataset 79, input: 0.00842784, achieved: 0.00842784
   dataset 80, input: 0.00789555, achieved: 0.00789555
   dataset 81, input: 0.00868277, achieved: 0.00868277
   dataset 82, input: 0.00796248, achieved: 0.00796248
   dataset 83, input: 0.0074165, achieved: 0.0074165
   dataset 84, input: 0.00838457, achieved: 0.00838457
   dataset 85, input: 0.00816481, achieved: 0.00816481
   dataset 86, input: 0.00740965, achieved: 0.00740965
   dataset 87, input: 0.00783952, achieved: 0.00783952
   dataset 88, input: 0.00873662, achieved: 0.00873662
   dataset 89, input: 0.00877553, achieved: 0.00877553
   dataset 90, input: 0.00749525, achieved: 0.00749525
   dataset 91, input: 0.00822271, achieved: 0.00822271
   dataset 92, input: 0.00737634, achieved: 0.00737634
   dataset 93, input: 0.00892339, achieved: 0.00892339
   dataset 94, input: 0.00788061, achieved: 0.00788061
   dataset 95, input: 0.00921288, achieved: 0.00921288
   dataset 96, input: 0.00833383, achieved: 0.00833383
   dataset 97, input: 0.00815983, achieved: 0.00815983
   dataset 98, input: 0.00785197, achieved: 0.00785197
   dataset 99, input: 0.0087609, achieved: 0.0087609
   dataset 100, input: 0.00836122, achieved: 0.00836122
   dataset 101, input: 0.00835687, achieved: 0.00835687
   dataset 102, input: 0.0075326, achieved: 0.0075326
   dataset 103, input: 0.00779688, achieved: 0.00779688
   dataset 104, input: 0.00841694, achieved: 0.00841694
   dataset 105, input: 0.00863453, achieved: 0.00863453
   dataset 106, input: 0.00874876, achieved: 0.00874876
   dataset 107, input: 0.00861274, achieved: 0.00861274
   dataset 108, input: 0.00857383, achieved: 0.00857383
   dataset 109, input: 0.00764248, achieved: 0.00764248
   dataset 110, input: 0.0085483, achieved: 0.0085483
   dataset 111, input: 0.00844433, achieved: 0.00844433
   dataset 112, input: 0.00797898, achieved: 0.00797898
   dataset 113, input: 0.00808792, achieved: 0.00808792
   dataset 114, input: 0.00336927, achieved: 0.00336927
   dataset 115, input: 0.00219482, achieved: 0.00219482
   dataset 116, input: 0.0054237, achieved: 0.0054237
   dataset 117, input: 0.00465049, achieved: 0.00465049
   dataset 118, input: 0.00344304, achieved: 0.00344304
   dataset 119, input: 0.00410606, achieved: 0.00410606
   dataset 120, input: 0.00387167, achieved: 0.00387167
   dataset 121, input: 0.0040575, achieved: 0.0040575
   dataset 122, input: 0.00389159, achieved: 0.00389159
   dataset 123, input: 0.00385704, achieved: 0.00385704
   dataset 124, input: 0.00394544, achieved: 0.00394544
   dataset 125, input: 0.00367027, achieved: 0.00367027
   dataset 126, input: 0.00365564, achieved: 0.00365564
   dataset 127, input: 0.00382467, achieved: 0.00382467
   dataset 128, input: 0.00407805, achieved: 0.00407805
   dataset 129, input: 0.00326966, achieved: 0.00326966
   dataset 130, input: 0.00399805, achieved: 0.00399805
   dataset 131, input: 0.00235014, achieved: 0.00235014
   dataset 132, input: 0.00196634, achieved: 0.00196634
   dataset 133, input: 0.00344491, achieved: 0.00344491
   dataset 134, input: 0.00379323, achieved: 0.00379323
   dataset 135, input: 0.00142129, achieved: 0.00142129
   dataset 136, input: 0.00332942, achieved: 0.00332942
   dataset 137, input: 0.00301721, achieved: 0.00301721
   dataset 138, input: 0.00423742, achieved: 0.00423742
   dataset 139, input: 0.00424863, achieved: 0.00424863
   dataset 140, input: 0.0063538, achieved: 0.0063538
   dataset 141, input: 0.00418762, achieved: 0.00418762
   dataset 142, input: 0.00323448, achieved: 0.00323448
   dataset 143, input: 0.00261722, achieved: 0.00261722
   dataset 144, input: 0.00267201, achieved: 0.00267201
   dataset 145, input: 0.00267325, achieved: 0.00267325
   dataset 146, input: 0.00254065, achieved: 0.00254065
   dataset 147, input: 0.00254781, achieved: 0.00254781
   dataset 148, input: 0.00237069, achieved: 0.00237069
   dataset 149, input: 0.00252415, achieved: 0.00252415
   dataset 150, input: 0.00246252, achieved: 0.00246252
   dataset 151, input: 0.00254189, achieved: 0.00254189
   dataset 152, input: 0.00262718, achieved: 0.00262718
   dataset 153, input: 0.00240275, achieved: 0.00240275
   dataset 154, input: 0.0023212, achieved: 0.0023212
   dataset 155, input: 0.00254781, achieved: 0.00254781
   dataset 156, input: 0.00243761, achieved: 0.00243761
   dataset 157, input: 0.00256991, achieved: 0.00256991
   dataset 158, input: 0.00254189, achieved: 0.00254189
   dataset 159, input: 0.00261255, achieved: 0.00261255
   dataset 160, input: 0.00240337, achieved: 0.00240337
   dataset 161, input: 0.00259263, achieved: 0.00259263
   dataset 162, input: 0.00253722, achieved: 0.00253722
   dataset 163, input: 0.00246096, achieved: 0.00246096
   dataset 164, input: 0.00225925, achieved: 0.00225925
   dataset 165, input: 0.00250423, achieved: 0.00250423
   dataset 166, input: 0.00202268, achieved: 0.00202268
   dataset 167, input: 0.0013111, achieved: 0.0013111
   dataset 168, input: 0.00125476, achieved: 0.00125476
   dataset 169, input: 0.00145896, achieved: 0.00145896
   dataset 170, input: 0.00140137, achieved: 0.00140137
   dataset 171, input: 0.00116605, achieved: 0.00116605
   dataset 172, input: 0.00132511, achieved: 0.00132511
   dataset 173, input: 0.0013939, achieved: 0.0013939
   dataset 174, input: 0.00127468, achieved: 0.00127468
   dataset 175, input: 0.00137367, achieved: 0.00137367
   dataset 176, input: 0.00133943, achieved: 0.00133943
   dataset 177, input: 0.00121803, achieved: 0.00121803
   dataset 178, input: 0.00137585, achieved: 0.00137585
   dataset 179, input: 0.00133289, achieved: 0.00133289
   dataset 180, input: 0.00185677, achieved: 0.00185677
   dataset 181, input: 0.000504892, achieved: 0.000504892
   dataset 182, input: 0.00314577, achieved: 0.00314577
   dataset 183, input: 0.0028927, achieved: 0.0028927
   dataset 184, input: 0.00265364, achieved: 0.00265364
   dataset 185, input: 0.00227855, achieved: 0.00227855
   dataset 186, input: 0.0034972, achieved: 0.0034972
   dataset 187, input: 0.00437127, achieved: 0.00437127
   dataset 188, input: 0.00135157, achieved: 0.00135157
   dataset 189, input: 0.00277659, achieved: 0.00277659
   dataset 190, input: 0.00277286, achieved: 0.00277286
   dataset 191, input: 0.00273302, achieved: 0.00273302
   dataset 192, input: 0.00270251, achieved: 0.00270251
   dataset 193, input: 0.00280056, achieved: 0.00280056
   dataset 194, input: 0.00287091, achieved: 0.00287091
   dataset 195, input: 0.00263029, achieved: 0.00263029
   dataset 196, input: 0.00314857, achieved: 0.00314857
   dataset 197, input: 0.0029478, achieved: 0.0029478
   dataset 198, input: 0.00308383, achieved: 0.00308383
   dataset 199, input: 0.000790645, achieved: 0.000790645
   dataset 200, input: 0.00204976, achieved: 0.00204976
   dataset 201, input: 0.00168214, achieved: 0.00168214
   dataset 202, input: 0.00171887, achieved: 0.00171887
   dataset 203, input: 0.00195576, achieved: 0.00195576
   dataset 204, input: 0.00199218, achieved: 0.00199218
   dataset 205, input: 0.00205132, achieved: 0.00205132
   dataset 206, input: 0.0020177, achieved: 0.0020177
   dataset 207, input: 0.00205474, achieved: 0.00205474
   dataset 208, input: 0.00187638, achieved: 0.00187638
   dataset 209, input: 0.00189599, achieved: 0.00189599
   dataset 210, input: 0.00211046, achieved: 0.00211046
   dataset 211, input: 0.0021226, achieved: 0.0021226
   dataset 212, input: 0.00188012, achieved: 0.00188012
   dataset 213, input: 0.0020314, achieved: 0.0020314
   dataset 214, input: 0.00168245, achieved: 0.00168245
   dataset 215, input: 0.00231155, achieved: 0.00231155
   dataset 216, input: 0.00169179, achieved: 0.00169179
   dataset 217, input: 0.00196758, achieved: 0.00196758
   dataset 218, input: 0.00146954, achieved: 0.00146954
   dataset 219, input: 0.00189194, achieved: 0.00189194
   dataset 220, input: 0.00179763, achieved: 0.00179763
   dataset 221, input: 0.00196634, achieved: 0.00196634
   dataset 222, input: 0.00183467, achieved: 0.00183467
   dataset 223, input: 0.00213443, achieved: 0.00213443
   dataset 224, input: 0.00187638, achieved: 0.00187638
   dataset 225, input: 0.00193397, achieved: 0.00193397
   dataset 226, input: 0.00237038, achieved: 0.00237038
   dataset 227, input: 0.00170642, achieved: 0.00170642
   dataset 228, input: 0.00174658, achieved: 0.00174658
   dataset 229, input: 0.000758583, achieved: 0.000758583
   dataset 230, input: 0.00167965, achieved: 0.00167965
   dataset 231, input: 0.00197941, achieved: 0.00197941
   dataset 232, input: 0.00421439, achieved: 0.00421439
   dataset 233, input: 0.00431835, achieved: 0.00431835
   dataset 234, input: 0.0035579, achieved: 0.0035579
   dataset 235, input: 0.00370171, achieved: 0.00370171
   dataset 236, input: 0.00189942, achieved: 0.00189942
   dataset 237, input: 0.00250329, achieved: 0.00250329
   dataset 238, input: 0.00499974, achieved: 0.00499974
   dataset 239, input: 0.00092605, achieved: 0.00092605
   dataset 240, input: 0.0019044, achieved: 0.0019044
   dataset 241, input: 0.0019458, achieved: 0.0019458
   dataset 242, input: 0.00158627, achieved: 0.00158627
   dataset 243, input: 0.00399711, achieved: 0.00399711
   dataset 244, input: 0.00557654, achieved: 0.00557654
   dataset 245, input: 0.00230159, achieved: 0.00230159
   dataset 246, input: 0.0017058, achieved: 0.0017058
   dataset 247, input: 0.00237505, achieved: 0.00237505
   dataset 248, input: 0.00192214, achieved: 0.00192214
   dataset 249, input: 0.00204696, achieved: 0.00204696
   dataset 250, input: 0.00197755, achieved: 0.00197755
   dataset 251, input: 0.00167592, achieved: 0.00167592
   dataset 252, input: 0.00195358, achieved: 0.00195358
   dataset 253, input: 0.00214626, achieved: 0.00214626
   dataset 254, input: 0.00203856, achieved: 0.00203856
   dataset 255, input: 0.000746443, achieved: 0.000746443
   dataset 256, input: 0.00500628, achieved: 0.00500628
   dataset 257, input: 0.00535522, achieved: 0.00535522
   dataset 258, input: 0.00502215, achieved: 0.00502215
   dataset 259, input: 0.00479181, achieved: 0.00479181
   dataset 260, input: 0.00486682, achieved: 0.00486682
   dataset 261, input: 0.00111095, achieved: 0.00111095
[2025-03-12 09:20:01][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 3212568 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0835781, achieved: 0.0835781
   dataset 1, input: 0.0834322, achieved: 0.0834322
   dataset 2, input: 0.0510322, achieved: 0.0510322
   dataset 3, input: 0.104354, achieved: 0.104354
   dataset 4, input: 0.0513543, achieved: 0.0513543
   dataset 5, input: 0.00400847, achieved: 0.00400847
   dataset 6, input: 0.115667, achieved: 0.115667
   dataset 7, input: 0.0827875, achieved: 0.0827875
   dataset 8, input: 0.103788, achieved: 0.103788
   dataset 9, input: 0.11266, achieved: 0.11266
   dataset 10, input: 0.0508509, achieved: 0.0508509
   dataset 11, input: 0.0513192, achieved: 0.0513192
   dataset 12, input: 0.105168, achieved: 0.105168
[2025-03-12 09:20:01][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 8520716 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0182376, achieved: 0.0182376
   dataset 1, input: 0.0182962, achieved: 0.0182962
   dataset 2, input: 0.018299, achieved: 0.018299
   dataset 3, input: 0.0182779, achieved: 0.0182779
   dataset 4, input: 0.0182861, achieved: 0.0182861
   dataset 5, input: 0.0181745, achieved: 0.0181745
   dataset 6, input: 0.0183693, achieved: 0.0183693
   dataset 7, input: 0.0220028, achieved: 0.0220028
   dataset 8, input: 0.0486005, achieved: 0.0486005
   dataset 9, input: 0.0484891, achieved: 0.0484891
   dataset 10, input: 0.0512474, achieved: 0.0512474
   dataset 11, input: 0.0512, achieved: 0.0512
   dataset 12, input: 0.0512732, achieved: 0.0512732
   dataset 13, input: 0.0485441, achieved: 0.0485441
   dataset 14, input: 0.0485733, achieved: 0.0485733
   dataset 15, input: 0.0511485, achieved: 0.0511485
   dataset 16, input: 0.0485108, achieved: 0.0485108
   dataset 17, input: 0.0485108, achieved: 0.0485108
   dataset 18, input: 0.0487117, achieved: 0.0487117
   dataset 19, input: 0.0511297, achieved: 0.0511297
   dataset 20, input: 0.0487391, achieved: 0.0487391
   dataset 21, input: 0.0512226, achieved: 0.0512226
   dataset 22, input: 0.0486001, achieved: 0.0486001
   dataset 23, input: 0.0487371, achieved: 0.0487371
   dataset 24, input: 0.0511531, achieved: 0.0511531
   dataset 25, input: 0.00566538, achieved: 0.00566538
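
(Aside: once the two index arrays exist, lookup at training time is just double indirection. The class below is illustrative only — not the actual `megatron.data.blendable_dataset` class — but shows how I'd expect global sample `i` to resolve to position `dataset_sample_index[i]` inside dataset `dataset_index[i]`.)

```python
import numpy as np

class BlendedView:
    """Toy stand-in for a blendable dataset: global index i maps to
    sample dataset_sample_index[i] of dataset dataset_index[i]."""

    def __init__(self, datasets, dataset_index, dataset_sample_index):
        self.datasets = datasets
        self.dataset_index = dataset_index
        self.dataset_sample_index = dataset_sample_index

    def __len__(self):
        return self.dataset_index.shape[0]

    def __getitem__(self, i):
        d = self.dataset_index[i]
        s = self.dataset_sample_index[i]
        return self.datasets[d][s]

# toy usage: two "datasets" blended roughly 80/20
idx = np.array([0, 0, 1, 0, 0])  # source dataset per global sample
sub = np.array([0, 1, 0, 2, 3])  # position within that dataset
view = BlendedView([list("abcd"), list("xy")], idx, sub)
print([view[i] for i in range(len(view))])  # ['a', 'b', 'x', 'c', 'd']
```
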
[2025-03-12 09:20:04][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 33633053 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0130268, achieved: 0.0130268
   dataset 1, input: 0.0134792, achieved: 0.0134792
   dataset 2, input: 0.0136289, achieved: 0.0136289
   dataset 3, input: 0.013172, achieved: 0.013172
   dataset 4, input: 0.0132409, achieved: 0.0132409
   dataset 5, input: 0.0133521, achieved: 0.0133521
   dataset 6, input: 0.0134121, achieved: 0.0134121
   dataset 7, input: 0.0128782, achieved: 0.0128782
   dataset 8, input: 0.0128348, achieved: 0.0128348
   dataset 9, input: 0.013055, achieved: 0.013055
   dataset 10, input: 0.0128415, achieved: 0.0128415
   dataset 11, input: 0.0128921, achieved: 0.0128921
   dataset 12, input: 0.0128222, achieved: 0.0128222
   dataset 13, input: 0.0128935, achieved: 0.0128935
   dataset 14, input: 0.0131081, achieved: 0.0131081
   dataset 15, input: 0.0129717, achieved: 0.0129717
   dataset 16, input: 0.013004, achieved: 0.013004
   dataset 17, input: 0.0128759, achieved: 0.0128759
   dataset 18, input: 0.0129287, achieved: 0.0129287
   dataset 19, input: 0.0130294, achieved: 0.0130294
   dataset 20, input: 0.0128648, achieved: 0.0128648
   dataset 21, input: 0.0131353, achieved: 0.0131353
   dataset 22, input: 0.0129143, achieved: 0.0129143
   dataset 23, input: 0.0129005, achieved: 0.0129005
   dataset 24, input: 0.0132579, achieved: 0.0132579
   dataset 25, input: 0.0129311, achieved: 0.0129311
   dataset 26, input: 0.0132489, achieved: 0.0132489
   dataset 27, input: 0.0131445, achieved: 0.0131445
   dataset 28, input: 0.0131264, achieved: 0.0131264
   dataset 29, input: 0.0128914, achieved: 0.0128914
   dataset 30, input: 0.0129347, achieved: 0.0129347
   dataset 31, input: 0.0132695, achieved: 0.0132695
   dataset 32, input: 0.0129615, achieved: 0.0129615
   dataset 33, input: 0.0129188, achieved: 0.0129188
   dataset 34, input: 0.0128966, achieved: 0.0128966
   dataset 35, input: 0.0128921, achieved: 0.0128921
   dataset 36, input: 0.0131809, achieved: 0.0131809
   dataset 37, input: 0.0130498, achieved: 0.0130498
   dataset 38, input: 0.0129444, achieved: 0.0129444
   dataset 39, input: 0.0130167, achieved: 0.0130167
   dataset 40, input: 0.0127474, achieved: 0.0127474
   dataset 41, input: 0.0127562, achieved: 0.0127562
   dataset 42, input: 0.01274, achieved: 0.01274
   dataset 43, input: 0.012751, achieved: 0.012751
   dataset 44, input: 0.012733, achieved: 0.012733
   dataset 45, input: 0.012737, achieved: 0.012737
   dataset 46, input: 0.0127356, achieved: 0.0127356
   dataset 47, input: 0.0127288, achieved: 0.0127288
   dataset 48, input: 0.0127203, achieved: 0.0127203
   dataset 49, input: 0.0127124, achieved: 0.0127124
   dataset 50, input: 0.0126972, achieved: 0.0126972
   dataset 51, input: 0.0127032, achieved: 0.0127032
   dataset 52, input: 0.0126907, achieved: 0.0126907
   dataset 53, input: 0.0126746, achieved: 0.0126746
   dataset 54, input: 0.0126725, achieved: 0.0126725
   dataset 55, input: 0.0126815, achieved: 0.0126815
   dataset 56, input: 0.0126769, achieved: 0.0126769
   dataset 57, input: 0.0126962, achieved: 0.0126962
   dataset 58, input: 0.0126882, achieved: 0.0126882
   dataset 59, input: 0.012665, achieved: 0.012665
   dataset 60, input: 0.0126732, achieved: 0.0126732
   dataset 61, input: 0.0126632, achieved: 0.0126632
   dataset 62, input: 0.0126542, achieved: 0.0126542
   dataset 63, input: 0.0126673, achieved: 0.0126673
   dataset 64, input: 0.0126969, achieved: 0.0126969
   dataset 65, input: 0.0138538, achieved: 0.0138538
   dataset 66, input: 0.0124739, achieved: 0.0124739
   dataset 67, input: 0.0124994, achieved: 0.0124994
   dataset 68, input: 0.0123766, achieved: 0.0123766
   dataset 69, input: 0.0124441, achieved: 0.0124441
   dataset 70, input: 0.0122456, achieved: 0.0122456
   dataset 71, input: 0.0124694, achieved: 0.0124694
   dataset 72, input: 0.0121932, achieved: 0.0121932
   dataset 73, input: 0.0122485, achieved: 0.0122485
   dataset 74, input: 0.0117788, achieved: 0.0117788
   dataset 75, input: 0.0133205, achieved: 0.0133205
   dataset 76, input: 0.0131683, achieved: 0.0131683
   dataset 77, input: 0.00943809, achieved: 0.00943809
[2025-03-12 09:20:07][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 21697919 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0380461, achieved: 0.0380461
   dataset 1, input: 0.0413854, achieved: 0.0413854
   dataset 2, input: 0.0406095, achieved: 0.0406095
   dataset 3, input: 0.0365558, achieved: 0.0365558
   dataset 4, input: 0.0341426, achieved: 0.0341426
   dataset 5, input: 0.0350147, achieved: 0.0350147
   dataset 6, input: 0.0358745, achieved: 0.0358745
   dataset 7, input: 0.036827, achieved: 0.036827
   dataset 8, input: 0.0375283, achieved: 0.0375283
   dataset 9, input: 0.0379557, achieved: 0.0379557
   dataset 10, input: 0.0381706, achieved: 0.0381706
   dataset 11, input: 0.0385559, achieved: 0.0385559
   dataset 12, input: 0.0388884, achieved: 0.0388884
   dataset 13, input: 0.0391665, achieved: 0.0391665
   dataset 14, input: 0.0393856, achieved: 0.0393856
   dataset 15, input: 0.0397974, achieved: 0.0397974
   dataset 16, input: 0.0400668, achieved: 0.0400668
   dataset 17, input: 0.0403879, achieved: 0.0403879
   dataset 18, input: 0.0408309, achieved: 0.0408309
   dataset 19, input: 0.0411837, achieved: 0.0411837
   dataset 20, input: 0.0418468, achieved: 0.0418468
   dataset 21, input: 0.0425558, achieved: 0.0425558
   dataset 22, input: 0.0428142, achieved: 0.0428142
   dataset 23, input: 0.0425711, achieved: 0.0425711
   dataset 24, input: 0.0388549, achieved: 0.0388549
   dataset 25, input: 0.0209839, achieved: 0.0209839
[2025-03-12 09:20:08][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 12890828 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0235833, achieved: 0.0235833
   dataset 1, input: 0.0216057, achieved: 0.0216057
   dataset 2, input: 0.027075, achieved: 0.027075
   dataset 3, input: 0.0271066, achieved: 0.0271066
   dataset 4, input: 0.0274384, achieved: 0.0274384
   dataset 5, input: 0.0257867, achieved: 0.0257867
   dataset 6, input: 0.0255337, achieved: 0.0255337
   dataset 7, input: 0.027977, achieved: 0.027977
   dataset 8, input: 0.0270093, achieved: 0.0270093
   dataset 9, input: 0.0285904, achieved: 0.0285904
   dataset 10, input: 0.0283677, achieved: 0.0283677
   dataset 11, input: 0.0153407, achieved: 0.0153407
   dataset 12, input: 0.014138, achieved: 0.014138
   dataset 13, input: 0.0141798, achieved: 0.0141798
   dataset 14, input: 0.0141663, achieved: 0.0141663
   dataset 15, input: 0.0150215, achieved: 0.0150215
   dataset 16, input: 0.0280528, achieved: 0.0280528
   dataset 17, input: 0.0234551, achieved: 0.0234551
   dataset 18, input: 0.0247761, achieved: 0.0247761
   dataset 19, input: 0.0205734, achieved: 0.0205734
   dataset 20, input: 0.0205842, achieved: 0.0205842
   dataset 21, input: 0.020579, achieved: 0.020579
   dataset 22, input: 0.0205939, achieved: 0.0205939
   dataset 23, input: 0.0203349, achieved: 0.0203349
   dataset 24, input: 0.0199823, achieved: 0.0199823
   dataset 25, input: 0.0199573, achieved: 0.0199573
   dataset 26, input: 0.0199854, achieved: 0.0199854
   dataset 27, input: 0.0168266, achieved: 0.0168266
   dataset 28, input: 0.0172125, achieved: 0.0172125
   dataset 29, input: 0.018342, achieved: 0.018342
   dataset 30, input: 0.0149189, achieved: 0.0149189
   dataset 31, input: 0.0149787, achieved: 0.0149787
   dataset 32, input: 0.0149735, achieved: 0.0149735
   dataset 33, input: 0.0149414, achieved: 0.0149414
   dataset 34, input: 0.0149689, achieved: 0.0149689
   dataset 35, input: 0.0149673, achieved: 0.0149673
   dataset 36, input: 0.0230039, achieved: 0.0230039
   dataset 37, input: 0.0215731, achieved: 0.0215731
   dataset 38, input: 0.0215682, achieved: 0.0215682
   dataset 39, input: 0.0211097, achieved: 0.0211097
   dataset 40, input: 0.0190818, achieved: 0.0190818
   dataset 41, input: 0.0191069, achieved: 0.0191069
   dataset 42, input: 0.0189985, achieved: 0.0189985
   dataset 43, input: 0.0186603, achieved: 0.0186603
   dataset 44, input: 0.0219256, achieved: 0.0219256
   dataset 45, input: 0.0310232, achieved: 0.0310232
   dataset 46, input: 0.0189782, achieved: 0.0189782
   dataset 47, input: 0.0184155, achieved: 0.0184155
   dataset 48, input: 0.00263107, achieved: 0.00263107
[2025-03-12 09:20:18][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 93109107 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0180889, achieved: 0.0180889
   dataset 1, input: 0.0157124, achieved: 0.0157124
   dataset 2, input: 0.0156303, achieved: 0.0156303
   dataset 3, input: 0.0150717, achieved: 0.0150717
   dataset 4, input: 0.0142971, achieved: 0.0142971
   dataset 5, input: 0.0129064, achieved: 0.0129064
   dataset 6, input: 0.0162971, achieved: 0.0162971
   dataset 7, input: 0.0174739, achieved: 0.0174739
   dataset 8, input: 0.0174769, achieved: 0.0174769
   dataset 9, input: 0.0148979, achieved: 0.0148979
   dataset 10, input: 0.0158722, achieved: 0.0158722
   dataset 11, input: 0.0164827, achieved: 0.0164827
   dataset 12, input: 0.0147446, achieved: 0.0147446
   dataset 13, input: 0.0160169, achieved: 0.0160169
   dataset 14, input: 0.016473, achieved: 0.016473
   dataset 15, input: 0.0169843, achieved: 0.0169843
   dataset 16, input: 0.0147628, achieved: 0.0147628
   dataset 17, input: 0.0152291, achieved: 0.0152291
   dataset 18, input: 0.0156107, achieved: 0.0156107
   dataset 19, input: 0.0155986, achieved: 0.0155986
   dataset 20, input: 0.0157209, achieved: 0.0157209
   dataset 21, input: 0.0135195, achieved: 0.0135195
   dataset 22, input: 0.0106685, achieved: 0.0106685
   dataset 23, input: 0.012068, achieved: 0.012068
   dataset 24, input: 0.0143946, achieved: 0.0143946
   dataset 25, input: 0.0133679, achieved: 0.0133679
   dataset 26, input: 0.0116764, achieved: 0.0116764
   dataset 27, input: 0.0121379, achieved: 0.0121379
   dataset 28, input: 0.0188352, achieved: 0.0188352
   dataset 29, input: 0.0185594, achieved: 0.0185594
   dataset 30, input: 0.0184982, achieved: 0.0184982
   dataset 31, input: 0.0163945, achieved: 0.0163945
   dataset 32, input: 0.0160773, achieved: 0.0160773
   dataset 33, input: 0.0161072, achieved: 0.0161072
   dataset 34, input: 0.0160858, achieved: 0.0160858
   dataset 35, input: 0.0160155, achieved: 0.0160155
   dataset 36, input: 0.0157348, achieved: 0.0157348
   dataset 37, input: 0.0157353, achieved: 0.0157353
   dataset 38, input: 0.0155213, achieved: 0.0155213
   dataset 39, input: 0.0154415, achieved: 0.0154415
   dataset 40, input: 0.0172908, achieved: 0.0172908
   dataset 41, input: 0.0117513, achieved: 0.0117513
   dataset 42, input: 0.0169456, achieved: 0.0169456
   dataset 43, input: 0.0181807, achieved: 0.0181807
   dataset 44, input: 0.0184588, achieved: 0.0184588
   dataset 45, input: 0.0184332, achieved: 0.0184332
   dataset 46, input: 0.020941, achieved: 0.020941
   dataset 47, input: 0.0209187, achieved: 0.0209187
   dataset 48, input: 0.0186022, achieved: 0.0186022
   dataset 49, input: 0.01186, achieved: 0.01186
   dataset 50, input: 0.0118779, achieved: 0.0118779
   dataset 51, input: 0.0118403, achieved: 0.0118403
   dataset 52, input: 0.0118559, achieved: 0.0118559
   dataset 53, input: 0.0118533, achieved: 0.0118533
   dataset 54, input: 0.0118465, achieved: 0.0118465
   dataset 55, input: 0.011829, achieved: 0.011829
   dataset 56, input: 0.0118282, achieved: 0.0118282
   dataset 57, input: 0.0155565, achieved: 0.0155565
   dataset 58, input: 0.0138154, achieved: 0.0138154
   dataset 59, input: 0.0173808, achieved: 0.0173808
   dataset 60, input: 0.0151028, achieved: 0.0151028
   dataset 61, input: 0.0143709, achieved: 0.0143709
   dataset 62, input: 0.0144483, achieved: 0.0144483
   dataset 63, input: 0.0170886, achieved: 0.0170886
   dataset 64, input: 0.0156728, achieved: 0.0156728
   dataset 65, input: 0.00206319, achieved: 0.00206319
[2025-03-12 09:20:19][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 8932775 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.658845, achieved: 0.658845
   dataset 1, input: 0.341155, achieved: 0.341155
[2025-03-12 09:20:19][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 2642687 samples
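(The `input:` / `achieved:` pairs in these blocks compare the requested blend weight for each component dataset against the ratio actually realized by the blending index. Upstream Megatron builds that index in a compiled helper; the pure-Python loop below is only a sketch of the greedy rule it appears to follow — at each step, draw from the dataset currently furthest below its target share:)

```python
# Sketch of the greedy blending rule (upstream Megatron does this in a
# compiled C++ helper; this pure-Python version is for illustration only).
import numpy as np

def build_blending_indices(weights, num_samples):
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()              # normalized "input" ratios
    counts = np.zeros(len(weights), dtype=np.int64)
    dataset_index = np.zeros(num_samples, dtype=np.int64)
    for i in range(num_samples):
        # pick the dataset furthest below its target share so far
        errors = weights * (i + 1) - counts
        j = int(np.argmax(errors))
        dataset_index[i] = j
        counts[j] += 1
    achieved = counts / num_samples                # the "achieved" column
    return dataset_index, achieved

# weights taken from the two-dataset blend in the log just above
_, achieved = build_blending_indices([0.658845, 0.341155], 100_000)
print(achieved)  # ~[0.65885, 0.34115] -- tracks the input ratios closely
```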
[2025-03-12 09:20:19][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       775
     number of epochs:          59
     sequence length:           4096
     total number of samples:   183996
[2025-03-12 09:20:19][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       163
     number of epochs:          43
     sequence length:           4096
     total number of samples:   31441
[2025-03-12 09:20:19][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       140
     number of epochs:          37
     sequence length:           4096
     total number of samples:   28003
[2025-03-12 09:20:19][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       124
     number of epochs:          36
     sequence length:           4096
     total number of samples:   25067
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       137
     number of epochs:          42
     sequence length:           4096
     total number of samples:   26230
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       15512
     number of epochs:          10
     sequence length:           4096
     total number of samples:   23777
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       15517
     number of epochs:          11
     sequence length:           4096
     total number of samples:   25781
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       15002
     number of epochs:          11
     sequence length:           4096
     total number of samples:   25131
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       14676
     number of epochs:          10
     sequence length:           4096
     total number of samples:   22539
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       14422
     number of epochs:          11
     sequence length:           4096
     total number of samples:   24161
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       14328
     number of epochs:          10
     sequence length:           4096
     total number of samples:   22480
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       14088
     number of epochs:          11
     sequence length:           4096
     total number of samples:   23577
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13912
     number of epochs:          11
     sequence length:           4096
     total number of samples:   23443
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       19705
     number of epochs:          11
     sequence length:           4096
     total number of samples:   44530
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       14001
     number of epochs:          10
     sequence length:           4096
     total number of samples:   33130
[2025-03-12 09:20:20][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       14672
     number of epochs:          11
     sequence length:           4096
     total number of samples:   31661
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       12151
     number of epochs:          10
     sequence length:           4096
     total number of samples:   24866
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       23064
     number of epochs:          11
     sequence length:           4096
     total number of samples:   52512
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       15395
     number of epochs:          11
     sequence length:           4096
     total number of samples:   32910
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       15829
     number of epochs:          10
     sequence length:           4096
     total number of samples:   33963
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       12971
     number of epochs:          11
     sequence length:           4096
     total number of samples:   27905
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       19431
     number of epochs:          11
     sequence length:           4096
     total number of samples:   44037
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       16901
     number of epochs:          11
     sequence length:           4096
     total number of samples:   37245
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       21247
     number of epochs:          11
     sequence length:           4096
     total number of samples:   47851
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       20078
     number of epochs:          11
     sequence length:           4096
     total number of samples:   46771
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       19598
     number of epochs:          11
     sequence length:           4096
     total number of samples:   44827
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       18285
     number of epochs:          10
     sequence length:           4096
     total number of samples:   39746
[2025-03-12 09:20:21][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       18529
     number of epochs:          10
     sequence length:           4096
     total number of samples:   42204
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       15445
     number of epochs:          10
     sequence length:           4096
     total number of samples:   30535
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       14543
     number of epochs:          11
     sequence length:           4096
     total number of samples:   31296
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       19724
     number of epochs:          11
     sequence length:           4096
     total number of samples:   38889
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       19584
     number of epochs:          11
     sequence length:           4096
     total number of samples:   39778
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       12380
     number of epochs:          11
     sequence length:           4096
     total number of samples:   29061
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       12015
     number of epochs:          10
     sequence length:           4096
     total number of samples:   25767
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       12283
     number of epochs:          10
     sequence length:           4096
     total number of samples:   24851
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       15157
     number of epochs:          11
     sequence length:           4096
     total number of samples:   32441
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       15009
     number of epochs:          11
     sequence length:           4096
     total number of samples:   31241
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       17556
     number of epochs:          10
     sequence length:           4096
     total number of samples:   33780
[2025-03-12 09:20:22][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       16009
     number of epochs:          11
     sequence length:           4096
     total number of samples:   33213
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       15193
     number of epochs:          12
     sequence length:           4096
     total number of samples:   33500
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       19515
     number of epochs:          11
     sequence length:           4096
     total number of samples:   30353
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13401
     number of epochs:          11
     sequence length:           4096
     total number of samples:   20941
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       14934
     number of epochs:          11
     sequence length:           4096
     total number of samples:   23461
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       18897
     number of epochs:          11
     sequence length:           4096
     total number of samples:   31956
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13603
     number of epochs:          11
     sequence length:           4096
     total number of samples:   20963
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       18864
     number of epochs:          11
     sequence length:           4096
     total number of samples:   30855
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       22022
     number of epochs:          9
     sequence length:           4096
     total number of samples:   35167
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       18971
     number of epochs:          10
     sequence length:           4096
     total number of samples:   32559
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       10941
     number of epochs:          10
     sequence length:           4096
     total number of samples:   19655
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       10578
     number of epochs:          10
     sequence length:           4096
     total number of samples:   20791
[2025-03-12 09:20:23][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       9162
     number of epochs:          11
     sequence length:           4096
     total number of samples:   19610
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       10026
     number of epochs:          10
     sequence length:           4096
     total number of samples:   19075
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13651
     number of epochs:          11
     sequence length:           4096
     total number of samples:   23980
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       16560
     number of epochs:          11
     sequence length:           4096
     total number of samples:   32183
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       17185
     number of epochs:          10
     sequence length:           4096
     total number of samples:   28294
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       10909
     number of epochs:          10
     sequence length:           4096
     total number of samples:   17972
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       19120
     number of epochs:          10
     sequence length:           4096
     total number of samples:   34020
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       19179
     number of epochs:          11
     sequence length:           4096
     total number of samples:   36582
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       23466
     number of epochs:          38
     sequence length:           4096
     total number of samples:   171575
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13525
     number of epochs:          11
     sequence length:           4096
     total number of samples:   36214
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13580
     number of epochs:          15
     sequence length:           4096
     total number of samples:   36283
[2025-03-12 09:20:24][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13538
     number of epochs:          15
     sequence length:           4096
     total number of samples:   35909
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13489
     number of epochs:          14
     sequence length:           4096
     total number of samples:   36945
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13552
     number of epochs:          14
     sequence length:           4096
     total number of samples:   36041
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13314
     number of epochs:          14
     sequence length:           4096
     total number of samples:   34932
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13390
     number of epochs:          15
     sequence length:           4096
     total number of samples:   36179
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13324
     number of epochs:          14
     sequence length:           4096
     total number of samples:   35342
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13277
     number of epochs:          14
     sequence length:           4096
     total number of samples:   36406
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13372
     number of epochs:          14
     sequence length:           4096
     total number of samples:   35805
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13129
     number of epochs:          14
     sequence length:           4096
     total number of samples:   35447
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13155
     number of epochs:          15
     sequence length:           4096
     total number of samples:   36897
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13198
     number of epochs:          15
     sequence length:           4096
     total number of samples:   34777
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13166
     number of epochs:          16
     sequence length:           4096
     total number of samples:   36330
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13118
     number of epochs:          14
     sequence length:           4096
     total number of samples:   35401
[2025-03-12 09:20:25][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13224
     number of epochs:          15
     sequence length:           4096
     total number of samples:   35667
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       12979
     number of epochs:          14
     sequence length:           4096
     total number of samples:   35048
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13034
     number of epochs:          14
     sequence length:           4096
     total number of samples:   34449
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13113
     number of epochs:          13
     sequence length:           4096
     total number of samples:   35932
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13049
     number of epochs:          14
     sequence length:           4096
     total number of samples:   35677
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       13130
     number of epochs:          14
     sequence length:           4096
     total number of samples:   36438
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       62
     number of epochs:          57
     sequence length:           4096
     total number of samples:   1134
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       64
     number of epochs:          55
     sequence length:           4096
     total number of samples:   1235
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       235
     number of epochs:          55
     sequence length:           4096
     total number of samples:   4953
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       271
     number of epochs:          63
     sequence length:           4096
     total number of samples:   5949
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       124
     number of epochs:          53
     sequence length:           4096
     total number of samples:   2662
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       87
     number of epochs:          55
     sequence length:           4096
     total number of samples:   1692
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       45
     number of epochs:          100
     sequence length:           4096
     total number of samples:   851
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       100
     number of epochs:          32
     sequence length:           4096
     total number of samples:   1811
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       77
     number of epochs:          44
     sequence length:           4096
     total number of samples:   1519
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       34
     number of epochs:          35
     sequence length:           4096
     total number of samples:   604
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       4987
     number of epochs:          52
     sequence length:           4096
     total number of samples:   142860
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       4936
     number of epochs:          41
     sequence length:           4096
     total number of samples:   350411
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       50251
     number of epochs:          21
     sequence length:           4096
     total number of samples:   58293
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       50433
     number of epochs:          21
     sequence length:           4096
     total number of samples:   58240
[2025-03-12 09:20:26][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       49875
     number of epochs:          21
     sequence length:           4096
     total number of samples:   57788
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       9771
     number of epochs:          58
     sequence length:           4096
     total number of samples:   89340
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       32972
     number of epochs:          22
     sequence length:           4096
     total number of samples:   518930
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       51115
     number of epochs:          28
     sequence length:           4096
     total number of samples:   379054
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       4998
     number of epochs:          41
     sequence length:           4096
     total number of samples:   28231
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       14044
     number of epochs:          79
     sequence length:           4096
     total number of samples:   24062
[2025-03-12 09:20:27][W][utils/_logger:68:megatron.data.gpt_dataset]  > WARNING: could not find index map files, building on rank 0
    using:
     number of documents:       2000
     number of epochs:          40
     sequence length:           4096
     total number of samples:   21279
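(The long run of `could not find index map files` warnings above is expected on a first run with a fresh cache: rank 0 builds the document/sample/shuffle index maps and writes them to disk, so later runs load them instead of rebuilding. The printed `number of epochs` / `total number of samples` follow simple arithmetic; a minimal sketch, assuming upstream Megatron's construction where documents are concatenated into one token stream and each sample is a `seq_length` window:)

```python
# Minimal sketch of the index-map arithmetic, assuming upstream Megatron's
# construction: epochs over the corpus are added until the concatenated
# token stream covers the requested number of seq_length-sized samples.
def num_epochs_needed(tokens_per_epoch: int, seq_length: int, num_samples: int) -> int:
    epochs, total_tokens = 0, 0
    while True:
        epochs += 1
        total_tokens += tokens_per_epoch
        if (total_tokens - 1) // seq_length >= num_samples:
            return epochs

def total_samples(tokens_per_epoch: int, seq_length: int, epochs: int) -> int:
    # what the log prints as "total number of samples"
    return (epochs * tokens_per_epoch - 1) // seq_length

# hypothetical corpus: 1M tokens/epoch, seq_length=4096, want 5000 samples
e = num_epochs_needed(1_000_000, 4096, 5_000)
print(e, total_samples(1_000_000, 4096, e))  # -> 21 5126
```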
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.108499, achieved: 0.108499
   dataset 1, input: 0.103053, achieved: 0.103053
   dataset 2, input: 0.0854747, achieved: 0.0854747
   dataset 3, input: 0.0433844, achieved: 0.0433844
   dataset 4, input: 0.0113772, achieved: 0.0113772
   dataset 5, input: 0.0527751, achieved: 0.0527751
   dataset 6, input: 0.00885534, achieved: 0.00885534
   dataset 7, input: 0.0852544, achieved: 0.0852544
   dataset 8, input: 0.0730518, achieved: 0.0730518
   dataset 9, input: 0.0799135, achieved: 0.0799135
   dataset 10, input: 0.0413842, achieved: 0.0413842
   dataset 11, input: 0.0496325, achieved: 0.0496325
   dataset 12, input: 0.0116255, achieved: 0.0116255
   dataset 13, input: 0.0320609, achieved: 0.0320609
   dataset 14, input: 0.106373, achieved: 0.106373
   dataset 15, input: 0.107285, achieved: 0.107285
[2025-03-12 09:20:27][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 1675373 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.00859641, achieved: 0.00859641
   dataset 1, input: 0.00880487, achieved: 0.00880487
   dataset 2, input: 0.0105312, achieved: 0.0105312
   dataset 3, input: 0.00971668, achieved: 0.00971668
   dataset 4, input: 0.0094472, achieved: 0.0094472
   dataset 5, input: 0.0101289, achieved: 0.0101289
   dataset 6, input: 0.0105224, achieved: 0.0105224
   dataset 7, input: 0.0102594, achieved: 0.0102594
   dataset 8, input: 0.0105675, achieved: 0.0105675
   dataset 9, input: 0.0084371, achieved: 0.0084371
   dataset 10, input: 0.0102051, achieved: 0.0102051
   dataset 11, input: 0.00864047, achieved: 0.00864047
   dataset 12, input: 0.012604, achieved: 0.012604
   dataset 13, input: 0.00930382, achieved: 0.00930382
   dataset 14, input: 0.0111695, achieved: 0.0111695
   dataset 15, input: 0.010155, achieved: 0.010155
   dataset 16, input: 0.0107163, achieved: 0.0107163
   dataset 17, input: 0.0110763, achieved: 0.0110763
   dataset 18, input: 0.01032, achieved: 0.01032
   dataset 19, input: 0.0107664, achieved: 0.0107664
   dataset 20, input: 0.0116291, achieved: 0.0116291
   dataset 21, input: 0.0093872, achieved: 0.0093872
   dataset 22, input: 0.0101136, achieved: 0.0101136
   dataset 23, input: 0.00983294, achieved: 0.00983294
   dataset 24, input: 0.00962855, achieved: 0.00962855
   dataset 25, input: 0.00957126, achieved: 0.00957126
   dataset 26, input: 0.00977464, achieved: 0.00977464
   dataset 27, input: 0.00901977, achieved: 0.00901977
   dataset 28, input: 0.0103566, achieved: 0.0103566
   dataset 29, input: 0.00999056, achieved: 0.00999056
   dataset 30, input: 0.0124182, achieved: 0.0124182
   dataset 31, input: 0.00891062, achieved: 0.00891062
   dataset 32, input: 0.00931399, achieved: 0.00931399
   dataset 33, input: 0.0114227, achieved: 0.0114227
   dataset 34, input: 0.0119182, achieved: 0.0119182
   dataset 35, input: 0.0103448, achieved: 0.0103448
   dataset 36, input: 0.00920281, achieved: 0.00920281
   dataset 37, input: 0.0100794, achieved: 0.0100794
   dataset 38, input: 0.00899367, achieved: 0.00899367
   dataset 39, input: 0.0100109, achieved: 0.0100109
   dataset 40, input: 0.00949635, achieved: 0.00949635
   dataset 41, input: 0.00916891, achieved: 0.00916891
   dataset 42, input: 0.0105868, achieved: 0.0105868
   dataset 43, input: 0.0110166, achieved: 0.0110166
   dataset 44, input: 0.00956516, achieved: 0.00956516
   dataset 45, input: 0.010096, achieved: 0.010096
   dataset 46, input: 0.0111118, achieved: 0.0111118
   dataset 47, input: 0.00861403, achieved: 0.00861403
   dataset 48, input: 0.00969295, achieved: 0.00969295
   dataset 49, input: 0.00888452, achieved: 0.00888452
   dataset 50, input: 0.0106549, achieved: 0.0106549
   dataset 51, input: 0.0107085, achieved: 0.0107085
   dataset 52, input: 0.0105183, achieved: 0.0105183
   dataset 53, input: 0.0105936, achieved: 0.0105936
   dataset 54, input: 0.0101075, achieved: 0.0101075
   dataset 55, input: 0.0106142, achieved: 0.0106142
   dataset 56, input: 0.00844354, achieved: 0.00844354
   dataset 57, input: 0.01004, achieved: 0.01004
   dataset 58, input: 0.00954313, achieved: 0.00954313
   dataset 59, input: 0.0104014, achieved: 0.0104014
   dataset 60, input: 0.0115471, achieved: 0.0115471
   dataset 61, input: 0.00886656, achieved: 0.00886656
   dataset 62, input: 0.0115071, achieved: 0.0115071
   dataset 63, input: 0.00804085, achieved: 0.00804085
   dataset 64, input: 0.0102777, achieved: 0.0102777
   dataset 65, input: 0.00969363, achieved: 0.00969363
   dataset 66, input: 0.00880419, achieved: 0.00880419
   dataset 67, input: 0.0101621, achieved: 0.0101621
   dataset 68, input: 0.0106685, achieved: 0.0106685
   dataset 69, input: 0.0103031, achieved: 0.0103031
   dataset 70, input: 0.00776019, achieved: 0.00776019
   dataset 71, input: 0.010156, achieved: 0.010156
   dataset 72, input: 0.0117694, achieved: 0.0117694
   dataset 73, input: 0.00965532, achieved: 0.00965532
   dataset 74, input: 0.00980277, achieved: 0.00980277
   dataset 75, input: 0.00957092, achieved: 0.00957092
   dataset 76, input: 0.0102658, achieved: 0.0102658
   dataset 77, input: 0.0101736, achieved: 0.0101736
   dataset 78, input: 0.00952516, achieved: 0.00952516
   dataset 79, input: 0.00954821, achieved: 0.00954821
   dataset 80, input: 0.00878894, achieved: 0.00878894
   dataset 81, input: 0.00989429, achieved: 0.00989429
   dataset 82, input: 0.0107254, achieved: 0.0107254
   dataset 83, input: 0.0105407, achieved: 0.0105407
   dataset 84, input: 0.0103658, achieved: 0.0103658
   dataset 85, input: 0.0113505, achieved: 0.0113505
   dataset 86, input: 0.00890893, achieved: 0.00890893
   dataset 87, input: 0.0101038, achieved: 0.0101038
   dataset 88, input: 0.00923094, achieved: 0.00923094
   dataset 89, input: 0.00903197, achieved: 0.00903197
   dataset 90, input: 0.00932958, achieved: 0.00932958
   dataset 91, input: 0.0107644, achieved: 0.0107644
   dataset 92, input: 0.0102692, achieved: 0.0102692
   dataset 93, input: 0.0113081, achieved: 0.0113081
   dataset 94, input: 0.0102949, achieved: 0.0102949
   dataset 95, input: 0.00908553, achieved: 0.00908553
   dataset 96, input: 0.00956041, achieved: 0.00956041
   dataset 97, input: 0.0103095, achieved: 0.0103095
   dataset 98, input: 0.00869708, achieved: 0.00869708
   dataset 99, input: 0.00959601, achieved: 0.00959601
[2025-03-12 09:20:28][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 2950186 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.430653, achieved: 0.430653
   dataset 1, input: 0.430584, achieved: 0.430584
   dataset 2, input: 0.138763, achieved: 0.138763
[2025-03-12 09:20:28][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 707075 samples
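(For reference, the `input:` weights themselves normally come from the blend passed via `--data-path`, which alternates weights and dataset prefixes and is normalized to sum to 1 before the ratio check above. A small sketch of that parsing — the helper name and the example prefixes are made up for illustration:)

```python
# Sketch of how the "input" weights are typically supplied: Megatron-style
# "--data-path w1 prefix1 w2 prefix2 ..." blends, normalized to sum to 1.
def parse_blend(data_path):
    assert len(data_path) % 2 == 0, "expected alternating weight/prefix pairs"
    weights = [float(w) for w in data_path[0::2]]
    prefixes = list(data_path[1::2])
    total = sum(weights)
    return [w / total for w in weights], prefixes

weights, prefixes = parse_blend(
    ["0.430653", "corpus-a", "0.430584", "corpus-b", "0.138763", "corpus-c"]
)
print(weights)  # matches the three "input" ratios in the block above
```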
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.00616803, achieved: 0.00616803
   dataset 1, input: 0.00616438, achieved: 0.00616438
   dataset 2, input: 0.00616803, achieved: 0.00616803
   dataset 3, input: 0.00616986, achieved: 0.00616986
   dataset 4, input: 0.00616647, achieved: 0.00616647
   dataset 5, input: 0.00616803, achieved: 0.00616803
   dataset 6, input: 0.00617952, achieved: 0.00617952
   dataset 7, input: 0.00615837, achieved: 0.00615837
   dataset 8, input: 0.00617012, achieved: 0.00617012
   dataset 9, input: 0.00617064, achieved: 0.00617064
   dataset 10, input: 0.00615915, achieved: 0.00615915
   dataset 11, input: 0.00617534, achieved: 0.00617534
   dataset 12, input: 0.00617299, achieved: 0.00617299
   dataset 13, input: 0.00617404, achieved: 0.00617404
   dataset 14, input: 0.00617743, achieved: 0.00617743
   dataset 15, input: 0.0061696, achieved: 0.0061696
   dataset 16, input: 0.00618814, achieved: 0.00618814
   dataset 17, input: 0.00618919, achieved: 0.00618919
   dataset 18, input: 0.00615941, achieved: 0.00615941
   dataset 19, input: 0.0061568, achieved: 0.0061568
   dataset 20, input: 0.00614792, achieved: 0.00614792
   dataset 21, input: 0.00615367, achieved: 0.00615367
   dataset 22, input: 0.00615602, achieved: 0.00615602
   dataset 23, input: 0.00617482, achieved: 0.00617482
   dataset 24, input: 0.00617273, achieved: 0.00617273
   dataset 25, input: 0.00615759, achieved: 0.00615759
   dataset 26, input: 0.00617195, achieved: 0.00617195
   dataset 27, input: 0.00617456, achieved: 0.00617456
   dataset 28, input: 0.00617273, achieved: 0.00617273
   dataset 29, input: 0.00616281, achieved: 0.00616281
   dataset 30, input: 0.00618161, achieved: 0.00618161
   dataset 31, input: 0.00605077, achieved: 0.00605077
   dataset 32, input: 0.00601655, achieved: 0.00601655
   dataset 33, input: 0.00600637, achieved: 0.00600637
   dataset 34, input: 0.0060001, achieved: 0.0060001
   dataset 35, input: 0.00602073, achieved: 0.00602073
   dataset 36, input: 0.00600297, achieved: 0.00600297
   dataset 37, input: 0.00600402, achieved: 0.00600402
   dataset 38, input: 0.00600976, achieved: 0.00600976
   dataset 39, input: 0.00599227, achieved: 0.00599227
   dataset 40, input: 0.00601446, achieved: 0.00601446
   dataset 41, input: 0.00599566, achieved: 0.00599566
   dataset 42, input: 0.00599932, achieved: 0.00599932
   dataset 43, input: 0.00599958, achieved: 0.00599958
   dataset 44, input: 0.00598521, achieved: 0.00598521
   dataset 45, input: 0.00599331, achieved: 0.00599331
   dataset 46, input: 0.0059954, achieved: 0.0059954
   dataset 47, input: 0.00598887, achieved: 0.00598887
   dataset 48, input: 0.00600376, achieved: 0.00600376
   dataset 49, input: 0.0060035, achieved: 0.0060035
   dataset 50, input: 0.00598991, achieved: 0.00598991
   dataset 51, input: 0.00598417, achieved: 0.00598417
   dataset 52, input: 0.00599122, achieved: 0.00599122
   dataset 53, input: 0.00599801, achieved: 0.00599801
   dataset 54, input: 0.00598756, achieved: 0.00598756
   dataset 55, input: 0.00598234, achieved: 0.00598234
   dataset 56, input: 0.00599435, achieved: 0.00599435
   dataset 57, input: 0.00597294, achieved: 0.00597294
   dataset 58, input: 0.00599253, achieved: 0.00599253
   dataset 59, input: 0.00597868, achieved: 0.00597868
   dataset 60, input: 0.00597424, achieved: 0.00597424
   dataset 61, input: 0.00598339, achieved: 0.00598339
   dataset 62, input: 0.00594682, achieved: 0.00594682
   dataset 63, input: 0.0059019, achieved: 0.0059019
   dataset 64, input: 0.00589198, achieved: 0.00589198
   dataset 65, input: 0.00587683, achieved: 0.00587683
   dataset 66, input: 0.00587683, achieved: 0.00587683
   dataset 67, input: 0.00587604, achieved: 0.00587604
   dataset 68, input: 0.00586952, achieved: 0.00586952
   dataset 69, input: 0.00587631, achieved: 0.00587631
   dataset 70, input: 0.00586508, achieved: 0.00586508
   dataset 71, input: 0.00586116, achieved: 0.00586116
   dataset 72, input: 0.0058797, achieved: 0.0058797
   dataset 73, input: 0.00587448, achieved: 0.00587448
   dataset 74, input: 0.00587448, achieved: 0.00587448
   dataset 75, input: 0.00587213, achieved: 0.00587213
   dataset 76, input: 0.00588205, achieved: 0.00588205
   dataset 77, input: 0.00587134, achieved: 0.00587134
   dataset 78, input: 0.00588127, achieved: 0.00588127
   dataset 79, input: 0.00588623, achieved: 0.00588623
   dataset 80, input: 0.00585855, achieved: 0.00585855
   dataset 81, input: 0.00587604, achieved: 0.00587604
   dataset 82, input: 0.00585776, achieved: 0.00585776
   dataset 83, input: 0.00585672, achieved: 0.00585672
   dataset 84, input: 0.00586899, achieved: 0.00586899
   dataset 85, input: 0.00585358, achieved: 0.00585358
   dataset 86, input: 0.00586795, achieved: 0.00586795
   dataset 87, input: 0.00584627, achieved: 0.00584627
   dataset 88, input: 0.00585123, achieved: 0.00585123
   dataset 89, input: 0.00587187, achieved: 0.00587187
   dataset 90, input: 0.00586403, achieved: 0.00586403
   dataset 91, input: 0.00585123, achieved: 0.00585123
   dataset 92, input: 0.0058622, achieved: 0.0058622
   dataset 93, input: 0.00584836, achieved: 0.00584836
   dataset 94, input: 0.00579247, achieved: 0.00579247
   dataset 95, input: 0.00578359, achieved: 0.00578359
   dataset 96, input: 0.0057909, achieved: 0.0057909
   dataset 97, input: 0.0057849, achieved: 0.0057849
   dataset 98, input: 0.0057862, achieved: 0.0057862
   dataset 99, input: 0.00576975, achieved: 0.00576975
   dataset 100, input: 0.00578411, achieved: 0.00578411
   dataset 101, input: 0.00578385, achieved: 0.00578385
   dataset 102, input: 0.0057687, achieved: 0.0057687
   dataset 103, input: 0.00577393, achieved: 0.00577393
   dataset 104, input: 0.00577576, achieved: 0.00577576
   dataset 105, input: 0.0057533, achieved: 0.0057533
   dataset 106, input: 0.00575747, achieved: 0.00575747
   dataset 107, input: 0.00575591, achieved: 0.00575591
   dataset 108, input: 0.00575408, achieved: 0.00575408
   dataset 109, input: 0.00576792, achieved: 0.00576792
   dataset 110, input: 0.00575565, achieved: 0.00575565
   dataset 111, input: 0.00576348, achieved: 0.00576348
   dataset 112, input: 0.00575878, achieved: 0.00575878
   dataset 113, input: 0.00575565, achieved: 0.00575565
   dataset 114, input: 0.00576061, achieved: 0.00576061
   dataset 115, input: 0.00575878, achieved: 0.00575878
   dataset 116, input: 0.0057499, achieved: 0.0057499
   dataset 117, input: 0.00576139, achieved: 0.00576139
   dataset 118, input: 0.0057593, achieved: 0.0057593
   dataset 119, input: 0.00573266, achieved: 0.00573266
   dataset 120, input: 0.00575695, achieved: 0.00575695
   dataset 121, input: 0.0057499, achieved: 0.0057499
   dataset 122, input: 0.00576009, achieved: 0.00576009
   dataset 123, input: 0.00575538, achieved: 0.00575538
   dataset 124, input: 0.00575643, achieved: 0.00575643
   dataset 125, input: 0.00571438, achieved: 0.00571438
   dataset 126, input: 0.00570002, achieved: 0.00570002
   dataset 127, input: 0.00568461, achieved: 0.00568461
   dataset 128, input: 0.0057042, achieved: 0.0057042
   dataset 129, input: 0.00567416, achieved: 0.00567416
   dataset 130, input: 0.00567677, achieved: 0.00567677
   dataset 131, input: 0.00568356, achieved: 0.00568356
   dataset 132, input: 0.00567625, achieved: 0.00567625
   dataset 133, input: 0.00566502, achieved: 0.00566502
   dataset 134, input: 0.00567938, achieved: 0.00567938
   dataset 135, input: 0.0056739, achieved: 0.0056739
   dataset 136, input: 0.00568017, achieved: 0.00568017
   dataset 137, input: 0.00567599, achieved: 0.00567599
   dataset 138, input: 0.0056705, achieved: 0.0056705
   dataset 139, input: 0.00566972, achieved: 0.00566972
   dataset 140, input: 0.00565927, achieved: 0.00565927
   dataset 141, input: 0.00566319, achieved: 0.00566319
   dataset 142, input: 0.00566267, achieved: 0.00566267
   dataset 143, input: 0.00565431, achieved: 0.00565431
   dataset 144, input: 0.00566659, achieved: 0.00566659
   dataset 145, input: 0.00567599, achieved: 0.00567599
   dataset 146, input: 0.00566084, achieved: 0.00566084
   dataset 147, input: 0.00565562, achieved: 0.00565562
   dataset 148, input: 0.00565614, achieved: 0.00565614
   dataset 149, input: 0.0056598, achieved: 0.0056598
   dataset 150, input: 0.00566293, achieved: 0.00566293
   dataset 151, input: 0.00566371, achieved: 0.00566371
   dataset 152, input: 0.00566032, achieved: 0.00566032
   dataset 153, input: 0.00566136, achieved: 0.00566136
   dataset 154, input: 0.00565823, achieved: 0.00565823
   dataset 155, input: 0.00565196, achieved: 0.00565196
   dataset 156, input: 0.00566084, achieved: 0.00566084
   dataset 157, input: 0.00563499, achieved: 0.00563499
   dataset 158, input: 0.00561383, achieved: 0.00561383
   dataset 159, input: 0.00561122, achieved: 0.00561122
   dataset 160, input: 0.00560756, achieved: 0.00560756
   dataset 161, input: 0.00560391, achieved: 0.00560391
   dataset 162, input: 0.0056026, achieved: 0.0056026
   dataset 163, input: 0.00561697, achieved: 0.00561697
   dataset 164, input: 0.00561383, achieved: 0.00561383
   dataset 165, input: 0.00560652, achieved: 0.00560652
   dataset 166, input: 0.00558954, achieved: 0.00558954
   dataset 167, input: 0.00560913, achieved: 0.00560913
   dataset 168, input: 0.00560129, achieved: 0.00560129
   dataset 169, input: 0.00559633, achieved: 0.00559633
   dataset 170, input: 0.00186919, achieved: 0.00186919
[2025-03-12 09:20:29][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 3828936 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.00105726, achieved: 0.00105726
   dataset 1, input: 0.00107928, achieved: 0.00107928
   dataset 2, input: 0.00109672, achieved: 0.00109672
   dataset 3, input: 0.00109393, achieved: 0.00109393
   dataset 4, input: 0.0010994, achieved: 0.0010994
   dataset 5, input: 0.00107491, achieved: 0.00107491
   dataset 6, input: 0.00108898, achieved: 0.00108898
   dataset 7, input: 0.000683743, achieved: 0.000683743
   dataset 8, input: 0.00110611, achieved: 0.00110611
   dataset 9, input: 0.0011095, achieved: 0.0011095
   dataset 10, input: 0.00108458, achieved: 0.00108458
   dataset 11, input: 0.0011302, achieved: 0.0011302
   dataset 12, input: 0.00113901, achieved: 0.00113901
   dataset 13, input: 0.00108227, achieved: 0.00108227
   dataset 14, input: 0.00110645, achieved: 0.00110645
   dataset 15, input: 0.00112896, achieved: 0.00112896
   dataset 16, input: 0.00111693, achieved: 0.00111693
   dataset 17, input: 0.00118774, achieved: 0.00118774
   dataset 18, input: 0.00117349, achieved: 0.00117349
   dataset 19, input: 0.00110703, achieved: 0.00110703
   dataset 20, input: 0.00110024, achieved: 0.00110024
   dataset 21, input: 0.0011074, achieved: 0.0011074
   dataset 22, input: 0.00125365, achieved: 0.00125365
   dataset 23, input: 0.00124519, achieved: 0.00124519
   dataset 24, input: 0.000702826, achieved: 0.000702826
   dataset 25, input: 0.00111163, achieved: 0.00111163
   dataset 26, input: 0.00116457, achieved: 0.00116457
   dataset 27, input: 0.00114033, achieved: 0.00114033
   dataset 28, input: 0.00108084, achieved: 0.00108084
   dataset 29, input: 0.00114093, achieved: 0.00114093
   dataset 30, input: 0.000691025, achieved: 0.000691025
   dataset 31, input: 0.000760019, achieved: 0.000760019
   dataset 32, input: 0.0010096, achieved: 0.0010096
   dataset 33, input: 0.00129976, achieved: 0.00129976
   dataset 34, input: 0.000736013, achieved: 0.000736013
   dataset 35, input: 0.00109681, achieved: 0.00109681
   dataset 36, input: 0.000696868, achieved: 0.000696868
   dataset 37, input: 0.00113699, achieved: 0.00113699
   dataset 38, input: 0.0011532, achieved: 0.0011532
   dataset 39, input: 0.000760249, achieved: 0.000760249
   dataset 40, input: 0.000760421, achieved: 0.000760421
   dataset 41, input: 0.00118782, achieved: 0.00118782
   dataset 42, input: 0.00119723, achieved: 0.00119723
   dataset 43, input: 0.00127187, achieved: 0.00127187
   dataset 44, input: 0.00075193, achieved: 0.00075193
   dataset 45, input: 0.00088252, achieved: 0.00088252
   dataset 46, input: 0.000878317, achieved: 0.000878317
   dataset 47, input: 0.00129593, achieved: 0.00129593
   dataset 48, input: 0.000647074, achieved: 0.000647074
   dataset 49, input: 0.00107732, achieved: 0.00107732
   dataset 50, input: 0.00110709, achieved: 0.00110709
   dataset 51, input: 0.00127478, achieved: 0.00127478
   dataset 52, input: 0.00135992, achieved: 0.00135992
   dataset 53, input: 0.00111425, achieved: 0.00111425
   dataset 54, input: 0.00112836, achieved: 0.00112836
   dataset 55, input: 0.00107217, achieved: 0.00107217
   dataset 56, input: 0.00111443, achieved: 0.00111443
   dataset 57, input: 0.00109946, achieved: 0.00109946
   dataset 58, input: 0.00109523, achieved: 0.00109523
   dataset 59, input: 0.00110734, achieved: 0.00110734
   dataset 60, input: 0.00110199, achieved: 0.00110199
   dataset 61, input: 0.00112873, achieved: 0.00112873
   dataset 62, input: 0.00110829, achieved: 0.00110829
   dataset 63, input: 0.00111008, achieved: 0.00111008
   dataset 64, input: 0.00110855, achieved: 0.00110855
   dataset 65, input: 0.00109658, achieved: 0.00109658
   dataset 66, input: 0.00110116, achieved: 0.00110116
   dataset 67, input: 0.00105519, achieved: 0.00105519
   dataset 68, input: 0.00102554, achieved: 0.00102554
   dataset 69, input: 0.00094731, achieved: 0.00094731
   dataset 70, input: 0.000868991, achieved: 0.000868991
   dataset 71, input: 0.00111486, achieved: 0.00111486
   dataset 72, input: 0.000803395, achieved: 0.000803395
   dataset 73, input: 0.000821845, achieved: 0.000821845
   dataset 74, input: 0.000815512, achieved: 0.000815512
   dataset 75, input: 0.000801927, achieved: 0.000801927
   dataset 76, input: 0.00082006, achieved: 0.00082006
   dataset 77, input: 0.000797034, achieved: 0.000797034
   dataset 78, input: 0.000825759, achieved: 0.000825759
   dataset 79, input: 0.000809928, achieved: 0.000809928
   dataset 80, input: 0.000809439, achieved: 0.000809439
   dataset 81, input: 0.000808317, achieved: 0.000808317
   dataset 82, input: 0.000798559, achieved: 0.000798559
   dataset 83, input: 0.000805064, achieved: 0.000805064
   dataset 84, input: 0.000795796, achieved: 0.000795796
   dataset 85, input: 0.000741712, achieved: 0.000741712
   dataset 86, input: 0.00072119, achieved: 0.00072119
   dataset 87, input: 0.000734229, achieved: 0.000734229
   dataset 88, input: 0.000713044, achieved: 0.000713044
   dataset 89, input: 0.00071716, achieved: 0.00071716
   dataset 90, input: 0.000723608, achieved: 0.000723608
   dataset 91, input: 0.000740331, achieved: 0.000740331
   dataset 92, input: 0.000739266, achieved: 0.000739266
   dataset 93, input: 0.000734891, achieved: 0.000734891
   dataset 94, input: 0.000711922, achieved: 0.000711922
   dataset 95, input: 0.000912483, achieved: 0.000912483
   dataset 96, input: 0.00105343, achieved: 0.00105343
   dataset 97, input: 0.00107571, achieved: 0.00107571
   dataset 98, input: 0.00102163, achieved: 0.00102163
   dataset 99, input: 0.00103542, achieved: 0.00103542
   dataset 100, input: 0.0010408, achieved: 0.0010408
   dataset 101, input: 0.001014, achieved: 0.001014
   dataset 102, input: 0.00102785, achieved: 0.00102785
   dataset 103, input: 0.00101656, achieved: 0.00101656
   dataset 104, input: 0.0010292, achieved: 0.0010292
   dataset 105, input: 0.00103639, achieved: 0.00103639
   dataset 106, input: 0.00102482, achieved: 0.00102482
   dataset 107, input: 0.000989305, achieved: 0.000989305
   dataset 108, input: 0.000995551, achieved: 0.000995551
   dataset 109, input: 0.00100036, achieved: 0.00100036
   dataset 110, input: 0.00102494, achieved: 0.00102494
   dataset 111, input: 0.00106011, achieved: 0.00106011
   dataset 112, input: 0.00107073, achieved: 0.00107073
   dataset 113, input: 0.00106728, achieved: 0.00106728
   dataset 114, input: 0.00106239, achieved: 0.00106239
   dataset 115, input: 0.000961817, achieved: 0.000961817
   dataset 116, input: 0.000943223, achieved: 0.000943223
   dataset 117, input: 0.000940575, achieved: 0.000940575
   dataset 118, input: 0.00138766, achieved: 0.00138766
   dataset 119, input: 0.000905834, achieved: 0.000905834
   dataset 120, input: 0.000908367, achieved: 0.000908367
   dataset 121, input: 0.000912771, achieved: 0.000912771
   dataset 122, input: 0.000880879, achieved: 0.000880879
   dataset 123, input: 0.000877051, achieved: 0.000877051
   dataset 124, input: 0.000880015, achieved: 0.000880015
   dataset 125, input: 0.000859176, achieved: 0.000859176
   dataset 126, input: 0.000854514, achieved: 0.000854514
   dataset 127, input: 0.000853132, achieved: 0.000853132
   dataset 128, input: 0.000825846, achieved: 0.000825846
   dataset 129, input: 0.000809957, achieved: 0.000809957
   dataset 130, input: 0.000803539, achieved: 0.000803539
   dataset 131, input: 0.000804057, achieved: 0.000804057
   dataset 132, input: 0.000789176, achieved: 0.000789176
   dataset 133, input: 0.000771417, achieved: 0.000771417
   dataset 134, input: 0.000765516, achieved: 0.000765516
   dataset 135, input: 0.00077772, achieved: 0.00077772
   dataset 136, input: 0.00121594, achieved: 0.00121594
   dataset 137, input: 0.00134175, achieved: 0.00134175
   dataset 138, input: 0.00134909, achieved: 0.00134909
   dataset 139, input: 0.00132972, achieved: 0.00132972
   dataset 140, input: 0.00131879, achieved: 0.00131879
   dataset 141, input: 0.00130425, achieved: 0.00130425
   dataset 142, input: 0.00086571, achieved: 0.00086571
   dataset 143, input: 0.000821931, achieved: 0.000821931
   dataset 144, input: 0.000770438, achieved: 0.000770438
   dataset 145, input: 0.00115855, achieved: 0.00115855
   dataset 146, input: 0.00105343, achieved: 0.00105343
   dataset 147, input: 0.00103245, achieved: 0.00103245
   dataset 148, input: 0.00103677, achieved: 0.00103677
   dataset 149, input: 0.00104975, achieved: 0.00104975
   dataset 150, input: 0.00101242, achieved: 0.00101242
   dataset 151, input: 0.00100948, achieved: 0.00100948
   dataset 152, input: 0.00100396, achieved: 0.00100396
   dataset 153, input: 0.00139008, achieved: 0.00139008
   dataset 154, input: 0.00128076, achieved: 0.00128076
   dataset 155, input: 0.00127316, achieved: 0.00127316
   dataset 156, input: 0.00125422, achieved: 0.00125422
   dataset 157, input: 0.00122035, achieved: 0.00122035
   dataset 158, input: 0.00121491, achieved: 0.00121491
   dataset 159, input: 0.00118218, achieved: 0.00118218
   dataset 160, input: 0.00121341, achieved: 0.00121341
   dataset 161, input: 0.00122665, achieved: 0.00122665
   dataset 162, input: 0.000977763, achieved: 0.000977763
   dataset 163, input: 0.00096294, achieved: 0.00096294
   dataset 164, input: 0.00093715, achieved: 0.00093715
   dataset 165, input: 0.000958421, achieved: 0.000958421
   dataset 166, input: 0.000948606, achieved: 0.000948606
   dataset 167, input: 0.000971575, achieved: 0.000971575
   dataset 168, input: 0.000976295, achieved: 0.000976295
   dataset 169, input: 0.000953787, achieved: 0.000953787
   dataset 170, input: 0.00093833, achieved: 0.00093833
   dataset 171, input: 0.000944231, achieved: 0.000944231
   dataset 172, input: 0.00140214, achieved: 0.00140214
   dataset 173, input: 0.00141161, achieved: 0.00141161
   dataset 174, input: 0.00141268, achieved: 0.00141268
   dataset 175, input: 0.00141368, achieved: 0.00141368
   dataset 176, input: 0.000784484, achieved: 0.000784484
   dataset 177, input: 0.000802186, achieved: 0.000802186
   dataset 178, input: 0.000786384, achieved: 0.000786384
   dataset 179, input: 0.000774093, achieved: 0.000774093
   dataset 180, input: 0.000788715, achieved: 0.000788715
   dataset 181, input: 0.000790126, achieved: 0.000790126
   dataset 182, input: 0.000754204, achieved: 0.000754204
   dataset 183, input: 0.000732876, achieved: 0.000732876
   dataset 184, input: 0.00106774, achieved: 0.00106774
   dataset 185, input: 0.00118186, achieved: 0.00118186
   dataset 186, input: 0.00123704, achieved: 0.00123704
   dataset 187, input: 0.000771733, achieved: 0.000771733
   dataset 188, input: 0.00078057, achieved: 0.00078057
   dataset 189, input: 0.000764163, achieved: 0.000764163
   dataset 190, input: 0.000742893, achieved: 0.000742893
   dataset 191, input: 0.000734718, achieved: 0.000734718
   dataset 192, input: 0.000724961, achieved: 0.000724961
   dataset 193, input: 0.000728702, achieved: 0.000728702
   dataset 194, input: 0.000714628, achieved: 0.000714628
   dataset 195, input: 0.00107016, achieved: 0.00107016
   dataset 196, input: 0.00137926, achieved: 0.00137926
   dataset 197, input: 0.000925291, achieved: 0.000925291
   dataset 198, input: 0.00135698, achieved: 0.00135698
   dataset 199, input: 0.00131985, achieved: 0.00131985
   dataset 200, input: 0.00122967, achieved: 0.00122967
   dataset 201, input: 0.00124032, achieved: 0.00124032
   dataset 202, input: 0.00137788, achieved: 0.00137788
   dataset 203, input: 0.00136677, achieved: 0.00136677
   dataset 204, input: 0.00135007, achieved: 0.00135007
   dataset 205, input: 0.00130673, achieved: 0.00130673
   dataset 206, input: 0.00127486, achieved: 0.00127486
   dataset 207, input: 0.00127236, achieved: 0.00127236
   dataset 208, input: 0.00125716, achieved: 0.00125716
   dataset 209, input: 0.00126082, achieved: 0.00126082
   dataset 210, input: 0.00125218, achieved: 0.00125218
   dataset 211, input: 0.00120455, achieved: 0.00120455
   dataset 212, input: 0.00119145, achieved: 0.00119145
   dataset 213, input: 0.00117271, achieved: 0.00117271
   dataset 214, input: 0.00115789, achieved: 0.00115789
   dataset 215, input: 0.000945497, achieved: 0.000945497
   dataset 216, input: 0.000947253, achieved: 0.000947253
   dataset 217, input: 0.00136559, achieved: 0.00136559
   dataset 218, input: 0.00133326, achieved: 0.00133326
   dataset 219, input: 0.00131314, achieved: 0.00131314
   dataset 220, input: 0.00128888, achieved: 0.00128888
   dataset 221, input: 0.00139362, achieved: 0.00139362
   dataset 222, input: 0.000997278, achieved: 0.000997278
   dataset 223, input: 0.000999178, achieved: 0.000999178
   dataset 224, input: 0.00137652, achieved: 0.00137652
   dataset 225, input: 0.00136432, achieved: 0.00136432
   dataset 226, input: 0.00135422, achieved: 0.00135422
   dataset 227, input: 0.00135094, achieved: 0.00135094
   dataset 228, input: 0.00131663, achieved: 0.00131663
   dataset 229, input: 0.001115, achieved: 0.001115
   dataset 230, input: 0.00110642, achieved: 0.00110642
   dataset 231, input: 0.00110372, achieved: 0.00110372
   dataset 232, input: 0.00107563, achieved: 0.00107563
   dataset 233, input: 0.00104146, achieved: 0.00104146
   dataset 234, input: 0.00101227, achieved: 0.00101227
   dataset 235, input: 0.00101501, achieved: 0.00101501
   dataset 236, input: 0.000999552, achieved: 0.000999552
   dataset 237, input: 0.00101253, achieved: 0.00101253
   dataset 238, input: 0.00098326, achieved: 0.00098326
   dataset 239, input: 0.000964379, achieved: 0.000964379
   dataset 240, input: 0.000960551, achieved: 0.000960551
   dataset 241, input: 0.000944432, achieved: 0.000944432
   dataset 242, input: 0.00131418, achieved: 0.00131418
   dataset 243, input: 0.000830249, achieved: 0.000830249
   dataset 244, input: 0.000810101, achieved: 0.000810101
   dataset 245, input: 0.000771129, achieved: 0.000771129
   dataset 246, input: 0.000749023, achieved: 0.000749023
   dataset 247, input: 0.00075714, achieved: 0.00075714
   dataset 248, input: 0.000729739, achieved: 0.000729739
   dataset 249, input: 0.000752794, achieved: 0.000752794
   dataset 250, input: 0.000713534, achieved: 0.000713534
   dataset 251, input: 0.000729767, achieved: 0.000729767
   dataset 252, input: 0.00120029, achieved: 0.00120029
   dataset 253, input: 0.00139872, achieved: 0.00139872
   dataset 254, input: 0.00135715, achieved: 0.00135715
   dataset 255, input: 0.00131714, achieved: 0.00131714
   dataset 256, input: 0.00128543, achieved: 0.00128543
   dataset 257, input: 0.00125699, achieved: 0.00125699
   dataset 258, input: 0.000818995, achieved: 0.000818995
   dataset 259, input: 0.00123534, achieved: 0.00123534
   dataset 260, input: 0.00127961, achieved: 0.00127961
   dataset 261, input: 0.00127486, achieved: 0.00127486
   dataset 262, input: 0.00125333, achieved: 0.00125333
   dataset 263, input: 0.00124844, achieved: 0.00124844
   dataset 264, input: 0.00122772, achieved: 0.00122772
   dataset 265, input: 0.0008236, achieved: 0.0008236
   dataset 266, input: 0.00121827, achieved: 0.00121827
   dataset 267, input: 0.000811972, achieved: 0.000811972
   dataset 268, input: 0.00132362, achieved: 0.00132362
   dataset 269, input: 0.00139814, achieved: 0.00139814
   dataset 270, input: 0.00125598, achieved: 0.00125598
   dataset 271, input: 0.00108371, achieved: 0.00108371
   dataset 272, input: 0.00107062, achieved: 0.00107062
   dataset 273, input: 0.00105916, achieved: 0.00105916
   dataset 274, input: 0.00102508, achieved: 0.00102508
   dataset 275, input: 0.001115, achieved: 0.001115
   dataset 276, input: 0.00109822, achieved: 0.00109822
   dataset 277, input: 0.00117478, achieved: 0.00117478
   dataset 278, input: 0.00119669, achieved: 0.00119669
   dataset 279, input: 0.00114093, achieved: 0.00114093
   dataset 280, input: 0.000779562, achieved: 0.000779562
   dataset 281, input: 0.00123281, achieved: 0.00123281
   dataset 282, input: 0.000626781, achieved: 0.000626781
   dataset 283, input: 0.00125362, achieved: 0.00125362
   dataset 284, input: 0.00109894, achieved: 0.00109894
   dataset 285, input: 0.0012276, achieved: 0.0012276
   dataset 286, input: 0.00127763, achieved: 0.00127763
   dataset 287, input: 0.00117288, achieved: 0.00117288
   dataset 288, input: 0.000738575, achieved: 0.000738575
   dataset 289, input: 0.0010606, achieved: 0.0010606
   dataset 290, input: 0.00123911, achieved: 0.00123911
   dataset 291, input: 0.00130963, achieved: 0.00130963
   dataset 292, input: 0.00122003, achieved: 0.00122003
   dataset 293, input: 0.000671683, achieved: 0.000671683
   dataset 294, input: 0.000733768, achieved: 0.000733768
   dataset 295, input: 0.00113394, achieved: 0.00113394
   dataset 296, input: 0.00123382, achieved: 0.00123382
   dataset 297, input: 0.00115412, achieved: 0.00115412
   dataset 298, input: 0.000686219, achieved: 0.000686219
   dataset 299, input: 0.00125178, achieved: 0.00125178
   dataset 300, input: 0.00123963, achieved: 0.00123963
   dataset 301, input: 0.00107753, achieved: 0.00107753
   dataset 302, input: 0.00115829, achieved: 0.00115829
   dataset 303, input: 0.00119977, achieved: 0.00119977
   dataset 304, input: 0.00117927, achieved: 0.00117927
   dataset 305, input: 0.000645116, achieved: 0.000645116
   dataset 306, input: 0.00123742, achieved: 0.00123742
   dataset 307, input: 0.00126571, achieved: 0.00126571
   dataset 308, input: 0.00114568, achieved: 0.00114568
   dataset 309, input: 0.00119626, achieved: 0.00119626
   dataset 310, input: 0.00122441, achieved: 0.00122441
   dataset 311, input: 0.000677066, achieved: 0.000677066
   dataset 312, input: 0.000732156, achieved: 0.000732156
   dataset 313, input: 0.00120647, achieved: 0.00120647
   dataset 314, input: 0.00122651, achieved: 0.00122651
   dataset 315, input: 0.00116149, achieved: 0.00116149
   dataset 316, input: 0.00121459, achieved: 0.00121459
   dataset 317, input: 0.00119836, achieved: 0.00119836
   dataset 318, input: 0.00127201, achieved: 0.00127201
   dataset 319, input: 0.00110162, achieved: 0.00110162
   dataset 320, input: 0.00109042, achieved: 0.00109042
   dataset 321, input: 0.00119994, achieved: 0.00119994
   dataset 322, input: 0.00109324, achieved: 0.00109324
   dataset 323, input: 0.00118549, achieved: 0.00118549
   dataset 324, input: 0.0011572, achieved: 0.0011572
   dataset 325, input: 0.00123549, achieved: 0.00123549
   dataset 326, input: 0.00118112, achieved: 0.00118112
   dataset 327, input: 0.00118874, achieved: 0.00118874
   dataset 328, input: 0.00107531, achieved: 0.00107531
   dataset 329, input: 0.00107845, achieved: 0.00107845
   dataset 330, input: 0.00124867, achieved: 0.00124867
   dataset 331, input: 0.00110691, achieved: 0.00110691
   dataset 332, input: 0.0010271, achieved: 0.0010271
   dataset 333, input: 0.00117421, achieved: 0.00117421
   dataset 334, input: 0.00113149, achieved: 0.00113149
   dataset 335, input: 0.00111281, achieved: 0.00111281
   dataset 336, input: 0.00110363, achieved: 0.00110363
   dataset 337, input: 0.00121197, achieved: 0.00121197
   dataset 338, input: 0.00119801, achieved: 0.00119801
   dataset 339, input: 0.0011519, achieved: 0.0011519
   dataset 340, input: 0.0011559, achieved: 0.0011559
   dataset 341, input: 0.00119496, achieved: 0.00119496
   dataset 342, input: 0.00104569, achieved: 0.00104569
   dataset 343, input: 0.0010756, achieved: 0.0010756
   dataset 344, input: 0.00109649, achieved: 0.00109649
   dataset 345, input: 0.00113204, achieved: 0.00113204
   dataset 346, input: 0.00101803, achieved: 0.00101803
   dataset 347, input: 0.00109609, achieved: 0.00109609
   dataset 348, input: 0.00106152, achieved: 0.00106152
   dataset 349, input: 0.00119758, achieved: 0.00119758
   dataset 350, input: 0.00130123, achieved: 0.00130123
   dataset 351, input: 0.00127432, achieved: 0.00127432
   dataset 352, input: 0.00124073, achieved: 0.00124073
   dataset 353, input: 0.00125926, achieved: 0.00125926
   dataset 354, input: 0.00121514, achieved: 0.00121514
   dataset 355, input: 0.00126171, achieved: 0.00126171
   dataset 356, input: 0.00125399, achieved: 0.00125399
   dataset 357, input: 0.00125549, achieved: 0.00125549
   dataset 358, input: 0.00118394, achieved: 0.00118394
   dataset 359, input: 0.00124139, achieved: 0.00124139
   dataset 360, input: 0.00060931, achieved: 0.00060931
   dataset 361, input: 0.00107773, achieved: 0.00107773
   dataset 362, input: 0.000908942, achieved: 0.000908942
   dataset 363, input: 0.000896076, achieved: 0.000896076
   dataset 364, input: 0.000916282, achieved: 0.000916282
   dataset 365, input: 0.00115259, achieved: 0.00115259
   dataset 366, input: 0.000930818, achieved: 0.000930818
   dataset 367, input: 0.00108648, achieved: 0.00108648
   dataset 368, input: 0.00108345, achieved: 0.00108345
   dataset 369, input: 0.00106921, achieved: 0.00106921
   dataset 370, input: 0.00108187, achieved: 0.00108187
   dataset 371, input: 0.00107059, achieved: 0.00107059
   dataset 372, input: 0.00106279, achieved: 0.00106279
   dataset 373, input: 0.00105715, achieved: 0.00105715
   dataset 374, input: 0.000961903, achieved: 0.000961903
   dataset 375, input: 0.000869625, achieved: 0.000869625
   dataset 376, input: 0.000964868, achieved: 0.000964868
   dataset 377, input: 0.000934761, achieved: 0.000934761
   dataset 378, input: 0.000975374, achieved: 0.000975374
   dataset 379, input: 0.000934962, achieved: 0.000934962
   dataset 380, input: 0.000880361, achieved: 0.000880361
   dataset 381, input: 0.000916628, achieved: 0.000916628
   dataset 382, input: 0.000851981, achieved: 0.000851981
   dataset 383, input: 0.000893054, achieved: 0.000893054
   dataset 384, input: 0.000926184, achieved: 0.000926184
   dataset 385, input: 0.000934502, achieved: 0.000934502
   dataset 386, input: 0.000911878, achieved: 0.000911878
   dataset 387, input: 0.000905863, achieved: 0.000905863
   dataset 388, input: 0.00111212, achieved: 0.00111212
   dataset 389, input: 0.000974654, achieved: 0.000974654
   dataset 390, input: 0.000943511, achieved: 0.000943511
   dataset 391, input: 0.000927824, achieved: 0.000927824
   dataset 392, input: 0.000950707, achieved: 0.000950707
   dataset 393, input: 0.000920341, achieved: 0.000920341
   dataset 394, input: 0.000930559, achieved: 0.000930559
   dataset 395, input: 0.000935193, achieved: 0.000935193
   dataset 396, input: 0.000913548, achieved: 0.000913548
   dataset 397, input: 0.000896191, achieved: 0.000896191
   dataset 398, input: 0.00089291, achieved: 0.00089291
   dataset 399, input: 0.000873798, achieved: 0.000873798
   dataset 400, input: 0.000873827, achieved: 0.000873827
   dataset 401, input: 0.000937006, achieved: 0.000937006
   dataset 402, input: 0.000879468, achieved: 0.000879468
   dataset 403, input: 0.000877079, achieved: 0.000877079
   dataset 404, input: 0.00086358, achieved: 0.00086358
   dataset 405, input: 0.000869481, achieved: 0.000869481
   dataset 406, input: 0.000827832, achieved: 0.000827832
   dataset 407, input: 0.000860443, achieved: 0.000860443
   dataset 408, input: 0.000857708, achieved: 0.000857708
   dataset 409, input: 0.000894004, achieved: 0.000894004
   dataset 410, input: 0.000883987, achieved: 0.000883987
   dataset 411, input: 0.000877454, achieved: 0.000877454
   dataset 412, input: 0.000880908, achieved: 0.000880908
   dataset 413, input: 0.000841244, achieved: 0.000841244
   dataset 414, input: 0.000850167, achieved: 0.000850167
   dataset 415, input: 0.000808259, achieved: 0.000808259
   dataset 416, input: 0.000844209, achieved: 0.000844209
   dataset 417, input: 0.000806561, achieved: 0.000806561
   dataset 418, input: 0.000799595, achieved: 0.000799595
   dataset 419, input: 0.000804719, achieved: 0.000804719
   dataset 420, input: 0.000806964, achieved: 0.000806964
   dataset 421, input: 0.000775274, achieved: 0.000775274
   dataset 422, input: 0.000757428, achieved: 0.000757428
   dataset 423, input: 0.000966566, achieved: 0.000966566
   dataset 424, input: 0.00113167, achieved: 0.00113167
   dataset 425, input: 0.00111028, achieved: 0.00111028
   dataset 426, input: 0.00109252, achieved: 0.00109252
   dataset 427, input: 0.00107577, achieved: 0.00107577
   dataset 428, input: 0.00107767, achieved: 0.00107767
   dataset 429, input: 0.00107422, achieved: 0.00107422
   dataset 430, input: 0.001056, achieved: 0.001056
   dataset 431, input: 0.000535654, achieved: 0.000535654
   dataset 432, input: 0.00104445, achieved: 0.00104445
   dataset 433, input: 0.00103605, achieved: 0.00103605
   dataset 434, input: 0.00102966, achieved: 0.00102966
   dataset 435, input: 0.00103639, achieved: 0.00103639
   dataset 436, input: 0.00101461, achieved: 0.00101461
   dataset 437, input: 0.00102399, achieved: 0.00102399
   dataset 438, input: 0.000797465, achieved: 0.000797465
   dataset 439, input: 0.000865048, achieved: 0.000865048
   dataset 440, input: 0.000831257, achieved: 0.000831257
   dataset 441, input: 0.000854254, achieved: 0.000854254
   dataset 442, input: 0.000833646, achieved: 0.000833646
   dataset 443, input: 0.000821211, achieved: 0.000821211
   dataset 444, input: 0.000825471, achieved: 0.000825471
   dataset 445, input: 0.00080115, achieved: 0.00080115
   dataset 446, input: 0.000794731, achieved: 0.000794731
   dataset 447, input: 0.000775274, achieved: 0.000775274
   dataset 448, input: 0.000776281, achieved: 0.000776281
   dataset 449, input: 0.000776195, achieved: 0.000776195
   dataset 450, input: 0.000776857, achieved: 0.000776857
   dataset 451, input: 0.000766178, achieved: 0.000766178
   dataset 452, input: 0.000739093, achieved: 0.000739093
   dataset 453, input: 0.000756248, achieved: 0.000756248
   dataset 454, input: 0.00108469, achieved: 0.00108469
   dataset 455, input: 0.00109134, achieved: 0.00109134
   dataset 456, input: 0.00106509, achieved: 0.00106509
   dataset 457, input: 0.00102641, achieved: 0.00102641
   dataset 458, input: 0.00101348, achieved: 0.00101348
   dataset 459, input: 0.000990917, achieved: 0.000990917
   dataset 460, input: 0.000954189, achieved: 0.000954189
   dataset 461, input: 0.000954276, achieved: 0.000954276
   dataset 462, input: 0.000930789, achieved: 0.000930789
   dataset 463, input: 0.000939193, achieved: 0.000939193
   dataset 464, input: 0.000878029, achieved: 0.000878029
   dataset 465, input: 0.000840985, achieved: 0.000840985
   dataset 466, input: 0.000837647, achieved: 0.000837647
   dataset 467, input: 0.00082881, achieved: 0.00082881
   dataset 468, input: 0.000801466, achieved: 0.000801466
   dataset 469, input: 0.00081036, achieved: 0.00081036
   dataset 470, input: 0.000888535, achieved: 0.000888535
   dataset 471, input: 0.00100767, achieved: 0.00100767
   dataset 472, input: 0.000980353, achieved: 0.000980353
   dataset 473, input: 0.00086453, achieved: 0.00086453
   dataset 474, input: 0.000748879, achieved: 0.000748879
   dataset 475, input: 0.000746836, achieved: 0.000746836
   dataset 476, input: 0.000845677, achieved: 0.000845677
   dataset 477, input: 0.000897947, achieved: 0.000897947
   dataset 478, input: 0.000767991, achieved: 0.000767991
   dataset 479, input: 0.000885657, achieved: 0.000885657
   dataset 480, input: 0.00089694, achieved: 0.00089694
   dataset 481, input: 0.00107099, achieved: 0.00107099
   dataset 482, input: 0.0010572, achieved: 0.0010572
   dataset 483, input: 0.00104684, achieved: 0.00104684
   dataset 484, input: 0.000883383, achieved: 0.000883383
   dataset 485, input: 0.000876849, achieved: 0.000876849
   dataset 486, input: 0.00103012, achieved: 0.00103012
   dataset 487, input: 0.00100839, achieved: 0.00100839
   dataset 488, input: 0.00100042, achieved: 0.00100042
   dataset 489, input: 0.000997652, achieved: 0.000997652
   dataset 490, input: 0.00100922, achieved: 0.00100922
   dataset 491, input: 0.000992989, achieved: 0.000992989
   dataset 492, input: 0.000994918, achieved: 0.000994918
   dataset 493, input: 0.000976151, achieved: 0.000976151
   dataset 494, input: 0.000985937, achieved: 0.000985937
   dataset 495, input: 0.000979778, achieved: 0.000979778
   dataset 496, input: 0.000979116, achieved: 0.000979116
   dataset 497, input: 0.00100036, achieved: 0.00100036
   dataset 498, input: 0.000985333, achieved: 0.000985333
   dataset 499, input: 0.000982829, achieved: 0.000982829
   dataset 500, input: 0.000964494, achieved: 0.000964494
   dataset 501, input: 0.000973244, achieved: 0.000973244
   dataset 502, input: 0.000739842, achieved: 0.000739842
   dataset 503, input: 0.000756795, achieved: 0.000756795
   dataset 504, input: 0.000775504, achieved: 0.000775504
   dataset 505, input: 0.000751815, achieved: 0.000751815
   dataset 506, input: 0.000758148, achieved: 0.000758148
   dataset 507, input: 0.00073892, achieved: 0.00073892
   dataset 508, input: 0.000737136, achieved: 0.000737136
   dataset 509, input: 0.00074698, achieved: 0.00074698
   dataset 510, input: 0.0010393, achieved: 0.0010393
   dataset 511, input: 0.00106869, achieved: 0.00106869
   dataset 512, input: 0.00107036, achieved: 0.00107036
   dataset 513, input: 0.00102186, achieved: 0.00102186
   dataset 514, input: 0.000717938, achieved: 0.000717938
   dataset 515, input: 0.000764998, achieved: 0.000764998
   dataset 516, input: 0.000764077, achieved: 0.000764077
   dataset 517, input: 0.000749311, achieved: 0.000749311
   dataset 518, input: 0.000758752, achieved: 0.000758752
   dataset 519, input: 0.000727033, achieved: 0.000727033
   dataset 520, input: 0.000731811, achieved: 0.000731811
   dataset 521, input: 0.000715865, achieved: 0.000715865
   dataset 522, input: 0.000727637, achieved: 0.000727637
   dataset 523, input: 0.000723723, achieved: 0.000723723
   dataset 524, input: 0.000725162, achieved: 0.000725162
   dataset 525, input: 0.000718657, achieved: 0.000718657
   dataset 526, input: 0.000706194, achieved: 0.000706194
   dataset 527, input: 0.000834538, achieved: 0.000834538
   dataset 528, input: 0.000985678, achieved: 0.000985678
   dataset 529, input: 0.000981217, achieved: 0.000981217
   dataset 530, input: 0.000931393, achieved: 0.000931393
   dataset 531, input: 0.000933523, achieved: 0.000933523
   dataset 532, input: 0.000914699, achieved: 0.000914699
   dataset 533, input: 0.000895558, achieved: 0.000895558
   dataset 534, input: 0.000864559, achieved: 0.000864559
   dataset 535, input: 0.000850887, achieved: 0.000850887
   dataset 536, input: 0.000842166, achieved: 0.000842166
   dataset 537, input: 0.0008255, achieved: 0.0008255
   dataset 538, input: 0.000985649, achieved: 0.000985649
   dataset 539, input: 0.000962853, achieved: 0.000962853
   dataset 540, input: 0.000963515, achieved: 0.000963515
   dataset 541, input: 0.000948778, achieved: 0.000948778
   dataset 542, input: 0.000944605, achieved: 0.000944605
   dataset 543, input: 0.000926903, achieved: 0.000926903
   dataset 544, input: 0.000911389, achieved: 0.000911389
   dataset 545, input: 0.000893457, achieved: 0.000893457
   dataset 546, input: 0.000892248, achieved: 0.000892248
   dataset 547, input: 0.000896594, achieved: 0.000896594
   dataset 548, input: 0.000883038, achieved: 0.000883038
   dataset 549, input: 0.000850369, achieved: 0.000850369
   dataset 550, input: 0.000866113, achieved: 0.000866113
   dataset 551, input: 0.000871323, achieved: 0.000871323
   dataset 552, input: 0.000874806, achieved: 0.000874806
   dataset 553, input: 0.000835574, achieved: 0.000835574
   dataset 554, input: 0.000844814, achieved: 0.000844814
   dataset 555, input: 0.000834912, achieved: 0.000834912
   dataset 556, input: 0.00081059, achieved: 0.00081059
   dataset 557, input: 0.000825903, achieved: 0.000825903
   dataset 558, input: 0.00081747, achieved: 0.00081747
   dataset 559, input: 0.000806791, achieved: 0.000806791
   dataset 560, input: 0.000790126, achieved: 0.000790126
   dataset 561, input: 0.000794069, achieved: 0.000794069
   dataset 562, input: 0.000785204, achieved: 0.000785204
   dataset 563, input: 0.000982656, achieved: 0.000982656
   dataset 564, input: 0.000985477, achieved: 0.000985477
   dataset 565, input: 0.000982829, achieved: 0.000982829
   dataset 566, input: 0.000977216, achieved: 0.000977216
   dataset 567, input: 0.000954909, achieved: 0.000954909
   dataset 568, input: 0.000929609, achieved: 0.000929609
   dataset 569, input: 0.000961961, achieved: 0.000961961
   dataset 570, input: 0.000917318, achieved: 0.000917318
   dataset 571, input: 0.000894954, achieved: 0.000894954
   dataset 572, input: 0.000898782, achieved: 0.000898782
   dataset 573, input: 0.000891701, achieved: 0.000891701
   dataset 574, input: 0.000847087, achieved: 0.000847087
   dataset 575, input: 0.000820722, achieved: 0.000820722
   dataset 576, input: 0.00118843, achieved: 0.00118843
   dataset 577, input: 0.000583664, achieved: 0.000583664
   dataset 578, input: 0.000977936, achieved: 0.000977936
   dataset 579, input: 0.000581765, achieved: 0.000581765
   dataset 580, input: 0.000965415, achieved: 0.000965415
   dataset 581, input: 0.00114482, achieved: 0.00114482
   dataset 582, input: 0.00115279, achieved: 0.00115279
   dataset 583, input: 0.00112375, achieved: 0.00112375
   dataset 584, input: 0.00110306, achieved: 0.00110306
   dataset 585, input: 0.001103, achieved: 0.001103
   dataset 586, input: 0.00109773, achieved: 0.00109773
   dataset 587, input: 0.000557587, achieved: 0.000557587
   dataset 588, input: 0.0010625, achieved: 0.0010625
   dataset 589, input: 0.00105418, achieved: 0.00105418
   dataset 590, input: 0.00105697, achieved: 0.00105697
   dataset 591, input: 0.00103662, achieved: 0.00103662
   dataset 592, input: 0.00053528, achieved: 0.00053528
   dataset 593, input: 0.00104655, achieved: 0.00104655
   dataset 594, input: 0.00101345, achieved: 0.00101345
   dataset 595, input: 0.000978137, achieved: 0.000978137
   dataset 596, input: 0.000989391, achieved: 0.000989391
   dataset 597, input: 0.000980066, achieved: 0.000980066
   dataset 598, input: 0.000958651, achieved: 0.000958651
   dataset 599, input: 0.000949354, achieved: 0.000949354
   dataset 600, input: 0.000947713, achieved: 0.000947713
   dataset 601, input: 0.000930501, achieved: 0.000930501
   dataset 602, input: 0.000927853, achieved: 0.000927853
   dataset 603, input: 0.000799135, achieved: 0.000799135
   dataset 604, input: 0.000789665, achieved: 0.000789665
   dataset 605, input: 0.0007924, achieved: 0.0007924
   dataset 606, input: 0.000782354, achieved: 0.000782354
   dataset 607, input: 0.000770409, achieved: 0.000770409
   dataset 608, input: 0.000770265, achieved: 0.000770265
   dataset 609, input: 0.000753254, achieved: 0.000753254
   dataset 610, input: 0.000736445, achieved: 0.000736445
   dataset 611, input: 0.000745166, achieved: 0.000745166
   dataset 612, input: 0.000732531, achieved: 0.000732531
   dataset 613, input: 0.000734488, achieved: 0.000734488
   dataset 614, input: 0.000687917, achieved: 0.000687917
   dataset 615, input: 0.000692292, achieved: 0.000692292
   dataset 616, input: 0.000684607, achieved: 0.000684607
   dataset 617, input: 0.000829069, achieved: 0.000829069
   dataset 618, input: 0.000939078, achieved: 0.000939078
   dataset 619, input: 0.00093053, achieved: 0.00093053
   dataset 620, input: 0.000926011, achieved: 0.000926011
   dataset 621, input: 0.00090923, achieved: 0.00090923
   dataset 622, input: 0.000900451, achieved: 0.000900451
   dataset 623, input: 0.000894205, achieved: 0.000894205
   dataset 624, input: 0.000884448, achieved: 0.000884448
   dataset 625, input: 0.000879123, achieved: 0.000879123
   dataset 626, input: 0.000876389, achieved: 0.000876389
   dataset 627, input: 0.000872647, achieved: 0.000872647
   dataset 628, input: 0.00083379, achieved: 0.00083379
   dataset 629, input: 0.000841101, achieved: 0.000841101
   dataset 630, input: 0.00086335, achieved: 0.00086335
   dataset 631, input: 0.000903761, achieved: 0.000903761
   dataset 632, input: 0.000872647, achieved: 0.000872647
   dataset 633, input: 0.000878404, achieved: 0.000878404
   dataset 634, input: 0.000876763, achieved: 0.000876763
   dataset 635, input: 0.000868473, achieved: 0.000868473
   dataset 636, input: 0.00084467, achieved: 0.00084467
   dataset 637, input: 0.000835344, achieved: 0.000835344
   dataset 638, input: 0.000840151, achieved: 0.000840151
   dataset 639, input: 0.000828868, achieved: 0.000828868
   dataset 640, input: 0.000811857, achieved: 0.000811857
   dataset 641, input: 0.000798156, achieved: 0.000798156
   dataset 642, input: 0.000792572, achieved: 0.000792572
   dataset 643, input: 0.00105035, achieved: 0.00105035
   dataset 644, input: 0.000988672, achieved: 0.000988672
   dataset 645, input: 0.000968523, achieved: 0.000968523
   dataset 646, input: 0.000888881, achieved: 0.000888881
   dataset 647, input: 0.00093761, achieved: 0.00093761
   dataset 648, input: 0.000921204, achieved: 0.000921204
   dataset 649, input: 0.00106351, achieved: 0.00106351
   dataset 650, input: 0.0010184, achieved: 0.0010184
   dataset 651, input: 0.00103274, achieved: 0.00103274
   dataset 652, input: 0.00101029, achieved: 0.00101029
   dataset 653, input: 0.00108702, achieved: 0.00108702
   dataset 654, input: 0.000783534, achieved: 0.000783534
   dataset 655, input: 0.000794961, achieved: 0.000794961
   dataset 656, input: 0.000558709, achieved: 0.000558709
   dataset 657, input: 0.000806762, achieved: 0.000806762
   dataset 658, input: 0.000806359, achieved: 0.000806359
   dataset 659, input: 0.000549038, achieved: 0.000549038
   dataset 660, input: 0.000786902, achieved: 0.000786902
   dataset 661, input: 0.000804316, achieved: 0.000804316
   dataset 662, input: 0.000810015, achieved: 0.000810015
   dataset 663, input: 0.000784858, achieved: 0.000784858
   dataset 664, input: 0.000803308, achieved: 0.000803308
   dataset 665, input: 0.000817786, achieved: 0.000817786
   dataset 666, input: 0.000810447, achieved: 0.000810447
   dataset 667, input: 0.000796228, achieved: 0.000796228
   dataset 668, input: 0.000810216, achieved: 0.000810216
   dataset 669, input: 0.000544087, achieved: 0.000544087
   dataset 670, input: 0.00081249, achieved: 0.00081249
   dataset 671, input: 0.000796775, achieved: 0.000796775
   dataset 672, input: 0.000799768, achieved: 0.000799768
   dataset 673, input: 0.000819312, achieved: 0.000819312
   dataset 674, input: 0.000784772, achieved: 0.000784772
   dataset 675, input: 0.000563602, achieved: 0.000563602
   dataset 676, input: 0.000821327, achieved: 0.000821327
   dataset 677, input: 0.000785779, achieved: 0.000785779
   dataset 678, input: 0.000550103, achieved: 0.000550103
   dataset 679, input: 0.000578368, achieved: 0.000578368
   dataset 680, input: 0.000810792, achieved: 0.000810792
   dataset 681, input: 0.000559717, achieved: 0.000559717
   dataset 682, input: 0.000795969, achieved: 0.000795969
   dataset 683, input: 0.000786211, achieved: 0.000786211
   dataset 684, input: 0.000799192, achieved: 0.000799192
   dataset 685, input: 0.000789751, achieved: 0.000789751
   dataset 686, input: 0.000782038, achieved: 0.000782038
   dataset 687, input: 0.000817959, achieved: 0.000817959
   dataset 688, input: 0.000797552, achieved: 0.000797552
   dataset 689, input: 0.000795854, achieved: 0.000795854
   dataset 690, input: 0.000785578, achieved: 0.000785578
   dataset 691, input: 0.000807626, achieved: 0.000807626
   dataset 692, input: 0.000794184, achieved: 0.000794184
   dataset 693, input: 0.000582973, achieved: 0.000582973
   dataset 694, input: 0.000549384, achieved: 0.000549384
   dataset 695, input: 0.000805381, achieved: 0.000805381
   dataset 696, input: 0.00078696, achieved: 0.00078696
   dataset 697, input: 0.000796918, achieved: 0.000796918
   dataset 698, input: 0.000817671, achieved: 0.000817671
   dataset 699, input: 0.000756075, achieved: 0.000756075
   dataset 700, input: 0.000672547, achieved: 0.000672547
   dataset 701, input: 0.000692896, achieved: 0.000692896
   dataset 702, input: 0.000674533, achieved: 0.000674533
   dataset 703, input: 0.000660486, achieved: 0.000660486
   dataset 704, input: 0.000682966, achieved: 0.000682966
   dataset 705, input: 0.000660688, achieved: 0.000660688
   dataset 706, input: 0.000671568, achieved: 0.000671568
   dataset 707, input: 0.000639245, achieved: 0.000639245
   dataset 708, input: 0.000680318, achieved: 0.000680318
   dataset 709, input: 0.000661033, achieved: 0.000661033
   dataset 710, input: 0.000649463, achieved: 0.000649463
   dataset 711, input: 0.000676547, achieved: 0.000676547
   dataset 712, input: 0.000658673, achieved: 0.000658673
   dataset 713, input: 0.000660371, achieved: 0.000660371
   dataset 714, input: 0.000675857, achieved: 0.000675857
   dataset 715, input: 0.000676663, achieved: 0.000676663
   dataset 716, input: 0.000657522, achieved: 0.000657522
   dataset 717, input: 0.000652111, achieved: 0.000652111
   dataset 718, input: 0.000632941, achieved: 0.000632941
   dataset 719, input: 0.00064408, achieved: 0.00064408
   dataset 720, input: 0.000624824, achieved: 0.000624824
   dataset 721, input: 0.000614923, achieved: 0.000614923
   dataset 722, input: 0.000621025, achieved: 0.000621025
   dataset 723, input: 0.000620708, achieved: 0.000620708
   dataset 724, input: 0.000637518, achieved: 0.000637518
   dataset 725, input: 0.000616419, achieved: 0.000616419
   dataset 726, input: 0.000616736, achieved: 0.000616736
   dataset 727, input: 0.00061026, achieved: 0.00061026
   dataset 728, input: 0.000629314, achieved: 0.000629314
   dataset 729, input: 0.000617197, achieved: 0.000617197
   dataset 730, input: 0.000633833, achieved: 0.000633833
   dataset 731, input: 0.000612706, achieved: 0.000612706
   dataset 732, input: 0.000616707, achieved: 0.000616707
   dataset 733, input: 0.000578426, achieved: 0.000578426
   dataset 734, input: 0.000607324, achieved: 0.000607324
   dataset 735, input: 0.000609627, achieved: 0.000609627
   dataset 736, input: 0.000615355, achieved: 0.000615355
   dataset 737, input: 0.000613685, achieved: 0.000613685
   dataset 738, input: 0.000614952, achieved: 0.000614952
   dataset 739, input: 0.000984844, achieved: 0.000984844
   dataset 740, input: 0.000985448, achieved: 0.000985448
   dataset 741, input: 0.000887096, achieved: 0.000887096
   dataset 742, input: 0.000855233, achieved: 0.000855233
   dataset 743, input: 0.000844555, achieved: 0.000844555
   dataset 744, input: 0.000841043, achieved: 0.000841043
   dataset 745, input: 0.000844727, achieved: 0.000844727
   dataset 746, input: 0.000837733, achieved: 0.000837733
   dataset 747, input: 0.000836898, achieved: 0.000836898
   dataset 748, input: 0.000836927, achieved: 0.000836927
   dataset 749, input: 0.000839949, achieved: 0.000839949
   dataset 750, input: 0.0008196, achieved: 0.0008196
   dataset 751, input: 0.000815023, achieved: 0.000815023
   dataset 752, input: 0.00061026, achieved: 0.00061026
   dataset 753, input: 0.000602287, achieved: 0.000602287
   dataset 754, input: 0.000608331, achieved: 0.000608331
   dataset 755, input: 0.000524515, achieved: 0.000524515
   dataset 756, input: 0.000585823, achieved: 0.000585823
   dataset 757, input: 0.000589219, achieved: 0.000589219
   dataset 758, input: 0.000566653, achieved: 0.000566653
   dataset 759, input: 0.000586082, achieved: 0.000586082
   dataset 760, input: 0.000583434, achieved: 0.000583434
   dataset 761, input: 0.000583204, achieved: 0.000583204
   dataset 762, input: 0.000566164, achieved: 0.000566164
   dataset 763, input: 0.000581909, achieved: 0.000581909
   dataset 764, input: 0.000556292, achieved: 0.000556292
   dataset 765, input: 0.000557299, achieved: 0.000557299
   dataset 766, input: 0.000575893, achieved: 0.000575893
   dataset 767, input: 0.00057195, achieved: 0.00057195
   dataset 768, input: 0.000563602, achieved: 0.000563602
   dataset 769, input: 0.000588644, achieved: 0.000588644
   dataset 770, input: 0.000575519, achieved: 0.000575519
   dataset 771, input: 0.000551945, achieved: 0.000551945
   dataset 772, input: 0.000562595, achieved: 0.000562595
   dataset 773, input: 0.00058021, achieved: 0.00058021
   dataset 774, input: 0.000539079, achieved: 0.000539079
   dataset 775, input: 0.000576238, achieved: 0.000576238
   dataset 776, input: 0.000571086, achieved: 0.000571086
   dataset 777, input: 0.000553989, achieved: 0.000553989
   dataset 778, input: 0.000556349, achieved: 0.000556349
   dataset 779, input: 0.000565214, achieved: 0.000565214
   dataset 780, input: 0.000578195, achieved: 0.000578195
   dataset 781, input: 0.000553298, achieved: 0.000553298
   dataset 782, input: 0.000540576, achieved: 0.000540576
   dataset 783, input: 0.000950016, achieved: 0.000950016
   dataset 784, input: 0.000960896, achieved: 0.000960896
   dataset 785, input: 0.000938359, achieved: 0.000938359
   dataset 786, input: 0.000923248, achieved: 0.000923248
   dataset 787, input: 0.000889629, achieved: 0.000889629
   dataset 788, input: 0.000852585, achieved: 0.000852585
   dataset 789, input: 0.000825846, achieved: 0.000825846
   dataset 790, input: 0.000832552, achieved: 0.000832552
   dataset 791, input: 0.00084585, achieved: 0.00084585
   dataset 792, input: 0.000820204, achieved: 0.000820204
   dataset 793, input: 0.000830307, achieved: 0.000830307
   dataset 794, input: 0.000823831, achieved: 0.000823831
   dataset 795, input: 0.000831113, achieved: 0.000831113
   dataset 796, input: 0.000809497, achieved: 0.000809497
   dataset 797, input: 0.00059866, achieved: 0.00059866
   dataset 798, input: 0.000603697, achieved: 0.000603697
   dataset 799, input: 0.000610577, achieved: 0.000610577
   dataset 800, input: 0.000628364, achieved: 0.000628364
   dataset 801, input: 0.000412606, achieved: 0.000412606
   dataset 802, input: 0.0006235, achieved: 0.0006235
   dataset 803, input: 0.00061429, achieved: 0.00061429
   dataset 804, input: 0.00060246, achieved: 0.00060246
   dataset 805, input: 0.000600128, achieved: 0.000600128
   dataset 806, input: 0.000587953, achieved: 0.000587953
   dataset 807, input: 0.000595724, achieved: 0.000595724
   dataset 808, input: 0.000584413, achieved: 0.000584413
   dataset 809, input: 0.000590313, achieved: 0.000590313
   dataset 810, input: 0.000585967, achieved: 0.000585967
   dataset 811, input: 0.000584528, achieved: 0.000584528
   dataset 812, input: 0.000583233, achieved: 0.000583233
   dataset 813, input: 0.000574425, achieved: 0.000574425
   dataset 814, input: 0.00100597, achieved: 0.00100597
   dataset 815, input: 0.000929868, achieved: 0.000929868
   dataset 816, input: 0.000955168, achieved: 0.000955168
   dataset 817, input: 0.000952865, achieved: 0.000952865
   dataset 818, input: 0.0009478, achieved: 0.0009478
   dataset 819, input: 0.000965415, achieved: 0.000965415
   dataset 820, input: 0.000936517, achieved: 0.000936517
   dataset 821, input: 0.000923795, achieved: 0.000923795
   dataset 822, input: 0.000920542, achieved: 0.000920542
   dataset 823, input: 0.000888017, achieved: 0.000888017
   dataset 824, input: 0.000894263, achieved: 0.000894263
   dataset 825, input: 0.000868387, achieved: 0.000868387
   dataset 826, input: 0.000871208, achieved: 0.000871208
   dataset 827, input: 0.000862659, achieved: 0.000862659
   dataset 828, input: 0.000899386, achieved: 0.000899386
   dataset 829, input: 0.00089078, achieved: 0.00089078
   dataset 830, input: 0.000872129, achieved: 0.000872129
   dataset 831, input: 0.000858716, achieved: 0.000858716
   dataset 832, input: 0.000942388, achieved: 0.000942388
   dataset 833, input: 0.00097664, achieved: 0.00097664
   dataset 834, input: 0.00097641, achieved: 0.00097641
   dataset 835, input: 0.000943885, achieved: 0.000943885
   dataset 836, input: 0.000972927, achieved: 0.000972927
   dataset 837, input: 0.000950476, achieved: 0.000950476
   dataset 838, input: 0.000932113, achieved: 0.000932113
   dataset 839, input: 0.000936862, achieved: 0.000936862
   dataset 840, input: 0.000922039, achieved: 0.000922039
   dataset 841, input: 0.000921406, achieved: 0.000921406
   dataset 842, input: 0.000905517, achieved: 0.000905517
   dataset 843, input: 0.000902294, achieved: 0.000902294
   dataset 844, input: 0.000892507, achieved: 0.000892507
   dataset 845, input: 0.000902006, achieved: 0.000902006
   dataset 846, input: 0.000838193, achieved: 0.000838193
   dataset 847, input: 0.000579635, achieved: 0.000579635
   dataset 848, input: 0.000568639, achieved: 0.000568639
   dataset 849, input: 0.000555486, achieved: 0.000555486
   dataset 850, input: 0.000559688, achieved: 0.000559688
   dataset 851, input: 0.000556637, achieved: 0.000556637
   dataset 852, input: 0.000539022, achieved: 0.000539022
   dataset 853, input: 0.000539396, achieved: 0.000539396
   dataset 854, input: 0.000531797, achieved: 0.000531797
   dataset 855, input: 0.000561185, achieved: 0.000561185
   dataset 856, input: 0.000531624, achieved: 0.000531624
   dataset 857, input: 0.000533236, achieved: 0.000533236
   dataset 858, input: 0.000515045, achieved: 0.000515045
   dataset 859, input: 0.000518384, achieved: 0.000518384
   dataset 860, input: 0.000511016, achieved: 0.000511016
   dataset 861, input: 0.000516887, achieved: 0.000516887
   dataset 862, input: 0.000529984, achieved: 0.000529984
   dataset 863, input: 0.00092958, achieved: 0.00092958
   dataset 864, input: 0.000910324, achieved: 0.000910324
   dataset 865, input: 0.000883469, achieved: 0.000883469
   dataset 866, input: 0.000893313, achieved: 0.000893313
   dataset 867, input: 0.00088062, achieved: 0.00088062
   dataset 868, input: 0.000853679, achieved: 0.000853679
   dataset 869, input: 0.000866113, achieved: 0.000866113
   dataset 870, input: 0.000823744, achieved: 0.000823744
   dataset 871, input: 0.000830652, achieved: 0.000830652
   dataset 872, input: 0.000821183, achieved: 0.000821183
   dataset 873, input: 0.000805956, achieved: 0.000805956
   dataset 874, input: 0.000800516, achieved: 0.000800516
   dataset 875, input: 0.000789492, achieved: 0.000789492
   dataset 876, input: 0.000612505, achieved: 0.000612505
   dataset 877, input: 0.000419428, achieved: 0.000419428
   dataset 878, input: 0.000620104, achieved: 0.000620104
   dataset 879, input: 0.000616045, achieved: 0.000616045
   dataset 880, input: 0.000616419, achieved: 0.000616419
   dataset 881, input: 0.000610979, achieved: 0.000610979
   dataset 882, input: 0.000413757, achieved: 0.000413757
   dataset 883, input: 0.000605482, achieved: 0.000605482
   dataset 884, input: 0.000603064, achieved: 0.000603064
   dataset 885, input: 0.000604474, achieved: 0.000604474
   dataset 886, input: 0.000586456, achieved: 0.000586456
   dataset 887, input: 0.000584413, achieved: 0.000584413
   dataset 888, input: 0.000582801, achieved: 0.000582801
   dataset 889, input: 0.000590774, achieved: 0.000590774
   dataset 890, input: 0.000586974, achieved: 0.000586974
   dataset 891, input: 0.000585046, achieved: 0.000585046
   dataset 892, input: 0.000585823, achieved: 0.000585823
   dataset 893, input: 0.000566769, achieved: 0.000566769
   dataset 894, input: 0.000584585, achieved: 0.000584585
   dataset 895, input: 0.000597538, achieved: 0.000597538
   dataset 896, input: 0.000563027, achieved: 0.000563027
   dataset 897, input: 0.000609914, achieved: 0.000609914
   dataset 898, input: 0.000603697, achieved: 0.000603697
   dataset 899, input: 0.000606576, achieved: 0.000606576
   dataset 900, input: 0.000595465, achieved: 0.000595465
   dataset 901, input: 0.000593393, achieved: 0.000593393
   dataset 902, input: 0.000598488, achieved: 0.000598488
   dataset 903, input: 0.00058542, achieved: 0.00058542
   dataset 904, input: 0.000573475, achieved: 0.000573475
   dataset 905, input: 0.000561041, achieved: 0.000561041
   dataset 906, input: 0.000564207, achieved: 0.000564207
   dataset 907, input: 0.000552291, achieved: 0.000552291
   dataset 908, input: 0.000555111, achieved: 0.000555111
   dataset 909, input: 0.000554248, achieved: 0.000554248
   dataset 910, input: 0.000550679, achieved: 0.000550679
   dataset 911, input: 0.00055088, achieved: 0.00055088
   dataset 912, input: 0.000543915, achieved: 0.000543915
   dataset 913, input: 0.000549211, achieved: 0.000549211
   dataset 914, input: 0.00052961, achieved: 0.00052961
   dataset 915, input: 0.000530991, achieved: 0.000530991
   dataset 916, input: 0.000537899, achieved: 0.000537899
   dataset 917, input: 0.000528832, achieved: 0.000528832
   dataset 918, input: 0.000568208, achieved: 0.000568208
   dataset 919, input: 0.000579606, achieved: 0.000579606
   dataset 920, input: 0.000583779, achieved: 0.000583779
   dataset 921, input: 0.000594429, achieved: 0.000594429
   dataset 922, input: 0.000576267, achieved: 0.000576267
   dataset 923, input: 0.000576296, achieved: 0.000576296
   dataset 924, input: 0.000581765, achieved: 0.000581765
   dataset 925, input: 0.000562394, achieved: 0.000562394
   dataset 926, input: 0.000557385, achieved: 0.000557385
   dataset 927, input: 0.000563833, achieved: 0.000563833
   dataset 928, input: 0.000560782, achieved: 0.000560782
   dataset 929, input: 0.000570769, achieved: 0.000570769
   dataset 930, input: 0.000565099, achieved: 0.000565099
   dataset 931, input: 0.000560235, achieved: 0.000560235
   dataset 932, input: 0.000555486, achieved: 0.000555486
   dataset 933, input: 0.000551197, achieved: 0.000551197
   dataset 934, input: 0.000529207, achieved: 0.000529207
   dataset 935, input: 0.000545555, achieved: 0.000545555
   dataset 936, input: 0.000978943, achieved: 0.000978943
   dataset 937, input: 0.000794875, achieved: 0.000794875
   dataset 938, input: 0.000782469, achieved: 0.000782469
   dataset 939, input: 0.000778411, achieved: 0.000778411
   dataset 940, input: 0.000757543, achieved: 0.000757543
   dataset 941, input: 0.000761227, achieved: 0.000761227
   dataset 942, input: 0.000739295, achieved: 0.000739295
   dataset 943, input: 0.000746865, achieved: 0.000746865
   dataset 944, input: 0.000745051, achieved: 0.000745051
   dataset 945, input: 0.000748419, achieved: 0.000748419
   dataset 946, input: 0.000720499, achieved: 0.000720499
   dataset 947, input: 0.000718974, achieved: 0.000718974
   dataset 948, input: 0.000719434, achieved: 0.000719434
   dataset 949, input: 0.000721679, achieved: 0.000721679
   dataset 950, input: 0.000742115, achieved: 0.000742115
   dataset 951, input: 0.000708871, achieved: 0.000708871
   dataset 952, input: 0.000731437, achieved: 0.000731437
   dataset 953, input: 0.000681009, achieved: 0.000681009
   dataset 954, input: 0.000701531, achieved: 0.000701531
   dataset 955, input: 0.000674878, achieved: 0.000674878
   dataset 956, input: 0.000678188, achieved: 0.000678188
   dataset 957, input: 0.000671366, achieved: 0.000671366
   dataset 958, input: 0.000651506, achieved: 0.000651506
   dataset 959, input: 0.000704956, achieved: 0.000704956
   dataset 960, input: 0.000504597, achieved: 0.000504597
   dataset 961, input: 0.000503158, achieved: 0.000503158
   dataset 962, input: 0.0004972, achieved: 0.0004972
   dataset 963, input: 0.000493343, achieved: 0.000493343
   dataset 964, input: 0.000821039, achieved: 0.000821039
   dataset 965, input: 0.000953787, achieved: 0.000953787
   dataset 966, input: 0.000485082, achieved: 0.000485082
   dataset 967, input: 0.000487557, achieved: 0.000487557
   dataset 968, input: 0.000795364, achieved: 0.000795364
   dataset 969, input: 0.000846627, achieved: 0.000846627
   dataset 970, input: 0.000772712, achieved: 0.000772712
   dataset 971, input: 0.000839518, achieved: 0.000839518
   dataset 972, input: 0.000831314, achieved: 0.000831314
   dataset 973, input: 0.000938531, achieved: 0.000938531
   dataset 974, input: 0.000954506, achieved: 0.000954506
   dataset 975, input: 0.000517204, achieved: 0.000517204
   dataset 976, input: 0.000959255, achieved: 0.000959255
   dataset 977, input: 0.000892853, achieved: 0.000892853
   dataset 978, input: 0.000813757, achieved: 0.000813757
   dataset 979, input: 0.000743324, achieved: 0.000743324
   dataset 980, input: 0.000970884, achieved: 0.000970884
   dataset 981, input: 0.000809439, achieved: 0.000809439
   dataset 982, input: 0.00086548, achieved: 0.00086548
   dataset 983, input: 0.000802675, achieved: 0.000802675
   dataset 984, input: 0.000792716, achieved: 0.000792716
   dataset 985, input: 0.000919391, achieved: 0.000919391
   dataset 986, input: 0.000747843, achieved: 0.000747843
   dataset 987, input: 0.000890636, achieved: 0.000890636
   dataset 988, input: 0.000862947, achieved: 0.000862947
   dataset 989, input: 0.000880936, achieved: 0.000880936
   dataset 990, input: 0.000513462, achieved: 0.000513462
   dataset 991, input: 0.000500021, achieved: 0.000500021
   dataset 992, input: 0.000514786, achieved: 0.000514786
   dataset 993, input: 0.000518355, achieved: 0.000518355
   dataset 994, input: 0.000511534, achieved: 0.000511534
   dataset 995, input: 0.000871409, achieved: 0.000871409
   dataset 996, input: 0.00083969, achieved: 0.00083969
   dataset 997, input: 0.000492508, achieved: 0.000492508
   dataset 998, input: 0.000926586, achieved: 0.000926586
   dataset 999, input: 0.000917635, achieved: 0.000917635
   dataset 1000, input: 0.000902409, achieved: 0.000902409
   dataset 1001, input: 0.000836639, achieved: 0.000836639
   dataset 1002, input: 0.000817326, achieved: 0.000817326
   dataset 1003, input: 0.00080492, achieved: 0.00080492
   dataset 1004, input: 0.000801869, achieved: 0.000801869
   dataset 1005, input: 0.000793062, achieved: 0.000793062
   dataset 1006, input: 0.000802992, achieved: 0.000802992
   dataset 1007, input: 0.00078506, achieved: 0.00078506
   dataset 1008, input: 0.000782987, achieved: 0.000782987
   dataset 1009, input: 0.000772165, achieved: 0.000772165
   dataset 1010, input: 0.000744763, achieved: 0.000744763
   dataset 1011, input: 0.000744763, achieved: 0.000744763
   dataset 1012, input: 0.000755442, achieved: 0.000755442
   dataset 1013, input: 0.000747728, achieved: 0.000747728
   dataset 1014, input: 0.000742576, achieved: 0.000742576
   dataset 1015, input: 0.000747152, achieved: 0.000747152
   dataset 1016, input: 0.000741252, achieved: 0.000741252
   dataset 1017, input: 0.000737021, achieved: 0.000737021
   dataset 1018, input: 0.00072522, achieved: 0.00072522
   dataset 1019, input: 0.000532718, achieved: 0.000532718
   dataset 1020, input: 0.000537784, achieved: 0.000537784
   dataset 1021, input: 0.000518355, achieved: 0.000518355
   dataset 1022, input: 0.000522414, achieved: 0.000522414
   dataset 1023, input: 0.000534071, achieved: 0.000534071
   dataset 1024, input: 0.000529869, achieved: 0.000529869
   dataset 1025, input: 0.000519622, achieved: 0.000519622
   dataset 1026, input: 0.00052037, achieved: 0.00052037
   dataset 1027, input: 0.000528286, achieved: 0.000528286
   dataset 1028, input: 0.000516312, achieved: 0.000516312
   dataset 1029, input: 0.000499992, achieved: 0.000499992
   dataset 1030, input: 0.000511476, achieved: 0.000511476
   dataset 1031, input: 0.00050169, achieved: 0.00050169
   dataset 1032, input: 0.00048396, achieved: 0.00048396
   dataset 1033, input: 0.000497056, achieved: 0.000497056
   dataset 1034, input: 0.000965242, achieved: 0.000965242
   dataset 1035, input: 0.00089363, achieved: 0.00089363
   dataset 1036, input: 0.000792284, achieved: 0.000792284
   dataset 1037, input: 0.000812145, achieved: 0.000812145
   dataset 1038, input: 0.000779793, achieved: 0.000779793
   dataset 1039, input: 0.000767387, achieved: 0.000767387
   dataset 1040, input: 0.000780771, achieved: 0.000780771
   dataset 1041, input: 0.000748678, achieved: 0.000748678
   dataset 1042, input: 0.000746433, achieved: 0.000746433
   dataset 1043, input: 0.000758896, achieved: 0.000758896
   dataset 1044, input: 0.000737568, achieved: 0.000737568
   dataset 1045, input: 0.000737251, achieved: 0.000737251
   dataset 1046, input: 0.000734344, achieved: 0.000734344
   dataset 1047, input: 0.000829789, achieved: 0.000829789
   dataset 1048, input: 0.000944058, achieved: 0.000944058
   dataset 1049, input: 0.00074036, achieved: 0.00074036
   dataset 1050, input: 0.000855406, achieved: 0.000855406
   dataset 1051, input: 0.000902437, achieved: 0.000902437
   dataset 1052, input: 0.000537496, achieved: 0.000537496
   dataset 1053, input: 0.000527739, achieved: 0.000527739
   dataset 1054, input: 0.000966595, achieved: 0.000966595
   dataset 1055, input: 0.000525983, achieved: 0.000525983
   dataset 1056, input: 0.000525637, achieved: 0.000525637
   dataset 1057, input: 0.000511908, achieved: 0.000511908
   dataset 1058, input: 0.000517406, achieved: 0.000517406
   dataset 1059, input: 0.000518758, achieved: 0.000518758
   dataset 1060, input: 0.00050287, achieved: 0.00050287
   dataset 1061, input: 0.000940633, achieved: 0.000940633
   dataset 1062, input: 0.000509202, achieved: 0.000509202
   dataset 1063, input: 0.00048986, achieved: 0.00048986
   dataset 1064, input: 0.000501604, achieved: 0.000501604
   dataset 1065, input: 0.00052037, achieved: 0.00052037
   dataset 1066, input: 0.000504827, achieved: 0.000504827
   dataset 1067, input: 0.000495127, achieved: 0.000495127
   dataset 1068, input: 0.000504338, achieved: 0.000504338
   dataset 1069, input: 0.000496279, achieved: 0.000496279
   dataset 1070, input: 0.000496912, achieved: 0.000496912
   dataset 1071, input: 0.000503762, achieved: 0.000503762
   dataset 1072, input: 0.000503906, achieved: 0.000503906
   dataset 1073, input: 0.000514182, achieved: 0.000514182
   dataset 1074, input: 0.00050713, achieved: 0.00050713
   dataset 1075, input: 0.000964465, achieved: 0.000964465
   dataset 1076, input: 0.000485284, achieved: 0.000485284
   dataset 1077, input: 0.000972352, achieved: 0.000972352
   dataset 1078, input: 0.000958967, achieved: 0.000958967
   dataset 1079, input: 0.00091444, achieved: 0.00091444
   dataset 1080, input: 0.000463178, achieved: 0.000463178
   dataset 1081, input: 0.000540634, achieved: 0.000540634
   dataset 1082, input: 0.000524745, achieved: 0.000524745
   dataset 1083, input: 0.000541986, achieved: 0.000541986
   dataset 1084, input: 0.000535798, achieved: 0.000535798
   dataset 1085, input: 0.000507619, achieved: 0.000507619
   dataset 1086, input: 0.000522068, achieved: 0.000522068
   dataset 1087, input: 0.000512368, achieved: 0.000512368
   dataset 1088, input: 0.000476447, achieved: 0.000476447
   dataset 1089, input: 0.000523105, achieved: 0.000523105
   dataset 1090, input: 0.000883786, achieved: 0.000883786
   dataset 1091, input: 0.00091775, achieved: 0.00091775
   dataset 1092, input: 0.000892248, achieved: 0.000892248
   dataset 1093, input: 0.000882491, achieved: 0.000882491
   dataset 1094, input: 0.000867696, achieved: 0.000867696
   dataset 1095, input: 0.000831804, achieved: 0.000831804
   dataset 1096, input: 0.000831343, achieved: 0.000831343
   dataset 1097, input: 0.000664315, achieved: 0.000664315
   dataset 1098, input: 0.00586928, achieved: 0.00586928
   dataset 1099, input: 0.00614952, achieved: 0.00614952
   dataset 1100, input: 0.00595218, achieved: 0.00595218
   dataset 1101, input: 0.00596332, achieved: 0.00596332
   dataset 1102, input: 0.00481165, achieved: 0.00481165
   dataset 1103, input: 0.00538103, achieved: 0.00538103
   dataset 1104, input: 0.00548126, achieved: 0.00548126
   dataset 1105, input: 0.00176982, achieved: 0.00176982
   dataset 1106, input: 0.00505262, achieved: 0.00505262
[2025-03-12 09:21:24][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 34742575 samples
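Each `[BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across N samples` line above is logged once per corpus: the per-file GPT datasets belonging to that corpus are concatenated, and because `--shuffle-sample-in-corpus` is enabled, the sample order is permuted across the whole corpus rather than only within each file. A minimal sketch of that behavior (hypothetical class name; the real implementation is `BuildConcatDataset` in `megatron/data/gpt_dataset.py`):

```python
import numpy as np
from torch.utils.data import Dataset

class ConcatShuffleDataset(Dataset):
    """Sketch of a BuildConcatDataset-style wrapper: concatenate several
    per-file datasets and shuffle sample order across the whole corpus."""

    def __init__(self, datasets, shuffle_sample_in_corpus=True, seed=1234):
        self.datasets = datasets
        sizes = np.array([len(d) for d in datasets])
        # Map a global sample id -> (dataset id, local sample id).
        self.dataset_index = np.repeat(np.arange(len(datasets)), sizes)
        self.sample_index = np.concatenate([np.arange(n) for n in sizes])
        if shuffle_sample_in_corpus:
            # One permutation over all samples in the corpus, not per file.
            perm = np.random.RandomState(seed).permutation(sizes.sum())
            self.dataset_index = self.dataset_index[perm]
            self.sample_index = self.sample_index[perm]

    def __len__(self):
        return len(self.dataset_index)

    def __getitem__(self, idx):
        return self.datasets[self.dataset_index[idx]][self.sample_index[idx]]
```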
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.00201113, achieved: 0.00201113
   dataset 1, input: 0.00203689, achieved: 0.00203689
   dataset 2, input: 0.00202407, achieved: 0.00202407
   dataset 3, input: 0.00200028, achieved: 0.00200028
   dataset 4, input: 0.00203932, achieved: 0.00203932
   dataset 5, input: 0.00203025, achieved: 0.00203025
   dataset 6, input: 0.00201148, achieved: 0.00201148
   dataset 7, input: 0.00206305, achieved: 0.00206305
   dataset 8, input: 0.00200524, achieved: 0.00200524
   dataset 9, input: 0.00212208, achieved: 0.00212208
   dataset 10, input: 0.00200114, achieved: 0.00200114
   dataset 11, input: 0.00201293, achieved: 0.00201293
   dataset 12, input: 0.00205122, achieved: 0.00205122
   dataset 13, input: 0.00199444, achieved: 0.00199444
   dataset 14, input: 0.00200744, achieved: 0.00200744
   dataset 15, input: 0.00205855, achieved: 0.00205855
   dataset 16, input: 0.00202632, achieved: 0.00202632
   dataset 17, input: 0.00204429, achieved: 0.00204429
   dataset 18, input: 0.00205555, achieved: 0.00205555
   dataset 19, input: 0.002009, achieved: 0.002009
   dataset 20, input: 0.00203938, achieved: 0.00203938
   dataset 21, input: 0.00203245, achieved: 0.00203245
   dataset 22, input: 0.00204174, achieved: 0.00204174
   dataset 23, input: 0.00198549, achieved: 0.00198549
   dataset 24, input: 0.00205, achieved: 0.00205
   dataset 25, input: 0.00205231, achieved: 0.00205231
   dataset 26, input: 0.00201841, achieved: 0.00201841
   dataset 27, input: 0.00201408, achieved: 0.00201408
   dataset 28, input: 0.00203528, achieved: 0.00203528
   dataset 29, input: 0.00196944, achieved: 0.00196944
   dataset 30, input: 0.00201015, achieved: 0.00201015
   dataset 31, input: 0.00197077, achieved: 0.00197077
   dataset 32, input: 0.0020228, achieved: 0.0020228
   dataset 33, input: 0.0020172, achieved: 0.0020172
   dataset 34, input: 0.00199785, achieved: 0.00199785
   dataset 35, input: 0.00199098, achieved: 0.00199098
   dataset 36, input: 0.00203597, achieved: 0.00203597
   dataset 37, input: 0.0019919, achieved: 0.0019919
   dataset 38, input: 0.00203204, achieved: 0.00203204
   dataset 39, input: 0.0019915, achieved: 0.0019915
   dataset 40, input: 0.00204105, achieved: 0.00204105
   dataset 41, input: 0.00202453, achieved: 0.00202453
   dataset 42, input: 0.00202748, achieved: 0.00202748
   dataset 43, input: 0.00203718, achieved: 0.00203718
   dataset 44, input: 0.0019945, achieved: 0.0019945
   dataset 45, input: 0.00200819, achieved: 0.00200819
   dataset 46, input: 0.00203655, achieved: 0.00203655
   dataset 47, input: 0.00201587, achieved: 0.00201587
   dataset 48, input: 0.00200611, achieved: 0.00200611
   dataset 49, input: 0.00201154, achieved: 0.00201154
   dataset 50, input: 0.00197637, achieved: 0.00197637
   dataset 51, input: 0.00199369, achieved: 0.00199369
   dataset 52, input: 0.00197683, achieved: 0.00197683
   dataset 53, input: 0.00201668, achieved: 0.00201668
   dataset 54, input: 0.00201859, achieved: 0.00201859
   dataset 55, input: 0.00200126, achieved: 0.00200126
   dataset 56, input: 0.00202829, achieved: 0.00202829
   dataset 57, input: 0.0020131, achieved: 0.0020131
   dataset 58, input: 0.00197354, achieved: 0.00197354
   dataset 59, input: 0.00199115, achieved: 0.00199115
   dataset 60, input: 0.00197452, achieved: 0.00197452
   dataset 61, input: 0.00202482, achieved: 0.00202482
   dataset 62, input: 0.00199185, achieved: 0.00199185
   dataset 63, input: 0.00201431, achieved: 0.00201431
   dataset 64, input: 0.00199537, achieved: 0.00199537
   dataset 65, input: 0.00199462, achieved: 0.00199462
   dataset 66, input: 0.0020336, achieved: 0.0020336
   dataset 67, input: 0.00202321, achieved: 0.00202321
   dataset 68, input: 0.00207333, achieved: 0.00207333
   dataset 69, input: 0.0020064, achieved: 0.0020064
   dataset 70, input: 0.00205116, achieved: 0.00205116
   dataset 71, input: 0.00199219, achieved: 0.00199219
   dataset 72, input: 0.00204671, achieved: 0.00204671
   dataset 73, input: 0.00198503, achieved: 0.00198503
   dataset 74, input: 0.0020161, achieved: 0.0020161
   dataset 75, input: 0.00198665, achieved: 0.00198665
   dataset 76, input: 0.00203672, achieved: 0.00203672
   dataset 77, input: 0.00198561, achieved: 0.00198561
   dataset 78, input: 0.00200259, achieved: 0.00200259
   dataset 79, input: 0.00203129, achieved: 0.00203129
   dataset 80, input: 0.00202147, achieved: 0.00202147
   dataset 81, input: 0.0019662, achieved: 0.0019662
   dataset 82, input: 0.00203938, achieved: 0.00203938
   dataset 83, input: 0.00199121, achieved: 0.00199121
   dataset 84, input: 0.00203435, achieved: 0.00203435
   dataset 85, input: 0.00199756, achieved: 0.00199756
   dataset 86, input: 0.0019945, achieved: 0.0019945
   dataset 87, input: 0.00203308, achieved: 0.00203308
   dataset 88, input: 0.00195823, achieved: 0.00195823
   dataset 89, input: 0.00200368, achieved: 0.00200368
   dataset 90, input: 0.00200646, achieved: 0.00200646
   dataset 91, input: 0.00199225, achieved: 0.00199225
   dataset 92, input: 0.00200646, achieved: 0.00200646
   dataset 93, input: 0.00199571, achieved: 0.00199571
   dataset 94, input: 0.00197874, achieved: 0.00197874
   dataset 95, input: 0.00200091, achieved: 0.00200091
   dataset 96, input: 0.00197648, achieved: 0.00197648
   dataset 97, input: 0.0020373, achieved: 0.0020373
   dataset 98, input: 0.00200963, achieved: 0.00200963
   dataset 99, input: 0.00200034, achieved: 0.00200034
   dataset 100, input: 0.0020142, achieved: 0.0020142
   dataset 101, input: 0.00206571, achieved: 0.00206571
   dataset 102, input: 0.00199722, achieved: 0.00199722
   dataset 103, input: 0.00203851, achieved: 0.00203851
   dataset 104, input: 0.00200207, achieved: 0.00200207
   dataset 105, input: 0.00201067, achieved: 0.00201067
   dataset 106, input: 0.00202679, achieved: 0.00202679
   dataset 107, input: 0.00199883, achieved: 0.00199883
   dataset 108, input: 0.00205433, achieved: 0.00205433
   dataset 109, input: 0.00197735, achieved: 0.00197735
   dataset 110, input: 0.00203966, achieved: 0.00203966
   dataset 111, input: 0.00201697, achieved: 0.00201697
   dataset 112, input: 0.0020068, achieved: 0.0020068
   dataset 113, input: 0.00204827, achieved: 0.00204827
   dataset 114, input: 0.00202569, achieved: 0.00202569
   dataset 115, input: 0.00202765, achieved: 0.00202765
   dataset 116, input: 0.00199577, achieved: 0.00199577
   dataset 117, input: 0.00204042, achieved: 0.00204042
   dataset 118, input: 0.0020075, achieved: 0.0020075
   dataset 119, input: 0.00206063, achieved: 0.00206063
   dataset 120, input: 0.00200507, achieved: 0.00200507
   dataset 121, input: 0.00204365, achieved: 0.00204365
   dataset 122, input: 0.00199745, achieved: 0.00199745
   dataset 123, input: 0.00204937, achieved: 0.00204937
   dataset 124, input: 0.00203834, achieved: 0.00203834
   dataset 125, input: 0.00203943, achieved: 0.00203943
   dataset 126, input: 0.00201968, achieved: 0.00201968
   dataset 127, input: 0.00203094, achieved: 0.00203094
   dataset 128, input: 0.00198266, achieved: 0.00198266
   dataset 129, input: 0.00201217, achieved: 0.00201217
   dataset 130, input: 0.00200085, achieved: 0.00200085
   dataset 131, input: 0.0019628, achieved: 0.0019628
   dataset 132, input: 0.00201616, achieved: 0.00201616
   dataset 133, input: 0.0020094, achieved: 0.0020094
   dataset 134, input: 0.00202944, achieved: 0.00202944
   dataset 135, input: 0.00205041, achieved: 0.00205041
   dataset 136, input: 0.00198584, achieved: 0.00198584
   dataset 137, input: 0.00199011, achieved: 0.00199011
   dataset 138, input: 0.00199976, achieved: 0.00199976
   dataset 139, input: 0.00198555, achieved: 0.00198555
   dataset 140, input: 0.00202788, achieved: 0.00202788
   dataset 141, input: 0.0019554, achieved: 0.0019554
   dataset 142, input: 0.00203123, achieved: 0.00203123
   dataset 143, input: 0.00197088, achieved: 0.00197088
   dataset 144, input: 0.00200415, achieved: 0.00200415
   dataset 145, input: 0.0020202, achieved: 0.0020202
   dataset 146, input: 0.00205041, achieved: 0.00205041
   dataset 147, input: 0.00200328, achieved: 0.00200328
   dataset 148, input: 0.00201246, achieved: 0.00201246
   dataset 149, input: 0.00200449, achieved: 0.00200449
   dataset 150, input: 0.00200507, achieved: 0.00200507
   dataset 151, input: 0.0019811, achieved: 0.0019811
   dataset 152, input: 0.0019844, achieved: 0.0019844
   dataset 153, input: 0.00201766, achieved: 0.00201766
   dataset 154, input: 0.00199901, achieved: 0.00199901
   dataset 155, input: 0.00199982, achieved: 0.00199982
   dataset 156, input: 0.00202107, achieved: 0.00202107
   dataset 157, input: 0.00203412, achieved: 0.00203412
   dataset 158, input: 0.00201056, achieved: 0.00201056
   dataset 159, input: 0.00196759, achieved: 0.00196759
   dataset 160, input: 0.0019766, achieved: 0.0019766
   dataset 161, input: 0.00198405, achieved: 0.00198405
   dataset 162, input: 0.00199075, achieved: 0.00199075
   dataset 163, input: 0.0020068, achieved: 0.0020068
   dataset 164, input: 0.00203227, achieved: 0.00203227
   dataset 165, input: 0.00200363, achieved: 0.00200363
   dataset 166, input: 0.00201749, achieved: 0.00201749
   dataset 167, input: 0.00197556, achieved: 0.00197556
   dataset 168, input: 0.00200293, achieved: 0.00200293
   dataset 169, input: 0.00201974, achieved: 0.00201974
   dataset 170, input: 0.00197591, achieved: 0.00197591
   dataset 171, input: 0.00203597, achieved: 0.00203597
   dataset 172, input: 0.00197429, achieved: 0.00197429
   dataset 173, input: 0.00201685, achieved: 0.00201685
   dataset 174, input: 0.00197954, achieved: 0.00197954
   dataset 175, input: 0.0019833, achieved: 0.0019833
   dataset 176, input: 0.00200981, achieved: 0.00200981
   dataset 177, input: 0.00196251, achieved: 0.00196251
   dataset 178, input: 0.00204463, achieved: 0.00204463
   dataset 179, input: 0.00201333, achieved: 0.00201333
   dataset 180, input: 0.00199941, achieved: 0.00199941
   dataset 181, input: 0.00201079, achieved: 0.00201079
   dataset 182, input: 0.00198191, achieved: 0.00198191
   dataset 183, input: 0.00200513, achieved: 0.00200513
   dataset 184, input: 0.0019781, achieved: 0.0019781
   dataset 185, input: 0.00198717, achieved: 0.00198717
   dataset 186, input: 0.00198821, achieved: 0.00198821
   dataset 187, input: 0.00201968, achieved: 0.00201968
   dataset 188, input: 0.00201974, achieved: 0.00201974
   dataset 189, input: 0.00197839, achieved: 0.00197839
   dataset 190, input: 0.00202713, achieved: 0.00202713
   dataset 191, input: 0.00198607, achieved: 0.00198607
   dataset 192, input: 0.00200322, achieved: 0.00200322
   dataset 193, input: 0.00195696, achieved: 0.00195696
   dataset 194, input: 0.00201391, achieved: 0.00201391
   dataset 195, input: 0.00197175, achieved: 0.00197175
   dataset 196, input: 0.00197989, achieved: 0.00197989
   dataset 197, input: 0.0019833, achieved: 0.0019833
   dataset 198, input: 0.00193865, achieved: 0.00193865
   dataset 199, input: 0.00200074, achieved: 0.00200074
   dataset 200, input: 0.00196372, achieved: 0.00196372
   dataset 201, input: 0.00199057, achieved: 0.00199057
   dataset 202, input: 0.00197423, achieved: 0.00197423
   dataset 203, input: 0.00198087, achieved: 0.00198087
   dataset 204, input: 0.00196066, achieved: 0.00196066
   dataset 205, input: 0.00200831, achieved: 0.00200831
   dataset 206, input: 0.00197001, achieved: 0.00197001
   dataset 207, input: 0.00203995, achieved: 0.00203995
   dataset 208, input: 0.00198584, achieved: 0.00198584
   dataset 209, input: 0.00203608, achieved: 0.00203608
   dataset 210, input: 0.00202748, achieved: 0.00202748
   dataset 211, input: 0.00199514, achieved: 0.00199514
   dataset 212, input: 0.00201206, achieved: 0.00201206
   dataset 213, input: 0.00202257, achieved: 0.00202257
   dataset 214, input: 0.00199109, achieved: 0.00199109
   dataset 215, input: 0.00203568, achieved: 0.00203568
   dataset 216, input: 0.00197059, achieved: 0.00197059
   dataset 217, input: 0.00199774, achieved: 0.00199774
   dataset 218, input: 0.00200068, achieved: 0.00200068
   dataset 219, input: 0.00199421, achieved: 0.00199421
   dataset 220, input: 0.00201737, achieved: 0.00201737
   dataset 221, input: 0.0019796, achieved: 0.0019796
   dataset 222, input: 0.00196014, achieved: 0.00196014
   dataset 223, input: 0.00201847, achieved: 0.00201847
   dataset 224, input: 0.00200074, achieved: 0.00200074
   dataset 225, input: 0.00199779, achieved: 0.00199779
   dataset 226, input: 0.00194928, achieved: 0.00194928
   dataset 227, input: 0.00203961, achieved: 0.00203961
   dataset 228, input: 0.0019535, achieved: 0.0019535
   dataset 229, input: 0.00201396, achieved: 0.00201396
   dataset 230, input: 0.00197573, achieved: 0.00197573
   dataset 231, input: 0.00198012, achieved: 0.00198012
   dataset 232, input: 0.00202962, achieved: 0.00202962
   dataset 233, input: 0.00198278, achieved: 0.00198278
   dataset 234, input: 0.00202783, achieved: 0.00202783
   dataset 235, input: 0.00201818, achieved: 0.00201818
   dataset 236, input: 0.00198896, achieved: 0.00198896
   dataset 237, input: 0.0020254, achieved: 0.0020254
   dataset 238, input: 0.00201674, achieved: 0.00201674
   dataset 239, input: 0.00198353, achieved: 0.00198353
   dataset 240, input: 0.00204486, achieved: 0.00204486
   dataset 241, input: 0.0019569, achieved: 0.0019569
   dataset 242, input: 0.00203591, achieved: 0.00203591
   dataset 243, input: 0.00199849, achieved: 0.00199849
   dataset 244, input: 0.00200536, achieved: 0.00200536
   dataset 245, input: 0.00198659, achieved: 0.00198659
   dataset 246, input: 0.00198815, achieved: 0.00198815
   dataset 247, input: 0.00198538, achieved: 0.00198538
   dataset 248, input: 0.00200877, achieved: 0.00200877
   dataset 249, input: 0.00199225, achieved: 0.00199225
   dataset 250, input: 0.0020094, achieved: 0.0020094
   dataset 251, input: 0.00194813, achieved: 0.00194813
   dataset 252, input: 0.00199185, achieved: 0.00199185
   dataset 253, input: 0.0019893, achieved: 0.0019893
   dataset 254, input: 0.00194934, achieved: 0.00194934
   dataset 255, input: 0.0019844, achieved: 0.0019844
   dataset 256, input: 0.00193167, achieved: 0.00193167
   dataset 257, input: 0.00203037, achieved: 0.00203037
   dataset 258, input: 0.00196441, achieved: 0.00196441
   dataset 259, input: 0.00196129, achieved: 0.00196129
   dataset 260, input: 0.0019509, achieved: 0.0019509
   dataset 261, input: 0.0019952, achieved: 0.0019952
   dataset 262, input: 0.00194634, achieved: 0.00194634
   dataset 263, input: 0.00200946, achieved: 0.00200946
   dataset 264, input: 0.00198647, achieved: 0.00198647
   dataset 265, input: 0.00197596, achieved: 0.00197596
   dataset 266, input: 0.00200409, achieved: 0.00200409
   dataset 267, input: 0.00196799, achieved: 0.00196799
   dataset 268, input: 0.00201962, achieved: 0.00201962
   dataset 269, input: 0.00197706, achieved: 0.00197706
   dataset 270, input: 0.00196759, achieved: 0.00196759
   dataset 271, input: 0.00200137, achieved: 0.00200137
   dataset 272, input: 0.00199098, achieved: 0.00199098
   dataset 273, input: 0.00199364, achieved: 0.00199364
   dataset 274, input: 0.00199716, achieved: 0.00199716
   dataset 275, input: 0.00199779, achieved: 0.00199779
   dataset 276, input: 0.00199866, achieved: 0.00199866
   dataset 277, input: 0.00198763, achieved: 0.00198763
   dataset 278, input: 0.00200161, achieved: 0.00200161
   dataset 279, input: 0.00198122, achieved: 0.00198122
   dataset 280, input: 0.00200744, achieved: 0.00200744
   dataset 281, input: 0.00200767, achieved: 0.00200767
   dataset 282, input: 0.00200034, achieved: 0.00200034
   dataset 283, input: 0.00200796, achieved: 0.00200796
   dataset 284, input: 0.00198041, achieved: 0.00198041
   dataset 285, input: 0.00199652, achieved: 0.00199652
   dataset 286, input: 0.00198474, achieved: 0.00198474
   dataset 287, input: 0.00198243, achieved: 0.00198243
   dataset 288, input: 0.00197556, achieved: 0.00197556
   dataset 289, input: 0.00198388, achieved: 0.00198388
   dataset 290, input: 0.00202367, achieved: 0.00202367
   dataset 291, input: 0.00195939, achieved: 0.00195939
   dataset 292, input: 0.00201391, achieved: 0.00201391
   dataset 293, input: 0.00198145, achieved: 0.00198145
   dataset 294, input: 0.0019822, achieved: 0.0019822
   dataset 295, input: 0.0019662, achieved: 0.0019662
   dataset 296, input: 0.00198549, achieved: 0.00198549
   dataset 297, input: 0.00201581, achieved: 0.00201581
   dataset 298, input: 0.00199248, achieved: 0.00199248
   dataset 299, input: 0.00201974, achieved: 0.00201974
   dataset 300, input: 0.00198624, achieved: 0.00198624
   dataset 301, input: 0.00197342, achieved: 0.00197342
   dataset 302, input: 0.00205457, achieved: 0.00205457
   dataset 303, input: 0.00199681, achieved: 0.00199681
   dataset 304, input: 0.00200022, achieved: 0.00200022
   dataset 305, input: 0.00198209, achieved: 0.00198209
   dataset 306, input: 0.00199427, achieved: 0.00199427
   dataset 307, input: 0.00199802, achieved: 0.00199802
   dataset 308, input: 0.00198826, achieved: 0.00198826
   dataset 309, input: 0.00205139, achieved: 0.00205139
   dataset 310, input: 0.00198977, achieved: 0.00198977
   dataset 311, input: 0.00199751, achieved: 0.00199751
   dataset 312, input: 0.00200085, achieved: 0.00200085
   dataset 313, input: 0.00197186, achieved: 0.00197186
   dataset 314, input: 0.00198572, achieved: 0.00198572
   dataset 315, input: 0.00199208, achieved: 0.00199208
   dataset 316, input: 0.00197296, achieved: 0.00197296
   dataset 317, input: 0.00202315, achieved: 0.00202315
   dataset 318, input: 0.00197723, achieved: 0.00197723
   dataset 319, input: 0.00199537, achieved: 0.00199537
   dataset 320, input: 0.00197914, achieved: 0.00197914
   dataset 321, input: 0.00199964, achieved: 0.00199964
   dataset 322, input: 0.00199808, achieved: 0.00199808
   dataset 323, input: 0.00198671, achieved: 0.00198671
   dataset 324, input: 0.00196921, achieved: 0.00196921
   dataset 325, input: 0.00198214, achieved: 0.00198214
   dataset 326, input: 0.00198041, achieved: 0.00198041
   dataset 327, input: 0.00197723, achieved: 0.00197723
   dataset 328, input: 0.00199837, achieved: 0.00199837
   dataset 329, input: 0.0019785, achieved: 0.0019785
   dataset 330, input: 0.00201437, achieved: 0.00201437
   dataset 331, input: 0.00197591, achieved: 0.00197591
   dataset 332, input: 0.00198676, achieved: 0.00198676
   dataset 333, input: 0.00200253, achieved: 0.00200253
   dataset 334, input: 0.00198578, achieved: 0.00198578
   dataset 335, input: 0.00203129, achieved: 0.00203129
   dataset 336, input: 0.00198226, achieved: 0.00198226
   dataset 337, input: 0.00202887, achieved: 0.00202887
   dataset 338, input: 0.00199369, achieved: 0.00199369
   dataset 339, input: 0.00204613, achieved: 0.00204613
   dataset 340, input: 0.00198549, achieved: 0.00198549
   dataset 341, input: 0.00202938, achieved: 0.00202938
   dataset 342, input: 0.00202245, achieved: 0.00202245
   dataset 343, input: 0.00204787, achieved: 0.00204787
   dataset 344, input: 0.00201991, achieved: 0.00201991
   dataset 345, input: 0.00201564, achieved: 0.00201564
   dataset 346, input: 0.00198902, achieved: 0.00198902
   dataset 347, input: 0.00203753, achieved: 0.00203753
   dataset 348, input: 0.00201962, achieved: 0.00201962
   dataset 349, input: 0.00205318, achieved: 0.00205318
   dataset 350, input: 0.00200715, achieved: 0.00200715
   dataset 351, input: 0.00203764, achieved: 0.00203764
   dataset 352, input: 0.00202136, achieved: 0.00202136
   dataset 353, input: 0.00201916, achieved: 0.00201916
   dataset 354, input: 0.00202361, achieved: 0.00202361
   dataset 355, input: 0.00199335, achieved: 0.00199335
   dataset 356, input: 0.00198665, achieved: 0.00198665
   dataset 357, input: 0.00201882, achieved: 0.00201882
   dataset 358, input: 0.00201062, achieved: 0.00201062
   dataset 359, input: 0.00193952, achieved: 0.00193952
   dataset 360, input: 0.00201229, achieved: 0.00201229
   dataset 361, input: 0.00197718, achieved: 0.00197718
   dataset 362, input: 0.0019915, achieved: 0.0019915
   dataset 363, input: 0.00195477, achieved: 0.00195477
   dataset 364, input: 0.00196181, achieved: 0.00196181
   dataset 365, input: 0.00197723, achieved: 0.00197723
   dataset 366, input: 0.00195338, achieved: 0.00195338
   dataset 367, input: 0.0019926, achieved: 0.0019926
   dataset 368, input: 0.00202141, achieved: 0.00202141
   dataset 369, input: 0.00201304, achieved: 0.00201304
   dataset 370, input: 0.00197198, achieved: 0.00197198
   dataset 371, input: 0.00196274, achieved: 0.00196274
   dataset 372, input: 0.00199953, achieved: 0.00199953
   dataset 373, input: 0.00197394, achieved: 0.00197394
   dataset 374, input: 0.00197394, achieved: 0.00197394
   dataset 375, input: 0.00199317, achieved: 0.00199317
   dataset 376, input: 0.00197105, achieved: 0.00197105
   dataset 377, input: 0.00195361, achieved: 0.00195361
   dataset 378, input: 0.00197163, achieved: 0.00197163
   dataset 379, input: 0.00199369, achieved: 0.00199369
   dataset 380, input: 0.00195552, achieved: 0.00195552
   dataset 381, input: 0.00197492, achieved: 0.00197492
   dataset 382, input: 0.0019647, achieved: 0.0019647
   dataset 383, input: 0.00197625, achieved: 0.00197625
   dataset 384, input: 0.00197163, achieved: 0.00197163
   dataset 385, input: 0.00198162, achieved: 0.00198162
   dataset 386, input: 0.0019781, achieved: 0.0019781
   dataset 387, input: 0.00197793, achieved: 0.00197793
   dataset 388, input: 0.00196932, achieved: 0.00196932
   dataset 389, input: 0.00196274, achieved: 0.00196274
   dataset 390, input: 0.00207992, achieved: 0.00207992
   dataset 391, input: 0.00196643, achieved: 0.00196643
   dataset 392, input: 0.0019997, achieved: 0.0019997
   dataset 393, input: 0.00196603, achieved: 0.00196603
   dataset 394, input: 0.001958, achieved: 0.001958
   dataset 395, input: 0.0020075, achieved: 0.0020075
   dataset 396, input: 0.00195973, achieved: 0.00195973
   dataset 397, input: 0.00199791, achieved: 0.00199791
   dataset 398, input: 0.00195096, achieved: 0.00195096
   dataset 399, input: 0.00199473, achieved: 0.00199473
   dataset 400, input: 0.0019885, achieved: 0.0019885
   dataset 401, input: 0.0020001, achieved: 0.0020001
   dataset 402, input: 0.00194738, achieved: 0.00194738
   dataset 403, input: 0.00197204, achieved: 0.00197204
   dataset 404, input: 0.00194801, achieved: 0.00194801
   dataset 405, input: 0.00197879, achieved: 0.00197879
   dataset 406, input: 0.00194882, achieved: 0.00194882
   dataset 407, input: 0.00197683, achieved: 0.00197683
   dataset 408, input: 0.00195962, achieved: 0.00195962
   dataset 409, input: 0.00196378, achieved: 0.00196378
   dataset 410, input: 0.0019844, achieved: 0.0019844
   dataset 411, input: 0.00196262, achieved: 0.00196262
   dataset 412, input: 0.00205226, achieved: 0.00205226
   dataset 413, input: 0.00197666, achieved: 0.00197666
   dataset 414, input: 0.00194824, achieved: 0.00194824
   dataset 415, input: 0.00201339, achieved: 0.00201339
   dataset 416, input: 0.00198399, achieved: 0.00198399
   dataset 417, input: 0.0019755, achieved: 0.0019755
   dataset 418, input: 0.00198855, achieved: 0.00198855
   dataset 419, input: 0.00200161, achieved: 0.00200161
   dataset 420, input: 0.00198087, achieved: 0.00198087
   dataset 421, input: 0.00196932, achieved: 0.00196932
   dataset 422, input: 0.00200461, achieved: 0.00200461
   dataset 423, input: 0.00201812, achieved: 0.00201812
   dataset 424, input: 0.00198035, achieved: 0.00198035
   dataset 425, input: 0.00200005, achieved: 0.00200005
   dataset 426, input: 0.00199843, achieved: 0.00199843
   dataset 427, input: 0.00197244, achieved: 0.00197244
   dataset 428, input: 0.00200992, achieved: 0.00200992
   dataset 429, input: 0.00196072, achieved: 0.00196072
   dataset 430, input: 0.00199288, achieved: 0.00199288
   dataset 431, input: 0.00196903, achieved: 0.00196903
   dataset 432, input: 0.00200045, achieved: 0.00200045
   dataset 433, input: 0.00196049, achieved: 0.00196049
   dataset 434, input: 0.00202401, achieved: 0.00202401
   dataset 435, input: 0.00198936, achieved: 0.00198936
   dataset 436, input: 0.00199779, achieved: 0.00199779
   dataset 437, input: 0.00196551, achieved: 0.00196551
   dataset 438, input: 0.00197937, achieved: 0.00197937
   dataset 439, input: 0.00198925, achieved: 0.00198925
   dataset 440, input: 0.00197504, achieved: 0.00197504
   dataset 441, input: 0.00197567, achieved: 0.00197567
   dataset 442, input: 0.0019837, achieved: 0.0019837
   dataset 443, input: 0.00197989, achieved: 0.00197989
   dataset 444, input: 0.0020008, achieved: 0.0020008
   dataset 445, input: 0.00202141, achieved: 0.00202141
   dataset 446, input: 0.00206866, achieved: 0.00206866
   dataset 447, input: 0.00201974, achieved: 0.00201974
   dataset 448, input: 0.00201876, achieved: 0.00201876
   dataset 449, input: 0.00200207, achieved: 0.00200207
   dataset 450, input: 0.00200432, achieved: 0.00200432
   dataset 451, input: 0.00200906, achieved: 0.00200906
   dataset 452, input: 0.0019952, achieved: 0.0019952
   dataset 453, input: 0.00202661, achieved: 0.00202661
   dataset 454, input: 0.0019926, achieved: 0.0019926
   dataset 455, input: 0.00197648, achieved: 0.00197648
   dataset 456, input: 0.00195361, achieved: 0.00195361
   dataset 457, input: 0.00198925, achieved: 0.00198925
   dataset 458, input: 0.00200293, achieved: 0.00200293
   dataset 459, input: 0.00197839, achieved: 0.00197839
   dataset 460, input: 0.00199554, achieved: 0.00199554
   dataset 461, input: 0.00201212, achieved: 0.00201212
   dataset 462, input: 0.00198803, achieved: 0.00198803
   dataset 463, input: 0.00200802, achieved: 0.00200802
   dataset 464, input: 0.00199762, achieved: 0.00199762
   dataset 465, input: 0.00200952, achieved: 0.00200952
   dataset 466, input: 0.00198133, achieved: 0.00198133
   dataset 467, input: 0.00200825, achieved: 0.00200825
   dataset 468, input: 0.00199999, achieved: 0.00199999
   dataset 469, input: 0.002009, achieved: 0.002009
   dataset 470, input: 0.0019822, achieved: 0.0019822
   dataset 471, input: 0.00201402, achieved: 0.00201402
   dataset 472, input: 0.00198318, achieved: 0.00198318
   dataset 473, input: 0.001996, achieved: 0.001996
   dataset 474, input: 0.00199924, achieved: 0.00199924
   dataset 475, input: 0.00197862, achieved: 0.00197862
   dataset 476, input: 0.00203123, achieved: 0.00203123
   dataset 477, input: 0.00195044, achieved: 0.00195044
   dataset 478, input: 0.00197521, achieved: 0.00197521
   dataset 479, input: 0.00201483, achieved: 0.00201483
   dataset 480, input: 0.00200155, achieved: 0.00200155
   dataset 481, input: 0.00198411, achieved: 0.00198411
   dataset 482, input: 0.00198676, achieved: 0.00198676
   dataset 483, input: 0.00199473, achieved: 0.00199473
   dataset 484, input: 0.00199133, achieved: 0.00199133
   dataset 485, input: 0.00205589, achieved: 0.00205589
   dataset 486, input: 0.00198399, achieved: 0.00198399
   dataset 487, input: 0.00205098, achieved: 0.00205098
   dataset 488, input: 0.00200397, achieved: 0.00200397
   dataset 489, input: 0.0019766, achieved: 0.0019766
   dataset 490, input: 0.00205122, achieved: 0.00205122
   dataset 491, input: 0.00198619, achieved: 0.00198619
   dataset 492, input: 0.00198751, achieved: 0.00198751
   dataset 493, input: 0.00198307, achieved: 0.00198307
   dataset 494, input: 0.00201489, achieved: 0.00201489
   dataset 495, input: 0.00198913, achieved: 0.00198913
   dataset 496, input: 0.00198584, achieved: 0.00198584
   dataset 497, input: 0.00200657, achieved: 0.00200657
   dataset 498, input: 0.002012, achieved: 0.002012
   dataset 499, input: 0.00204446, achieved: 0.00204446
[2025-03-12 09:21:37][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 17315099 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.000966334, achieved: 0.000966334
   dataset 1, input: 0.00373151, achieved: 0.00373151
   dataset 2, input: 0.00086052, achieved: 0.00086052
   dataset 3, input: 0.00369728, achieved: 0.00369728
   dataset 4, input: 0.00355256, achieved: 0.00355256
   dataset 5, input: 0.00366771, achieved: 0.00366771
   dataset 6, input: 0.000820061, achieved: 0.000820061
   dataset 7, input: 0.00207894, achieved: 0.00207894
   dataset 8, input: 0.00199802, achieved: 0.00199802
   dataset 9, input: 0.00204471, achieved: 0.00204471
   dataset 10, input: 0.00190933, achieved: 0.00190933
   dataset 11, input: 0.0021443, achieved: 0.0021443
   dataset 12, input: 0.00208361, achieved: 0.00208361
   dataset 13, input: 0.00187042, achieved: 0.00187042
   dataset 14, input: 0.00189377, achieved: 0.00189377
   dataset 15, input: 0.00198246, achieved: 0.00198246
   dataset 16, input: 0.00213029, achieved: 0.00213029
   dataset 17, input: 0.00200892, achieved: 0.00200892
   dataset 18, input: 0.0017864, achieved: 0.0017864
   dataset 19, input: 0.00208983, achieved: 0.00208983
   dataset 20, input: 0.00213029, achieved: 0.00213029
   dataset 21, input: 0.00217231, achieved: 0.00217231
   dataset 22, input: 0.0020167, achieved: 0.0020167
   dataset 23, input: 0.00174594, achieved: 0.00174594
   dataset 24, input: 0.00197001, achieved: 0.00197001
   dataset 25, input: 0.00208516, achieved: 0.00208516
   dataset 26, input: 0.00189532, achieved: 0.00189532
   dataset 27, input: 0.00176617, achieved: 0.00176617
   dataset 28, input: 0.0018782, achieved: 0.0018782
   dataset 29, input: 0.00185486, achieved: 0.00185486
   dataset 30, input: 0.00203226, achieved: 0.00203226
   dataset 31, input: 0.00190466, achieved: 0.00190466
   dataset 32, input: 0.00224077, achieved: 0.00224077
   dataset 33, input: 0.0018144, achieved: 0.0018144
   dataset 34, input: 0.00197624, achieved: 0.00197624
   dataset 35, input: 0.00211784, achieved: 0.00211784
   dataset 36, input: 0.00201358, achieved: 0.00201358
   dataset 37, input: 0.00173193, achieved: 0.00173193
   dataset 38, input: 0.00197001, achieved: 0.00197001
   dataset 39, input: 0.00205871, achieved: 0.00205871
   dataset 40, input: 0.00186731, achieved: 0.00186731
   dataset 41, input: 0.00192956, achieved: 0.00192956
   dataset 42, input: 0.00195912, achieved: 0.00195912
   dataset 43, input: 0.0020805, achieved: 0.0020805
   dataset 44, input: 0.00189688, achieved: 0.00189688
   dataset 45, input: 0.00201203, achieved: 0.00201203
   dataset 46, input: 0.00197935, achieved: 0.00197935
   dataset 47, input: 0.00189843, achieved: 0.00189843
   dataset 48, input: 0.00189688, achieved: 0.00189688
   dataset 49, input: 0.00185331, achieved: 0.00185331
   dataset 50, input: 0.00199958, achieved: 0.00199958
   dataset 51, input: 0.001942, achieved: 0.001942
   dataset 52, input: 0.00851494, achieved: 0.00851494
   dataset 53, input: 0.0088246, achieved: 0.0088246
   dataset 54, input: 0.00817105, achieved: 0.00817105
   dataset 55, input: 0.008669, achieved: 0.008669
   dataset 56, input: 0.00837489, achieved: 0.00837489
   dataset 57, input: 0.0077369, achieved: 0.0077369
   dataset 58, input: 0.00822395, achieved: 0.00822395
   dataset 59, input: 0.00744124, achieved: 0.00744124
   dataset 60, input: 0.00692306, achieved: 0.00692306
   dataset 61, input: 0.00874524, achieved: 0.00874524
   dataset 62, input: 0.00748636, achieved: 0.00748636
   dataset 63, input: 0.00874058, achieved: 0.00874058
   dataset 64, input: 0.00939258, achieved: 0.00939258
   dataset 65, input: 0.00797031, achieved: 0.00797031
   dataset 66, input: 0.00762174, achieved: 0.00762174
   dataset 67, input: 0.00895999, achieved: 0.00895999
   dataset 68, input: 0.00764353, achieved: 0.00764353
   dataset 69, input: 0.00732297, achieved: 0.00732297
   dataset 70, input: 0.00830176, achieved: 0.00830176
   dataset 71, input: 0.00769177, achieved: 0.00769177
   dataset 72, input: 0.00858808, achieved: 0.00858808
   dataset 73, input: 0.00768088, achieved: 0.00768088
   dataset 74, input: 0.00854762, achieved: 0.00854762
   dataset 75, input: 0.0092074, achieved: 0.0092074
   dataset 76, input: 0.00845581, achieved: 0.00845581
   dataset 77, input: 0.00914049, achieved: 0.00914049
   dataset 78, input: 0.00782404, achieved: 0.00782404
   dataset 79, input: 0.0084278, achieved: 0.0084278
   dataset 80, input: 0.00789562, achieved: 0.00789562
   dataset 81, input: 0.00868144, achieved: 0.00868144
   dataset 82, input: 0.00796253, achieved: 0.00796253
   dataset 83, input: 0.00741634, achieved: 0.00741634
   dataset 84, input: 0.00838423, achieved: 0.00838423
   dataset 85, input: 0.00816482, achieved: 0.00816482
   dataset 86, input: 0.00740856, achieved: 0.00740856
   dataset 87, input: 0.0078396, achieved: 0.0078396
   dataset 88, input: 0.00873591, achieved: 0.00873591
   dataset 89, input: 0.00877481, achieved: 0.00877481
   dataset 90, input: 0.00749415, achieved: 0.00749415
   dataset 91, input: 0.0082224, achieved: 0.0082224
   dataset 92, input: 0.00737588, achieved: 0.00737588
   dataset 93, input: 0.00892264, achieved: 0.00892264
   dataset 94, input: 0.00788006, achieved: 0.00788006
   dataset 95, input: 0.00921207, achieved: 0.00921207
   dataset 96, input: 0.00833288, achieved: 0.00833288
   dataset 97, input: 0.0081586, achieved: 0.0081586
   dataset 98, input: 0.00785205, achieved: 0.00785205
   dataset 99, input: 0.00876081, achieved: 0.00876081
   dataset 100, input: 0.00836089, achieved: 0.00836089
   dataset 101, input: 0.00835622, achieved: 0.00835622
   dataset 102, input: 0.00753149, achieved: 0.00753149
   dataset 103, input: 0.00779603, achieved: 0.00779603
   dataset 104, input: 0.00841691, achieved: 0.00841691
   dataset 105, input: 0.00863321, achieved: 0.00863321
   dataset 106, input: 0.00874836, achieved: 0.00874836
   dataset 107, input: 0.00861142, achieved: 0.00861142
   dataset 108, input: 0.00857252, achieved: 0.00857252
   dataset 109, input: 0.00764197, achieved: 0.00764197
   dataset 110, input: 0.00854762, achieved: 0.00854762
   dataset 111, input: 0.00844336, achieved: 0.00844336
   dataset 112, input: 0.00797809, achieved: 0.00797809
   dataset 113, input: 0.00808702, achieved: 0.00808702
   dataset 114, input: 0.00336894, achieved: 0.00336894
   dataset 115, input: 0.00219565, achieved: 0.00219565
   dataset 116, input: 0.00542299, achieved: 0.00542299
   dataset 117, input: 0.00464961, achieved: 0.00464961
   dataset 118, input: 0.00344363, achieved: 0.00344363
   dataset 119, input: 0.00410653, achieved: 0.00410653
   dataset 120, input: 0.00387156, achieved: 0.00387156
   dataset 121, input: 0.00405674, achieved: 0.00405674
   dataset 122, input: 0.00389179, achieved: 0.00389179
   dataset 123, input: 0.00385756, achieved: 0.00385756
   dataset 124, input: 0.00394625, achieved: 0.00394625
   dataset 125, input: 0.00367082, achieved: 0.00367082
   dataset 126, input: 0.00365526, achieved: 0.00365526
   dataset 127, input: 0.00382488, achieved: 0.00382488
   dataset 128, input: 0.00407852, achieved: 0.00407852
   dataset 129, input: 0.00326935, achieved: 0.00326935
   dataset 130, input: 0.0039976, achieved: 0.0039976
   dataset 131, input: 0.0023497, achieved: 0.0023497
   dataset 132, input: 0.0019669, achieved: 0.0019669
   dataset 133, input: 0.00344519, achieved: 0.00344519
   dataset 134, input: 0.00379376, achieved: 0.00379376
   dataset 135, input: 0.00142227, achieved: 0.00142227
   dataset 136, input: 0.00333004, achieved: 0.00333004
   dataset 137, input: 0.00301726, achieved: 0.00301726
   dataset 138, input: 0.00423724, achieved: 0.00423724
   dataset 139, input: 0.00424813, achieved: 0.00424813
   dataset 140, input: 0.00635353, achieved: 0.00635353
   dataset 141, input: 0.00418745, achieved: 0.00418745
   dataset 142, input: 0.00323512, achieved: 0.00323512
   dataset 143, input: 0.00261735, achieved: 0.00261735
   dataset 144, input: 0.00267181, achieved: 0.00267181
   dataset 145, input: 0.00267337, achieved: 0.00267337
   dataset 146, input: 0.0025411, achieved: 0.0025411
   dataset 147, input: 0.00254732, achieved: 0.00254732
   dataset 148, input: 0.00237149, achieved: 0.00237149
   dataset 149, input: 0.00252398, achieved: 0.00252398
   dataset 150, input: 0.0024633, achieved: 0.0024633
   dataset 151, input: 0.00254266, achieved: 0.00254266
   dataset 152, input: 0.00262669, achieved: 0.00262669
   dataset 153, input: 0.00240261, achieved: 0.00240261
   dataset 154, input: 0.00232169, achieved: 0.00232169
   dataset 155, input: 0.00254732, achieved: 0.00254732
   dataset 156, input: 0.0024384, achieved: 0.0024384
   dataset 157, input: 0.00257067, achieved: 0.00257067
   dataset 158, input: 0.00254266, achieved: 0.00254266
   dataset 159, input: 0.00261268, achieved: 0.00261268
   dataset 160, input: 0.00240416, achieved: 0.00240416
   dataset 161, input: 0.00259245, achieved: 0.00259245
   dataset 162, input: 0.00253799, achieved: 0.00253799
   dataset 163, input: 0.00246174, achieved: 0.00246174
   dataset 164, input: 0.00225945, achieved: 0.00225945
   dataset 165, input: 0.00250375, achieved: 0.00250375
   dataset 166, input: 0.00202292, achieved: 0.00202292
   dataset 167, input: 0.00131179, achieved: 0.00131179
   dataset 168, input: 0.00125577, achieved: 0.00125577
   dataset 169, input: 0.00145962, achieved: 0.00145962
   dataset 170, input: 0.00140204, achieved: 0.00140204
   dataset 171, input: 0.00116707, achieved: 0.00116707
   dataset 172, input: 0.00132579, achieved: 0.00132579
   dataset 173, input: 0.00139426, achieved: 0.00139426
   dataset 174, input: 0.00127444, achieved: 0.00127444
   dataset 175, input: 0.00137403, achieved: 0.00137403
   dataset 176, input: 0.0013398, achieved: 0.0013398
   dataset 177, input: 0.00121842, achieved: 0.00121842
   dataset 178, input: 0.00137714, achieved: 0.00137714
   dataset 179, input: 0.00133357, achieved: 0.00133357
   dataset 180, input: 0.00185642, achieved: 0.00185642
   dataset 181, input: 0.00050573, achieved: 0.00050573
   dataset 182, input: 0.00314642, achieved: 0.00314642
   dataset 183, input: 0.00289278, achieved: 0.00289278
   dataset 184, input: 0.00265314, achieved: 0.00265314
   dataset 185, input: 0.00227812, achieved: 0.00227812
   dataset 186, input: 0.00349654, achieved: 0.00349654
   dataset 187, input: 0.00437107, achieved: 0.00437107
   dataset 188, input: 0.00135225, achieved: 0.00135225
   dataset 189, input: 0.00277607, achieved: 0.00277607
   dataset 190, input: 0.00277296, achieved: 0.00277296
   dataset 191, input: 0.0027325, achieved: 0.0027325
   dataset 192, input: 0.00270293, achieved: 0.00270293
   dataset 193, input: 0.00280097, achieved: 0.00280097
   dataset 194, input: 0.00287099, achieved: 0.00287099
   dataset 195, input: 0.0026298, achieved: 0.0026298
   dataset 196, input: 0.00314953, achieved: 0.00314953
   dataset 197, input: 0.00294724, achieved: 0.00294724
   dataset 198, input: 0.00308418, achieved: 0.00308418
   dataset 199, input: 0.000790495, achieved: 0.000790495
   dataset 200, input: 0.00204937, achieved: 0.00204937
   dataset 201, input: 0.00168214, achieved: 0.00168214
   dataset 202, input: 0.00171948, achieved: 0.00171948
   dataset 203, input: 0.00195601, achieved: 0.00195601
   dataset 204, input: 0.0019918, achieved: 0.0019918
   dataset 205, input: 0.00205249, achieved: 0.00205249
   dataset 206, input: 0.00201825, achieved: 0.00201825
   dataset 207, input: 0.0020556, achieved: 0.0020556
   dataset 208, input: 0.00187665, achieved: 0.00187665
   dataset 209, input: 0.00189688, achieved: 0.00189688
   dataset 210, input: 0.00211006, achieved: 0.00211006
   dataset 211, input: 0.00212251, achieved: 0.00212251
   dataset 212, input: 0.00187976, achieved: 0.00187976
   dataset 213, input: 0.00203226, achieved: 0.00203226
   dataset 214, input: 0.00168214, achieved: 0.00168214
   dataset 215, input: 0.00231235, achieved: 0.00231235
   dataset 216, input: 0.00169147, achieved: 0.00169147
   dataset 217, input: 0.00196846, achieved: 0.00196846
   dataset 218, input: 0.00147051, achieved: 0.00147051
   dataset 219, input: 0.00189221, achieved: 0.00189221
   dataset 220, input: 0.00179729, achieved: 0.00179729
   dataset 221, input: 0.0019669, achieved: 0.0019669
   dataset 222, input: 0.00183463, achieved: 0.00183463
   dataset 223, input: 0.00213496, achieved: 0.00213496
   dataset 224, input: 0.00187665, achieved: 0.00187665
   dataset 225, input: 0.00193422, achieved: 0.00193422
   dataset 226, input: 0.00237149, achieved: 0.00237149
   dataset 227, input: 0.00170703, achieved: 0.00170703
   dataset 228, input: 0.00174749, achieved: 0.00174749
   dataset 229, input: 0.000759374, achieved: 0.000759374
   dataset 230, input: 0.00168058, achieved: 0.00168058
   dataset 231, input: 0.00197935, achieved: 0.00197935
   dataset 232, input: 0.0042139, achieved: 0.0042139
   dataset 233, input: 0.00431816, achieved: 0.00431816
   dataset 234, input: 0.00355723, achieved: 0.00355723
   dataset 235, input: 0.00370195, achieved: 0.00370195
   dataset 236, input: 0.00189999, achieved: 0.00189999
   dataset 237, input: 0.00250375, achieved: 0.00250375
   dataset 238, input: 0.00499973, achieved: 0.00499973
   dataset 239, input: 0.000925875, achieved: 0.000925875
   dataset 240, input: 0.00190466, achieved: 0.00190466
   dataset 241, input: 0.00194667, achieved: 0.00194667
   dataset 242, input: 0.00158722, achieved: 0.00158722
   dataset 243, input: 0.0039976, achieved: 0.0039976
   dataset 244, input: 0.00557704, achieved: 0.00557704
   dataset 245, input: 0.00230146, achieved: 0.00230146
   dataset 246, input: 0.00170548, achieved: 0.00170548
   dataset 247, input: 0.0023746, achieved: 0.0023746
   dataset 248, input: 0.00192333, achieved: 0.00192333
   dataset 249, input: 0.00204782, achieved: 0.00204782
   dataset 250, input: 0.00197779, achieved: 0.00197779
   dataset 251, input: 0.00167591, achieved: 0.00167591
   dataset 252, input: 0.00195445, achieved: 0.00195445
   dataset 253, input: 0.00214585, achieved: 0.00214585
   dataset 254, input: 0.00203848, achieved: 0.00203848
   dataset 255, input: 0.000746925, achieved: 0.000746925
   dataset 256, input: 0.00500595, achieved: 0.00500595
   dataset 257, input: 0.00535452, achieved: 0.00535452
   dataset 258, input: 0.00502151, achieved: 0.00502151
   dataset 259, input: 0.00479121, achieved: 0.00479121
   dataset 260, input: 0.00486746, achieved: 0.00486746
   dataset 261, input: 0.00111105, achieved: 0.00111105
[2025-03-12 09:21:37][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 642635 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0835778, achieved: 0.0835778
   dataset 1, input: 0.0834323, achieved: 0.0834323
   dataset 2, input: 0.0510323, achieved: 0.0510323
   dataset 3, input: 0.104354, achieved: 0.104354
   dataset 4, input: 0.0513544, achieved: 0.0513544
   dataset 5, input: 0.00400893, achieved: 0.00400893
   dataset 6, input: 0.115667, achieved: 0.115667
   dataset 7, input: 0.0827874, achieved: 0.0827874
   dataset 8, input: 0.103788, achieved: 0.103788
   dataset 9, input: 0.11266, achieved: 0.11266
   dataset 10, input: 0.050851, achieved: 0.050851
   dataset 11, input: 0.0513192, achieved: 0.0513192
   dataset 12, input: 0.105168, achieved: 0.105168
[2025-03-12 09:21:37][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 1704196 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0182376, achieved: 0.0182376
   dataset 1, input: 0.0182962, achieved: 0.0182962
   dataset 2, input: 0.018299, achieved: 0.018299
   dataset 3, input: 0.0182779, achieved: 0.0182779
   dataset 4, input: 0.0182862, achieved: 0.0182862
   dataset 5, input: 0.0181746, achieved: 0.0181746
   dataset 6, input: 0.0183693, achieved: 0.0183693
   dataset 7, input: 0.0220027, achieved: 0.0220027
   dataset 8, input: 0.0486005, achieved: 0.0486005
   dataset 9, input: 0.0484891, achieved: 0.0484891
   dataset 10, input: 0.0512473, achieved: 0.0512473
   dataset 11, input: 0.0512001, achieved: 0.0512001
   dataset 12, input: 0.0512732, achieved: 0.0512732
   dataset 13, input: 0.0485441, achieved: 0.0485441
   dataset 14, input: 0.0485733, achieved: 0.0485733
   dataset 15, input: 0.0511485, achieved: 0.0511485
   dataset 16, input: 0.0485108, achieved: 0.0485108
   dataset 17, input: 0.0485108, achieved: 0.0485108
   dataset 18, input: 0.0487117, achieved: 0.0487117
   dataset 19, input: 0.0511296, achieved: 0.0511296
   dataset 20, input: 0.048739, achieved: 0.048739
   dataset 21, input: 0.0512227, achieved: 0.0512227
   dataset 22, input: 0.0486002, achieved: 0.0486002
   dataset 23, input: 0.0487371, achieved: 0.0487371
   dataset 24, input: 0.0511531, achieved: 0.0511531
   dataset 25, input: 0.00566539, achieved: 0.00566539
[2025-03-12 09:21:38][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 6726808 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0130268, achieved: 0.0130268
   dataset 1, input: 0.0134792, achieved: 0.0134792
   dataset 2, input: 0.0136289, achieved: 0.0136289
   dataset 3, input: 0.013172, achieved: 0.013172
   dataset 4, input: 0.0132409, achieved: 0.0132409
   dataset 5, input: 0.013352, achieved: 0.013352
   dataset 6, input: 0.0134121, achieved: 0.0134121
   dataset 7, input: 0.0128782, achieved: 0.0128782
   dataset 8, input: 0.0128349, achieved: 0.0128349
   dataset 9, input: 0.013055, achieved: 0.013055
   dataset 10, input: 0.0128416, achieved: 0.0128416
   dataset 11, input: 0.012892, achieved: 0.012892
   dataset 12, input: 0.0128222, achieved: 0.0128222
   dataset 13, input: 0.0128934, achieved: 0.0128934
   dataset 14, input: 0.0131082, achieved: 0.0131082
   dataset 15, input: 0.0129718, achieved: 0.0129718
   dataset 16, input: 0.013004, achieved: 0.013004
   dataset 17, input: 0.0128759, achieved: 0.0128759
   dataset 18, input: 0.0129287, achieved: 0.0129287
   dataset 19, input: 0.0130294, achieved: 0.0130294
   dataset 20, input: 0.0128648, achieved: 0.0128648
   dataset 21, input: 0.0131354, achieved: 0.0131354
   dataset 22, input: 0.0129144, achieved: 0.0129144
   dataset 23, input: 0.0129003, achieved: 0.0129003
   dataset 24, input: 0.013258, achieved: 0.013258
   dataset 25, input: 0.0129312, achieved: 0.0129312
   dataset 26, input: 0.013249, achieved: 0.013249
   dataset 27, input: 0.0131446, achieved: 0.0131446
   dataset 28, input: 0.0131264, achieved: 0.0131264
   dataset 29, input: 0.0128913, achieved: 0.0128913
   dataset 30, input: 0.0129347, achieved: 0.0129347
   dataset 31, input: 0.0132695, achieved: 0.0132695
   dataset 32, input: 0.0129616, achieved: 0.0129616
   dataset 33, input: 0.0129188, achieved: 0.0129188
   dataset 34, input: 0.0128966, achieved: 0.0128966
   dataset 35, input: 0.012892, achieved: 0.012892
   dataset 36, input: 0.013181, achieved: 0.013181
   dataset 37, input: 0.0130499, achieved: 0.0130499
   dataset 38, input: 0.0129443, achieved: 0.0129443
   dataset 39, input: 0.0130167, achieved: 0.0130167
   dataset 40, input: 0.0127473, achieved: 0.0127473
   dataset 41, input: 0.0127561, achieved: 0.0127561
   dataset 42, input: 0.01274, achieved: 0.01274
   dataset 43, input: 0.012751, achieved: 0.012751
   dataset 44, input: 0.012733, achieved: 0.012733
   dataset 45, input: 0.012737, achieved: 0.012737
   dataset 46, input: 0.0127356, achieved: 0.0127356
   dataset 47, input: 0.0127287, achieved: 0.0127287
   dataset 48, input: 0.0127204, achieved: 0.0127204
   dataset 49, input: 0.0127125, achieved: 0.0127125
   dataset 50, input: 0.0126971, achieved: 0.0126971
   dataset 51, input: 0.0127033, achieved: 0.0127033
   dataset 52, input: 0.0126906, achieved: 0.0126906
   dataset 53, input: 0.0126747, achieved: 0.0126747
   dataset 54, input: 0.0126724, achieved: 0.0126724
   dataset 55, input: 0.0126814, achieved: 0.0126814
   dataset 56, input: 0.0126768, achieved: 0.0126768
   dataset 57, input: 0.0126962, achieved: 0.0126962
   dataset 58, input: 0.0126883, achieved: 0.0126883
   dataset 59, input: 0.0126651, achieved: 0.0126651
   dataset 60, input: 0.0126734, achieved: 0.0126734
   dataset 61, input: 0.0126632, achieved: 0.0126632
   dataset 62, input: 0.0126542, achieved: 0.0126542
   dataset 63, input: 0.0126671, achieved: 0.0126671
   dataset 64, input: 0.0126969, achieved: 0.0126969
   dataset 65, input: 0.0138538, achieved: 0.0138538
   dataset 66, input: 0.0124738, achieved: 0.0124738
   dataset 67, input: 0.0124994, achieved: 0.0124994
   dataset 68, input: 0.0123766, achieved: 0.0123766
   dataset 69, input: 0.0124441, achieved: 0.0124441
   dataset 70, input: 0.0122455, achieved: 0.0122455
   dataset 71, input: 0.0124694, achieved: 0.0124694
   dataset 72, input: 0.0121931, achieved: 0.0121931
   dataset 73, input: 0.0122484, achieved: 0.0122484
   dataset 74, input: 0.0117788, achieved: 0.0117788
   dataset 75, input: 0.0133204, achieved: 0.0133204
   dataset 76, input: 0.0131683, achieved: 0.0131683
   dataset 77, input: 0.00943814, achieved: 0.00943814
[2025-03-12 09:21:38][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 4339733 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.038046, achieved: 0.038046
   dataset 1, input: 0.0413855, achieved: 0.0413855
   dataset 2, input: 0.0406094, achieved: 0.0406094
   dataset 3, input: 0.0365558, achieved: 0.0365558
   dataset 4, input: 0.0341426, achieved: 0.0341426
   dataset 5, input: 0.0350149, achieved: 0.0350149
   dataset 6, input: 0.0358744, achieved: 0.0358744
   dataset 7, input: 0.036827, achieved: 0.036827
   dataset 8, input: 0.0375282, achieved: 0.0375282
   dataset 9, input: 0.0379556, achieved: 0.0379556
   dataset 10, input: 0.0381705, achieved: 0.0381705
   dataset 11, input: 0.038556, achieved: 0.038556
   dataset 12, input: 0.0388884, achieved: 0.0388884
   dataset 13, input: 0.0391665, achieved: 0.0391665
   dataset 14, input: 0.0393857, achieved: 0.0393857
   dataset 15, input: 0.0397976, achieved: 0.0397976
   dataset 16, input: 0.0400668, achieved: 0.0400668
   dataset 17, input: 0.0403879, achieved: 0.0403879
   dataset 18, input: 0.0408308, achieved: 0.0408308
   dataset 19, input: 0.0411838, achieved: 0.0411838
   dataset 20, input: 0.0418467, achieved: 0.0418467
   dataset 21, input: 0.0425557, achieved: 0.0425557
   dataset 22, input: 0.042814, achieved: 0.042814
   dataset 23, input: 0.0425712, achieved: 0.0425712
   dataset 24, input: 0.0388551, achieved: 0.0388551
   dataset 25, input: 0.020984, achieved: 0.020984
[2025-03-12 09:21:39][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 2578247 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0235833, achieved: 0.0235833
   dataset 1, input: 0.0216057, achieved: 0.0216057
   dataset 2, input: 0.027075, achieved: 0.027075
   dataset 3, input: 0.0271066, achieved: 0.0271066
   dataset 4, input: 0.0274384, achieved: 0.0274384
   dataset 5, input: 0.0257867, achieved: 0.0257867
   dataset 6, input: 0.0255337, achieved: 0.0255337
   dataset 7, input: 0.027977, achieved: 0.027977
   dataset 8, input: 0.0270093, achieved: 0.0270093
   dataset 9, input: 0.0285904, achieved: 0.0285904
   dataset 10, input: 0.0283676, achieved: 0.0283676
   dataset 11, input: 0.0153407, achieved: 0.0153407
   dataset 12, input: 0.014138, achieved: 0.014138
   dataset 13, input: 0.0141798, achieved: 0.0141798
   dataset 14, input: 0.0141664, achieved: 0.0141664
   dataset 15, input: 0.0150215, achieved: 0.0150215
   dataset 16, input: 0.0280527, achieved: 0.0280527
   dataset 17, input: 0.023455, achieved: 0.023455
   dataset 18, input: 0.0247761, achieved: 0.0247761
   dataset 19, input: 0.0205734, achieved: 0.0205734
   dataset 20, input: 0.0205842, achieved: 0.0205842
   dataset 21, input: 0.020579, achieved: 0.020579
   dataset 22, input: 0.0205939, achieved: 0.0205939
   dataset 23, input: 0.0203349, achieved: 0.0203349
   dataset 24, input: 0.0199823, achieved: 0.0199823
   dataset 25, input: 0.0199573, achieved: 0.0199573
   dataset 26, input: 0.0199854, achieved: 0.0199854
   dataset 27, input: 0.0168267, achieved: 0.0168267
   dataset 28, input: 0.0172125, achieved: 0.0172125
   dataset 29, input: 0.018342, achieved: 0.018342
   dataset 30, input: 0.014919, achieved: 0.014919
   dataset 31, input: 0.0149787, achieved: 0.0149787
   dataset 32, input: 0.0149735, achieved: 0.0149735
   dataset 33, input: 0.0149415, achieved: 0.0149415
   dataset 34, input: 0.0149689, achieved: 0.0149689
   dataset 35, input: 0.0149673, achieved: 0.0149673
   dataset 36, input: 0.0230039, achieved: 0.0230039
   dataset 37, input: 0.0215731, achieved: 0.0215731
   dataset 38, input: 0.0215682, achieved: 0.0215682
   dataset 39, input: 0.0211097, achieved: 0.0211097
   dataset 40, input: 0.0190817, achieved: 0.0190817
   dataset 41, input: 0.0191069, achieved: 0.0191069
   dataset 42, input: 0.0189985, achieved: 0.0189985
   dataset 43, input: 0.0186603, achieved: 0.0186603
   dataset 44, input: 0.0219256, achieved: 0.0219256
   dataset 45, input: 0.0310232, achieved: 0.0310232
   dataset 46, input: 0.0189783, achieved: 0.0189783
   dataset 47, input: 0.0184155, achieved: 0.0184155
   dataset 48, input: 0.00263109, achieved: 0.00263109
[2025-03-12 09:21:40][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 18622355 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0180888, achieved: 0.0180888
   dataset 1, input: 0.0157123, achieved: 0.0157123
   dataset 2, input: 0.0156305, achieved: 0.0156305
   dataset 3, input: 0.0150714, achieved: 0.0150714
   dataset 4, input: 0.0142973, achieved: 0.0142973
   dataset 5, input: 0.0129064, achieved: 0.0129064
   dataset 6, input: 0.0162972, achieved: 0.0162972
   dataset 7, input: 0.0174737, achieved: 0.0174737
   dataset 8, input: 0.017477, achieved: 0.017477
   dataset 9, input: 0.0148979, achieved: 0.0148979
   dataset 10, input: 0.0158723, achieved: 0.0158723
   dataset 11, input: 0.0164824, achieved: 0.0164824
   dataset 12, input: 0.0147445, achieved: 0.0147445
   dataset 13, input: 0.0160167, achieved: 0.0160167
   dataset 14, input: 0.0164729, achieved: 0.0164729
   dataset 15, input: 0.0169845, achieved: 0.0169845
   dataset 16, input: 0.014763, achieved: 0.014763
   dataset 17, input: 0.0152292, achieved: 0.0152292
   dataset 18, input: 0.0156109, achieved: 0.0156109
   dataset 19, input: 0.0155986, achieved: 0.0155986
   dataset 20, input: 0.0157206, achieved: 0.0157206
   dataset 21, input: 0.0135193, achieved: 0.0135193
   dataset 22, input: 0.0106687, achieved: 0.0106687
   dataset 23, input: 0.012068, achieved: 0.012068
   dataset 24, input: 0.0143947, achieved: 0.0143947
   dataset 25, input: 0.0133682, achieved: 0.0133682
   dataset 26, input: 0.0116767, achieved: 0.0116767
   dataset 27, input: 0.0121379, achieved: 0.0121379
   dataset 28, input: 0.0188349, achieved: 0.0188349
   dataset 29, input: 0.0185595, achieved: 0.0185595
   dataset 30, input: 0.0184979, achieved: 0.0184979
   dataset 31, input: 0.0163945, achieved: 0.0163945
   dataset 32, input: 0.0160772, achieved: 0.0160772
   dataset 33, input: 0.0161074, achieved: 0.0161074
   dataset 34, input: 0.0160856, achieved: 0.0160856
   dataset 35, input: 0.0160156, achieved: 0.0160156
   dataset 36, input: 0.0157346, achieved: 0.0157346
   dataset 37, input: 0.0157352, achieved: 0.0157352
   dataset 38, input: 0.0155214, achieved: 0.0155214
   dataset 39, input: 0.0154414, achieved: 0.0154414
   dataset 40, input: 0.0172906, achieved: 0.0172906
   dataset 41, input: 0.0117512, achieved: 0.0117512
   dataset 42, input: 0.0169453, achieved: 0.0169453
   dataset 43, input: 0.0181806, achieved: 0.0181806
   dataset 44, input: 0.0184588, achieved: 0.0184588
   dataset 45, input: 0.018433, achieved: 0.018433
   dataset 46, input: 0.0209411, achieved: 0.0209411
   dataset 47, input: 0.0209187, achieved: 0.0209187
   dataset 48, input: 0.0186021, achieved: 0.0186021
   dataset 49, input: 0.0118603, achieved: 0.0118603
   dataset 50, input: 0.0118782, achieved: 0.0118782
   dataset 51, input: 0.0118402, achieved: 0.0118402
   dataset 52, input: 0.0118558, achieved: 0.0118558
   dataset 53, input: 0.0118536, achieved: 0.0118536
   dataset 54, input: 0.0118463, achieved: 0.0118463
   dataset 55, input: 0.011829, achieved: 0.011829
   dataset 56, input: 0.0118284, achieved: 0.0118284
   dataset 57, input: 0.0155567, achieved: 0.0155567
   dataset 58, input: 0.0138154, achieved: 0.0138154
   dataset 59, input: 0.0173808, achieved: 0.0173808
   dataset 60, input: 0.0151027, achieved: 0.0151027
   dataset 61, input: 0.0143712, achieved: 0.0143712
   dataset 62, input: 0.0144484, achieved: 0.0144484
   dataset 63, input: 0.0170886, achieved: 0.0170886
   dataset 64, input: 0.0156731, achieved: 0.0156731
   dataset 65, input: 0.0020631, achieved: 0.0020631
[2025-03-12 09:21:41][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 1786631 samples
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.658845, achieved: 0.658845
   dataset 1, input: 0.341155, achieved: 0.341155
[2025-03-12 09:21:41][I][data/gpt_dataset:191:megatron.data.gpt_dataset] [BuildConcatDataset] Caught args.shuffle_sample_in_corpus=True across 528552 samples
 > WARNING: could not find index map files for blendable dataset, building indices on rank 0 ...
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0170699, achieved: 0.0170699
   dataset 1, input: 0.0300583, achieved: 0.0300583
   dataset 2, input: 0.00720419, achieved: 0.00720419
   dataset 3, input: 0.0390113, achieved: 0.0390113
   dataset 4, input: 0.353979, achieved: 0.353979
   dataset 5, input: 0.176417, achieved: 0.176417
   dataset 6, input: 0.00654659, achieved: 0.00654659
   dataset 7, input: 0.0173636, achieved: 0.0173636
   dataset 8, input: 0.0685377, achieved: 0.0685377
   dataset 9, input: 0.0442162, achieved: 0.0442162
   dataset 10, input: 0.026269, achieved: 0.026269
   dataset 11, input: 0.189738, achieved: 0.189738
   dataset 12, input: 0.0182033, achieved: 0.0182033
   dataset 13, input: 0.00538529, achieved: 0.00538529
[2025-03-12 09:21:55][I][data/blendable_dataset:52:megatron.data.blendable_dataset] > elapsed time for building blendable dataset indices: 13.57 (sec)
[2025-03-12 09:22:00][I][data/blendable_dataset:87:megatron.data.blendable_dataset]  > finished saving index map files in 4.8593598260194995 seconds
[2025-03-12 09:22:00][I][data/blendable_dataset:112:megatron.data.blendable_dataset] > loading blendable dataset index: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache/3e486b3f0fe13ac22e1cefff571f6e28_index.npy
[2025-03-12 09:22:00][I][data/blendable_dataset:115:megatron.data.blendable_dataset] > loading blendable dataset sample index: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache/3e486b3f0fe13ac22e1cefff571f6e28_sample_index.npy
[2025-03-12 09:22:00][I][data/blendable_dataset:118:megatron.data.blendable_dataset] > finished loading in 0.02210795701830648 seconds
[2025-03-12 09:22:00][I][data/blendable_dataset:130:megatron.data.blendable_dataset] > size of blendable dataset: 490723579 samples
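The `> sample ratios:` blocks above come from Megatron's `BlendableDataset`: each component dataset is drawn in proportion to a target weight, and the achieved ratio is reported next to the input one. The index is built once on rank 0 (hence the `WARNING: could not find index map files` lines), saved under a hash-named `.npy` pair, and then memory-mapped on load. A minimal sketch of the weighted-blending idea; the names are illustrative and this is not the actual Megatron-DeepSpeed implementation (which builds the index in a compiled helper and handles epoch wrap-around):

```python
# Minimal sketch of weighted dataset blending, in the spirit of
# megatron.data.blendable_dataset. Illustrative only.
import numpy as np

def build_blending_index(weights, num_samples):
    """For each global sample slot, pick the component dataset whose
    achieved ratio lags its target weight the most (greedy balancing)."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()
    dataset_index = np.zeros(num_samples, dtype=np.int64)
    dataset_sample_index = np.zeros(num_samples, dtype=np.int64)
    counts = np.zeros(len(weights), dtype=np.int64)
    for i in range(num_samples):
        achieved = counts / max(i, 1)          # fraction drawn so far
        d = int(np.argmax(weights - achieved))  # most-starved dataset
        dataset_index[i] = d
        dataset_sample_index[i] = counts[d]
        counts[d] += 1
    return dataset_index, dataset_sample_index

idx, sample_idx = build_blending_index([0.658845, 0.341155], 528552)
print(np.bincount(idx) / len(idx))  # ~[0.6588, 0.3412], the "achieved" ratios
```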
 > WARNING: could not find index map files for blendable dataset, building indices on rank 0 ...
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 0.0170698, achieved: 0.0170698
   dataset 1, input: 0.0300584, achieved: 0.0300584
   dataset 2, input: 0.00720414, achieved: 0.00720414
   dataset 3, input: 0.0390117, achieved: 0.0390117
   dataset 4, input: 0.35398, achieved: 0.35398
   dataset 5, input: 0.176418, achieved: 0.176418
   dataset 6, input: 0.00654759, achieved: 0.00654759
   dataset 7, input: 0.0173635, achieved: 0.0173635
   dataset 8, input: 0.0685371, achieved: 0.0685371
   dataset 9, input: 0.044216, achieved: 0.044216
   dataset 10, input: 0.0262689, achieved: 0.0262689
   dataset 11, input: 0.189737, achieved: 0.189737
   dataset 12, input: 0.0182034, achieved: 0.0182034
   dataset 13, input: 0.00538523, achieved: 0.00538523
[2025-03-12 09:22:03][I][data/blendable_dataset:52:megatron.data.blendable_dataset] > elapsed time for building blendable dataset indices: 2.72 (sec)
[2025-03-12 09:22:04][I][data/blendable_dataset:87:megatron.data.blendable_dataset]  > finished saving index map files in 0.9886123439937364 seconds
[2025-03-12 09:22:04][I][data/blendable_dataset:112:megatron.data.blendable_dataset] > loading blendable dataset index: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache/28ce9099b1b5fd7fcef2f8dd382241ce_index.npy
[2025-03-12 09:22:04][I][data/blendable_dataset:115:megatron.data.blendable_dataset] > loading blendable dataset sample index: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/.cache/dolma/index-cache/28ce9099b1b5fd7fcef2f8dd382241ce_sample_index.npy
[2025-03-12 09:22:04][I][data/blendable_dataset:118:megatron.data.blendable_dataset] > finished loading in 0.01950894997571595 seconds
[2025-03-12 09:22:04][I][data/blendable_dataset:130:megatron.data.blendable_dataset] > size of blendable dataset: 98148401 samples
[2025-03-12 09:22:04][I][Megatron-DeepSpeed/pretrain_gpt_alcf:515:__main__] > finished creating GPT datasets. Took: 146370014386706.56250s
[2025-03-12 09:22:04][I][ezpz/dist:125] `train_valid_test_datasets_provider`(([488280960, 97658880, 7680],)) took: dt=499.5761s
[2025-03-12 09:22:04][I][ezpz/dist:125] `build_train_valid_test_datasets`((<function train_valid_test_datasets_provider at 0x152f6ca8ab90>,)) took: dt=499.5786s
[2025-03-12 09:22:04][I][ezpz/dist:125] `build_train_valid_test_data_loaders`((<function train_valid_test_datasets_provider at 0x152f6ca8ab90>,)) took: dt=499.5840s
[2025-03-12 09:22:08][I][ezpz/dist:125] `build_train_valid_test_data_iterators`((<function train_valid_test_datasets_provider at 0x152f6ca8ab90>,)) took: dt=503.1383s
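The `took: dt=...` lines are emitted by a timing wrapper in `ezpz.dist` that logs each function's name, arguments, and wall time; the ~500 s here is dominated by the blendable-index building above. (The `Took: 146370014386706.56250s` figure a few lines up looks like a units bug, plausibly a nanosecond counter printed with an `s` suffix; the `dt` values are the believable elapsed times.) A hypothetical sketch of such a decorator, not the actual ezpz implementation:

```python
# Hypothetical sketch of a timing decorator producing lines like
# "`fn`((args,)) took: dt=499.5761s". Not the actual ezpz API.
import functools
import logging
import time

logger = logging.getLogger(__name__)

def timeitlogit(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        dt = time.perf_counter() - t0
        logger.info("`%s`(%s) took: dt=%.4fs", fn.__name__, args, dt)
        return result
    return wrapper
```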
[2025-03-12 09:22:08][I][megatron/training:96] [after dataloaders are built] datetime=2025-03-12 09:22:08 
[2025-03-12 09:22:08][I][megatron/training:287] done with setup ...
(min, max) time across ranks (ms):
    model-and-optimizer-setup ......................: (31352.08, 31398.10)
    train/valid/test-data-iterators-setup ..........: (502846.19, 503390.32)
[2025-03-12 09:22:08][I][megatron/training:293] training ...
[2025-03-12 09:22:08][I][megatron/training:96] [before the start of training step] datetime=2025-03-12 09:22:08 
[2025-03-12 09:22:46,305] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 385.34 | optimizer_gradients: 81.01 | optimizer_step: 187.65
[2025-03-12 09:22:46,306] [INFO] [logging.py:128:log_dist] [Rank 0] step=1, skipped=0, lr=[3.1457298683118837e-09, 3.1457298683118837e-09], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:22:46,306] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 14022.90 | bwd_microstep: 19792.51 | bwd_inner_microstep: 19070.74 | bwd_allreduce_microstep: 721.43 | step_microstep: 1070.39
[2025-03-12 09:22:46,306] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 14022.92 | bwd: 19792.50 | bwd_inner: 19070.80 | bwd_allreduce: 721.43 | step: 1070.39
[2025-03-12 09:22:46][I][megatron/training_log:661]  iteration=       1/ 1271565 | consumed_samples=         384 | consumed_tokens=     1572864 | elapsed_time_per_iteration_ms=38000.2 | learning_rate=3.14573e-09 | global_batch_size=  384 | lm loss=11.170761 | loss_scale=1.0 | grad_norm=10.663 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=10.105 | tokens_per_gpu_per_second_tgs=1724.620 | [LM]TFLOPs=71.15 | [DS]TFLOPs=68.37 |
[2025-03-12 09:22:46][I][megatron/utils:249] [Rank 0] (after 1 iterations) memory (MB) | allocated: 14149.64111328125 | max allocated: 45063.8935546875 | reserved: 50822.0 | max reserved: 50822.0
(min, max) time across ranks (ms):
    forward-backward ...............................: (36879.78, 36881.36)
    optimizer ......................................: (1069.44, 1070.79)
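The throughput fields in the `training_log` line are internally consistent and easy to re-derive: with `global_batch_size=384`, `actual_seqlen=4096`, 24 GPUs, and a 38.0 s first iteration, samples/s = 384 / 38.0 ≈ 10.1 and tokens per GPU per second = 10.1 × 4096 / 24 ≈ 1725, matching `tokens_per_gpu_per_second_tgs=1724.620`. A quick check using only the logged values:

```python
# Re-deriving the iteration-1 throughput numbers from the logged values.
global_batch_size = 384
seq_len = 4096
n_gpus = 24
iter_time_s = 38.0002  # elapsed_time_per_iteration_ms / 1000

samples_per_second = global_batch_size / iter_time_s   # ~10.105
tgs = samples_per_second * seq_len / n_gpus            # ~1724.6
print(f"{samples_per_second:.3f} samples/s, {tgs:.1f} tokens/gpu/s")
```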
[2025-03-12 09:23:13,766] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.97 | optimizer_gradients: 0.55 | optimizer_step: 1.05
[2025-03-12 09:23:13,766] [INFO] [logging.py:128:log_dist] [Rank 0] step=2, skipped=0, lr=[6.291459736623767e-09, 6.291459736623767e-09], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:23:13,766] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7744.55 | bwd_microstep: 19367.09 | bwd_inner_microstep: 18726.46 | bwd_allreduce_microstep: 640.33 | step_microstep: 237.10
[2025-03-12 09:23:13,766] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7744.59 | bwd: 19367.09 | bwd_inner: 18726.52 | bwd_allreduce: 640.32 | step: 237.10
[2025-03-12 09:23:13][I][megatron/training_log:661]  iteration=       2/ 1271565 | consumed_samples=         768 | consumed_tokens=     3145728 | elapsed_time_per_iteration_ms=27455.8 | learning_rate=6.29146e-09 | global_batch_size=  384 | lm loss=11.167736 | loss_scale=1.0 | grad_norm=10.859 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.986 | tokens_per_gpu_per_second_tgs=2386.962 | [LM]TFLOPs=98.47 | [DS]TFLOPs=94.63 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27194.87, 27195.76)
    optimizer ......................................: (236.57, 237.40)
[2025-03-12 09:23:13,791] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:15,366] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:17,028] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:18,697] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:20,369] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:22,043] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:23,712] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:25,388] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:27,058] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:28,733] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:30,408] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:32,080] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:33,751] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:35,425] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:37,099] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:38,769] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-03-12 09:23:41,311] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.98 | optimizer_gradients: 0.57 | optimizer_step: 1.04
[2025-03-12 09:23:41,311] [INFO] [logging.py:128:log_dist] [Rank 0] step=3, skipped=0, lr=[9.43718960493565e-09, 9.43718960493565e-09], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:23:41,312] [INFO] [timer.py:264:stop] epoch=0/micro_step=3/global_step=3, RunningAvgSamplesPerSec=152.28062072819174, CurrSamplesPerSec=152.28056033918585, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
-------------------------- DeepSpeed Flops Profiler --------------------------
Profile Summary at step 3:
Notations:
data parallel size (dp_size), model parallel size(mp_size),
number of parameters (params), number of multiply-accumulate operations(MACs),
number of floating-point operations (flops), floating-point operations per second (FLOPS),
fwd latency (forward propagation latency), bwd latency (backward propagation latency),
step (weights update latency), iter latency (sum of fwd, bwd and step latency)
world size:                                                             24      
data parallel size:                                                     24      
model parallel size:                                                    1       
batch size per GPU:                                                     1       
params per GPU:                                                         5.93 B  
params of model = params per GPU * mp_size:                             5.93 B  
fwd MACs per GPU:                                                       94.13 TMACs
fwd flops per GPU:                                                      188.27 T
fwd flops of model = fwd flops per GPU * mp_size:                       188.27 T
fwd latency:                                                            8 s     
fwd FLOPS per GPU = fwd flops per GPU / fwd latency:                    23.52 TFLOPS
bwd latency:                                                            18.84 s 
bwd FLOPS per GPU = 2 * fwd flops per GPU / bwd latency:                19.98 TFLOPS
fwd+bwd FLOPS per GPU = 3 * fwd flops per GPU / (fwd+bwd latency):      21.04 TFLOPS
step latency:                                                           237.44 ms
iter latency:                                                           27.09 s 
FLOPS per GPU = 3 * fwd flops per GPU / iter latency:                   20.85 TFLOPS
samples/second:                                                         0.89    
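The summary's FLOPS numbers follow directly from the formulas it states (backward is counted as 2× the forward flops, so a full iteration is 3×): 188.27 T / 8 s ≈ 23.5 TFLOPS forward, 2 × 188.27 / 18.84 ≈ 20.0 TFLOPS backward, and 3 × 188.27 / 27.09 ≈ 20.85 TFLOPS per GPU over the whole iteration. Reproducing the arithmetic:

```python
# Sanity-checking the DeepSpeed Flops Profiler summary above.
fwd_flops_T = 188.27          # fwd flops per GPU, in Tflop
fwd_s, bwd_s = 8.0, 18.84
iter_s = 27.09

print(fwd_flops_T / fwd_s)                 # ~23.5 TFLOPS (fwd)
print(2 * fwd_flops_T / bwd_s)             # ~20.0 TFLOPS (bwd, counted as 2x fwd)
print(3 * fwd_flops_T / (fwd_s + bwd_s))   # ~21.0 TFLOPS (fwd+bwd)
print(3 * fwd_flops_T / iter_s)            # ~20.8 TFLOPS (full iteration)
```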
----------------------------- Aggregated Profile per GPU -----------------------------
Top 1 modules in terms of params, MACs or fwd latency at different model depths:
depth 0:
    params      - {'GPTModel': '5.93 B'}
    MACs        - {'GPTModel': '94.13 TMACs'}
    fwd latency - {'GPTModel': '503.85 ms'}
depth 1:
    params      - {'TransformerLanguageModel': '5.93 B'}
    MACs        - {'TransformerLanguageModel': '93.6 TMACs'}
    fwd latency - {'TransformerLanguageModel': '489.39 ms'}
depth 2:
    params      - {'ParallelTransformer': '5.67 B'}
    MACs        - {'ParallelTransformer': '93.6 TMACs'}
    fwd latency - {'ParallelTransformer': '488.06 ms'}
depth 3:
    params      - {'ModuleList': '5.67 B'}
    MACs        - {'ModuleList': '93.6 TMACs'}
    fwd latency - {'ModuleList': '486.47 ms'}
depth 4:
    params      - {'ParallelTransformerLayer': '5.67 B'}
    MACs        - {'ParallelTransformerLayer': '93.6 TMACs'}
    fwd latency - {'ParallelTransformerLayer': '486.47 ms'}
depth 5:
    params      - {'ParallelMLP': '4.33 B'}
    MACs        - {'ParallelAttention': '75.87 TMACs'}
    fwd latency - {'ParallelAttention': '228.3 ms'}
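The per-depth parameter counts are consistent with the run's config (`nl32_hs4096`, LLaMA-style layers with grouped-query attention and a gated MLP, untied input/output embeddings): each `ParallelTransformerLayer` holds 177.22 M params, 32 of them give the 5.67 B in `ParallelTransformer`, and two 131.07 M embedding matrices bring the total to 5.93 B. A rough reconstruction; the ffn, kv, and vocab sizes below are inferred from the profile itself, not read from the launch script:

```python
# Rough parameter-count reconstruction for this run's config.
# ffn, kv_hidden, and vocab are inferred from the profile above.
hidden = 4096
ffn = 11008        # gated MLP: dense_4h_to_h is 45.09 M = ffn * hidden
kv_hidden = 1024   # from qkv params: hidden * (hidden + 2*kv_hidden) = 25.17 M
vocab = 32000      # from embedding params: 131.07 M / hidden
n_layers = 32

attn = hidden * (hidden + 2 * kv_hidden) + hidden * hidden  # qkv + dense, ~41.94 M
mlp = 2 * hidden * ffn + ffn * hidden                       # gated h->ffn + ffn->h, ~135.27 M
layer = attn + mlp                                          # ~177.2 M (norms negligible)
total = n_layers * layer + 2 * vocab * hidden               # ~5.93 B
print(f"{layer/1e6:.1f} M per layer, {total/1e9:.2f} B total")
```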
 scripts   .tach     deps      .gitignore   test.sh.e2991275  
------------------------------ Detailed Profile per GPU ------------------------------
Each module profile is listed after its name in the following order: 
params, percentage of total params, MACs, percentage of total MACs, fwd latency, percentage of total fwd latency, fwd FLOPS
                                                                    
Note: 1. A module can have torch.nn.module or torch.nn.functional to compute logits (e.g. CrossEntropyLoss). They are not counted as submodules, thus not to be printed out. However they make up the difference between a parent's MACs (or latency) and the sum of its submodules'.      
2. Number of floating-point operations is a theoretical estimation, thus FLOPS computed using that could be larger than the maximum system throughput.
3. The fwd latency listed in the top module's profile is directly captured at the module forward function in PyTorch, thus it's less than the fwd latency shown above which is captured in DeepSpeed.
GPTModel(
  5.93 B = 100% Params, 94.13 TMACs = 100% MACs, 503.85 ms = 100% latency, 373.66 TFLOPS
  (language_model): TransformerLanguageModel(
    5.93 B = 100% Params, 93.6 TMACs = 99.43% MACs, 489.39 ms = 97.13% latency, 382.51 TFLOPS
    (embedding): Embedding(
      131.07 M = 2.21% Params, 0 MACs = 0% MACs, 813.72 us = 0.16% latency, 0 FLOPS
      (word_embeddings): VocabParallelEmbedding(131.07 M = 2.21% Params, 0 MACs = 0% MACs, 692.61 us = 0.14% latency, 0 FLOPS)
      (embedding_dropout): Dropout(0 = 0% Params, 0 MACs = 0% MACs, 36.24 us = 0.01% latency, 0 FLOPS, p=0.0, inplace=False)
    )
    (rotary_pos_emb): RotaryEmbedding(0 = 0% Params, 0 MACs = 0% MACs, 402.21 us = 0.08% latency, 0 FLOPS)
    (encoder): ParallelTransformer(
      5.67 B = 95.58% Params, 93.6 TMACs = 99.43% MACs, 488.06 ms = 96.87% latency, 383.54 TFLOPS
      (layers): ModuleList(
        (0): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 13.6 ms = 2.7% latency, 430.29 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 222.68 us = 0.04% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.34 ms = 1.26% latency, 747.58 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.59 ms = 0.32% latency, 129.33 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 2.68 ms = 0.53% latency, 1640.15 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 771.05 us = 0.15% latency, 178.25 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 251.29 us = 0.05% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.41 ms = 1.27% latency, 172.98 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.41 ms = 0.68% latency, 216.46 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.96 ms = 0.39% latency, 188.66 TFLOPS)
          )
        )
        (1): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 14.32 ms = 2.84% latency, 408.54 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 285.15 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.76 ms = 1.34% latency, 701.14 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.15 ms = 0.23% latency, 179.51 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.31 ms = 0.66% latency, 1328.15 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 838.28 us = 0.17% latency, 163.95 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 263.21 us = 0.05% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.64 ms = 1.32% latency, 167 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.63 ms = 0.72% latency, 203.3 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.94 ms = 0.38% latency, 190.53 TFLOPS)
          )
        )
        (2): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 14.72 ms = 2.92% latency, 397.35 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 289.92 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.51 ms = 1.29% latency, 728.49 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.19 ms = 0.24% latency, 173.15 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.06 ms = 0.61% latency, 1436.55 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 816.82 us = 0.16% latency, 168.26 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 261.55 us = 0.05% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.24 ms = 1.44% latency, 153.13 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.75 ms = 0.74% latency, 196.93 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.37 ms = 0.47% latency, 156.02 TFLOPS)
          )
        )
        (3): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 17.14 ms = 3.4% latency, 341.2 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 401.26 us = 0.08% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 8.4 ms = 1.67% latency, 564.23 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.61 ms = 0.32% latency, 127.86 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.93 ms = 0.78% latency, 1119.61 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 872.85 us = 0.17% latency, 157.46 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 297.78 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.6 ms = 1.51% latency, 145.82 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.77 ms = 0.75% latency, 195.92 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.62 ms = 0.52% latency, 141.06 TFLOPS)
          )
        )
        (4): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 16.14 ms = 3.2% latency, 362.37 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 411.99 us = 0.08% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.78 ms = 1.54% latency, 609.61 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.5 ms = 0.3% latency, 137.21 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.46 ms = 0.69% latency, 1270.79 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 923.63 us = 0.18% latency, 148.8 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 308.51 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.22 ms = 1.43% latency, 153.44 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.79 ms = 0.75% latency, 195.08 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.29 ms = 0.45% latency, 161.51 TFLOPS)
          )
        )
        (5): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.75 ms = 3.13% latency, 371.37 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 360.01 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.99 ms = 1.39% latency, 678.54 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.43 ms = 0.28% latency, 143.83 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.15 ms = 0.63% latency, 1396.21 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 804.42 us = 0.16% latency, 170.85 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 283.24 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.72 ms = 1.53% latency, 143.56 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.9 ms = 0.77% latency, 189.22 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.52 ms = 0.5% latency, 146.49 TFLOPS)
          )
        )
        (6): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.9 ms = 3.15% latency, 368.02 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 411.99 us = 0.08% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.01 ms = 1.39% latency, 676.76 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.4 ms = 0.28% latency, 147.18 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.17 ms = 0.63% latency, 1386.14 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 801.32 us = 0.16% latency, 171.51 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 310.18 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.74 ms = 1.54% latency, 143.22 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.98 ms = 0.79% latency, 185.46 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.44 ms = 0.48% latency, 151.26 TFLOPS)
          )
        )
        (7): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.6 ms = 3.1% latency, 375.06 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 342.61 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.83 ms = 1.36% latency, 694.05 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.31 ms = 0.26% latency, 156.82 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.14 ms = 0.62% latency, 1402.05 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 886.92 us = 0.18% latency, 154.96 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 308.28 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.71 ms = 1.53% latency, 143.76 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 4.04 ms = 0.8% latency, 183.04 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.34 ms = 0.46% latency, 158.05 TFLOPS)
          )
        )
        (8): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.36 ms = 3.05% latency, 380.94 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 337.84 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.12 ms = 1.41% latency, 665.7 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.25 ms = 0.25% latency, 164.95 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.43 ms = 0.68% latency, 1282.45 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 936.27 us = 0.19% latency, 146.79 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 339.98 us = 0.07% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.16 ms = 1.42% latency, 154.82 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.89 ms = 0.77% latency, 190.11 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.11 ms = 0.42% latency, 174.78 TFLOPS)
          )
        )
        (9): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.43 ms = 3.06% latency, 379.19 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 302.55 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.19 ms = 1.43% latency, 659.48 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.24 ms = 0.25% latency, 166.19 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.48 ms = 0.69% latency, 1264.86 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 920.06 us = 0.18% latency, 149.38 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 317.57 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.22 ms = 1.43% latency, 153.4 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.95 ms = 0.78% latency, 186.89 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.14 ms = 0.42% latency, 172.75 TFLOPS)
          )
        )
        (10): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.42 ms = 3.06% latency, 379.45 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 297.55 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.51 ms = 1.49% latency, 631.06 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.27 ms = 0.25% latency, 161.9 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.63 ms = 0.72% latency, 1212.33 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 926.97 us = 0.18% latency, 148.27 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 327.35 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.86 ms = 1.36% latency, 161.55 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.74 ms = 0.74% latency, 197.76 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.08 ms = 0.41% latency, 177.89 TFLOPS)
          )
        )
        (11): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.63 ms = 3.1% latency, 374.33 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 310.66 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.7 ms = 1.53% latency, 615.93 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.41 ms = 0.28% latency, 145.96 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.65 ms = 0.73% latency, 1203.47 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 897.17 us = 0.18% latency, 153.19 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 326.16 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.92 ms = 1.37% latency, 160.15 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.76 ms = 0.75% latency, 196.73 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.08 ms = 0.41% latency, 177.24 TFLOPS)
          )
        )
        (12): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.82 ms = 3.14% latency, 369.79 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 322.1 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.68 ms = 1.52% latency, 617.46 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.36 ms = 0.27% latency, 151.33 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.74 ms = 0.74% latency, 1177.13 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 934.84 us = 0.19% latency, 147.02 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 307.32 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.14 ms = 1.42% latency, 155.2 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.89 ms = 0.77% latency, 190.04 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.13 ms = 0.42% latency, 173.35 TFLOPS)
          )
        )
        (13): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.85 ms = 3.15% latency, 369.05 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 364.78 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.54 ms = 1.5% latency, 629.2 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.42 ms = 0.28% latency, 144.99 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.49 ms = 0.69% latency, 1260.8 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 840.66 us = 0.17% latency, 163.49 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 295.88 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.28 ms = 1.44% latency, 152.21 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.94 ms = 0.78% latency, 187.28 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.19 ms = 0.43% latency, 168.58 TFLOPS)
          )
        )
        (14): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 16.01 ms = 3.18% latency, 365.46 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 359.54 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.54 ms = 1.5% latency, 628.51 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.41 ms = 0.28% latency, 146.29 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.48 ms = 0.69% latency, 1264 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 881.67 us = 0.17% latency, 155.88 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 299.22 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.41 ms = 1.47% latency, 149.48 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.97 ms = 0.79% latency, 186.08 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.3 ms = 0.46% latency, 160.84 TFLOPS)
          )
        )
        (15): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.95 ms = 3.17% latency, 366.75 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 346.18 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.39 ms = 1.47% latency, 641.75 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.41 ms = 0.28% latency, 146.71 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.37 ms = 0.67% latency, 1306.43 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 931.5 us = 0.18% latency, 147.55 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 292.54 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.56 ms = 1.5% latency, 146.55 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.85 ms = 0.77% latency, 191.64 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.41 ms = 0.48% latency, 153.25 TFLOPS)
          )
        )
        (16): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.57 ms = 3.09% latency, 375.83 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 347.14 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.74 ms = 1.34% latency, 703.1 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.32 ms = 0.26% latency, 156.68 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.03 ms = 0.6% latency, 1449.65 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 813.96 us = 0.16% latency, 168.85 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 287.29 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.82 ms = 1.55% latency, 141.64 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 4.12 ms = 0.82% latency, 179.41 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.41 ms = 0.48% latency, 153.03 TFLOPS)
          )
        )
        (17): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.46 ms = 3.07% latency, 378.41 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 322.34 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.78 ms = 1.35% latency, 699.51 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.24 ms = 0.25% latency, 166.29 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.15 ms = 0.63% latency, 1394.1 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 873.8 us = 0.17% latency, 157.29 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 300.88 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.7 ms = 1.53% latency, 143.92 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 4.02 ms = 0.8% latency, 183.71 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.34 ms = 0.46% latency, 157.73 TFLOPS)
          )
        )
        (18): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.37 ms = 3.05% latency, 380.63 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 305.65 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.78 ms = 1.35% latency, 699.64 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.22 ms = 0.24% latency, 169.58 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.16 ms = 0.63% latency, 1392.63 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 859.26 us = 0.17% latency, 159.95 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 329.49 us = 0.07% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.61 ms = 1.51% latency, 145.68 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 4 ms = 0.79% latency, 184.76 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.32 ms = 0.46% latency, 159.5 TFLOPS)
          )
        )
        (19): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.29 ms = 3.03% latency, 382.67 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 305.18 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.9 ms = 1.37% latency, 686.83 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.22 ms = 0.24% latency, 168.39 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.35 ms = 0.66% latency, 1314.43 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 929.36 us = 0.18% latency, 147.89 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 294.69 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.44 ms = 1.48% latency, 148.89 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 4.03 ms = 0.8% latency, 183.5 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.23 ms = 0.44% latency, 165.52 TFLOPS)
          )
        )
        (20): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.18 ms = 3.01% latency, 385.24 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 348.81 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.13 ms = 1.42% latency, 664.95 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.22 ms = 0.24% latency, 169.45 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.52 ms = 0.7% latency, 1248.6 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 863.31 us = 0.17% latency, 159.2 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 325.92 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.03 ms = 1.4% latency, 157.53 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.8 ms = 0.75% latency, 194.57 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.11 ms = 0.42% latency, 175.25 TFLOPS)
          )
        )
        (21): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.52 ms = 3.08% latency, 376.99 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 344.99 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.6 ms = 1.51% latency, 624.19 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.26 ms = 0.25% latency, 163.09 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.78 ms = 0.75% latency, 1163.98 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 912.43 us = 0.18% latency, 150.63 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 296.35 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.93 ms = 1.38% latency, 159.95 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.67 ms = 0.73% latency, 201.02 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.16 ms = 0.43% latency, 171.34 TFLOPS)
          )
        )
        (22): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.53 ms = 3.08% latency, 376.7 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 304.7 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.66 ms = 1.52% latency, 618.73 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.34 ms = 0.27% latency, 153.29 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.7 ms = 0.73% latency, 1189.35 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 910.28 us = 0.18% latency, 150.99 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 295.16 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.92 ms = 1.37% latency, 160.1 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.84 ms = 0.76% latency, 192.54 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2 ms = 0.4% latency, 184.98 TFLOPS)
          )
        )
        (23): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.54 ms = 3.08% latency, 376.43 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 342.37 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.66 ms = 1.52% latency, 618.92 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.34 ms = 0.27% latency, 154.05 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.71 ms = 0.74% latency, 1185.22 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 877.62 us = 0.17% latency, 156.6 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 345.47 us = 0.07% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.84 ms = 1.36% latency, 161.98 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.67 ms = 0.73% latency, 201.21 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.08 ms = 0.41% latency, 177.18 TFLOPS)
          )
        )
        (24): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.68 ms = 3.11% latency, 373.05 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 324.25 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.62 ms = 1.51% latency, 622.06 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.4 ms = 0.28% latency, 147.68 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.66 ms = 0.73% latency, 1200.33 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 857.35 us = 0.17% latency, 160.31 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 275.37 us = 0.05% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.1 ms = 1.41% latency, 156.12 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.77 ms = 0.75% latency, 195.7 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.22 ms = 0.44% latency, 166.07 TFLOPS)
          )
        )
        (25): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.61 ms = 3.1% latency, 374.68 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 319.72 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.45 ms = 1.48% latency, 636.49 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.37 ms = 0.27% latency, 150.7 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.55 ms = 0.7% latency, 1240.45 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 809.91 us = 0.16% latency, 169.7 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 297.07 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.21 ms = 1.43% latency, 153.8 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.84 ms = 0.76% latency, 192.14 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.24 ms = 0.45% latency, 164.69 TFLOPS)
          )
        )
        (26): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.51 ms = 3.08% latency, 377.17 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 347.85 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.36 ms = 1.46% latency, 644.48 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.4 ms = 0.28% latency, 147.66 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.45 ms = 0.68% latency, 1275.36 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 828.27 us = 0.16% latency, 165.94 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 323.53 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.13 ms = 1.41% latency, 155.5 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.64 ms = 0.72% latency, 202.86 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.35 ms = 0.47% latency, 157.24 TFLOPS)
          )
        )
        (27): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.94 ms = 3.16% latency, 366.89 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 348.57 us = 0.07% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.45 ms = 1.48% latency, 636.58 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.49 ms = 0.3% latency, 138.48 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.28 ms = 0.65% latency, 1339.73 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 932.22 us = 0.19% latency, 147.43 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 293.02 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 7.5 ms = 1.49% latency, 147.7 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.84 ms = 0.76% latency, 192.62 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 2.41 ms = 0.48% latency, 152.97 TFLOPS)
          )
        )
        (28): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 15.05 ms = 2.99% latency, 388.8 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 416.04 us = 0.08% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 7.96 ms = 1.58% latency, 595.86 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.46 ms = 0.29% latency, 140.78 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 3.85 ms = 0.76% latency, 1141.72 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 862.12 us = 0.17% latency, 159.42 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 280.38 us = 0.06% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 6.07 ms = 1.2% latency, 182.69 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.27 ms = 0.65% latency, 225.59 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.84 ms = 0.37% latency, 200.55 TFLOPS)
          )
        )
        (29): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 12.59 ms = 2.5% latency, 464.79 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 295.4 us = 0.06% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 6.1 ms = 1.21% latency, 777.81 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.1 ms = 0.22% latency, 187.12 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 2.83 ms = 0.56% latency, 1556.03 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 795.6 us = 0.16% latency, 172.75 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 236.51 us = 0.05% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 5.65 ms = 1.12% latency, 196.09 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.01 ms = 0.6% latency, 245.4 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.74 ms = 0.35% latency, 212.14 TFLOPS)
          )
        )
        (30): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 11.73 ms = 2.33% latency, 498.68 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 221.97 us = 0.04% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 5.25 ms = 1.04% latency, 903.75 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.06 ms = 0.21% latency, 195.23 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 2.1 ms = 0.42% latency, 2094.55 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 958.68 us = 0.19% latency, 143.36 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 231.27 us = 0.05% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 5.73 ms = 1.14% latency, 193.38 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.2 ms = 0.63% latency, 230.97 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.63 ms = 0.32% latency, 226.07 TFLOPS)
          )
        )
        (31): ParallelTransformerLayer(
          177.22 M = 2.99% Params, 2.92 TMACs = 3.11% MACs, 12.29 ms = 2.44% latency, 475.97 TFLOPS
          (input_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 215.77 us = 0.04% latency, 0 FLOPS)
          (self_attention): ParallelAttention(
            41.94 M = 0.71% Params, 2.37 TMACs = 2.52% MACs, 5.57 ms = 1.11% latency, 851.62 TFLOPS
            (query_key_value): ColumnParallelLinear(25.17 M = 0.42% Params, 103.08 GMACs = 0.11% MACs, 1.02 ms = 0.2% latency, 202.5 TFLOPS)
            (core_attention_flash): FlashSelfAttention(0 = 0% Params, 2.2 TMACs = 2.34% MACs, 2.47 ms = 0.49% latency, 1780.4 TFLOPS)
            (dense): RowParallelLinear(16.78 M = 0.28% Params, 68.72 GMACs = 0.07% MACs, 897.88 us = 0.18% latency, 153.07 TFLOPS)
          )
          (post_attention_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 235.8 us = 0.05% latency, 0 FLOPS)
          (mlp): ParallelMLP(
            135.27 M = 2.28% Params, 554.05 GMACs = 0.59% MACs, 5.94 ms = 1.18% latency, 186.66 TFLOPS
            (dense_h_to_4h): ColumnParallelLinear(90.18 M = 1.52% Params, 369.37 GMACs = 0.39% MACs, 3.19 ms = 0.63% latency, 231.49 TFLOPS)
            (dense_4h_to_h): RowParallelLinear(45.09 M = 0.76% Params, 184.68 GMACs = 0.2% MACs, 1.85 ms = 0.37% latency, 199.31 TFLOPS)
          )
        )
      )
      (final_layernorm): RMSNorm(4.1 K = 0% Params, 0 MACs = 0% MACs, 341.65 us = 0.07% latency, 0 FLOPS)
    )
    (output_layer): ColumnParallelLinear(131.07 M = 2.21% Params, 0 MACs = 0% MACs, 0 s = 0% latency, 0 FLOPS)
  )
)
------------------------------------------------------------------------------
[2025-03-12 09:23:41,330] [INFO] [profiler.py:230:end_profile] Flops profiler finished
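The per-module breakdown above is produced by DeepSpeed's flops profiler, which fires on an early training step and then hands control back to the normal iteration loop. A minimal sketch of the `flops_profiler` stanza in the DeepSpeed config that enables this kind of output (the key names are DeepSpeed's documented options; the specific values here are assumptions, not read from this run's config):

```python
# Hypothetical ds_config fragment -- values are illustrative assumptions.
ds_config = {
    "flops_profiler": {
        "enabled": True,      # turn the profiler on
        "profile_step": 2,    # which global step to profile (assumed)
        "module_depth": -1,   # -1 => print the full module tree, as above
        "top_modules": 1,
        "detailed": True,     # per-module Params / MACs / latency lines
        "output_file": None,  # None => dump to stdout, as in this log
    },
}
```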
[2025-03-12 09:23:41,330] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 8004.76 | bwd_microstep: 18843.88 | bwd_inner_microstep: 18210.75 | bwd_allreduce_microstep: 632.82 | step_microstep: 237.44
[2025-03-12 09:23:41,330] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 0.00 | bwd: 0.00 | bwd_inner: 18210.81 | bwd_allreduce: 632.82 | step: 0.00
[2025-03-12 09:23:41][I][megatron/training_log:661]  iteration=       3/ 1271565 | consumed_samples=        1152 | consumed_tokens=     4718592 | elapsed_time_per_iteration_ms=27563.1 | learning_rate=9.43719e-09 | global_batch_size=  384 | lm loss=11.172977 | loss_scale=1.0 | grad_norm=10.962 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.932 | tokens_per_gpu_per_second_tgs=2377.670 | [LM]TFLOPs=98.09 | [DS]TFLOPs=94.26 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27285.52, 27286.41)
    optimizer ......................................: (236.47, 255.65)
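The throughput fields in each iteration line can be re-derived from the other logged quantities, which is a handy sanity check when comparing runs. A minimal sketch using the iteration 3 values above and assuming this job's 24 ranks:

```python
# Re-derive the logged throughput for iteration 3 from its own fields.
global_batch_size = 384      # global_batch_size
seq_len = 4096               # actual_seqlen
n_gpus = 24                  # assumed: total ranks for this 2-node job
iter_time_s = 27.5631        # elapsed_time_per_iteration_ms / 1000

tokens_per_iter = global_batch_size * seq_len        # 1,572,864
samples_per_sec = global_batch_size / iter_time_s    # ~13.932
tgs = tokens_per_iter / n_gpus / iter_time_s         # ~2377.7

# Matches the logged samples_per_second=13.932 and
# tokens_per_gpu_per_second_tgs=2377.670.
print(f"{samples_per_sec:.3f} samples/s, {tgs:.1f} tokens/GPU/s")
```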
[2025-03-12 09:24:08,861] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.17 | optimizer_gradients: 0.56 | optimizer_step: 1.08
[2025-03-12 09:24:08,862] [INFO] [logging.py:128:log_dist] [Rank 0] step=4, skipped=0, lr=[1.2582919473247535e-08, 1.2582919473247535e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:24:08,862] [INFO] [timer.py:264:stop] epoch=0/micro_step=4/global_step=4, RunningAvgSamplesPerSec=151.43919798600436, CurrSamplesPerSec=150.6069635989099, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:24:08,862] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7758.90 | bwd_microstep: 19429.51 | bwd_inner_microstep: 18789.32 | bwd_allreduce_microstep: 639.89 | step_microstep: 237.57
[2025-03-12 09:24:08,863] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7758.93 | bwd: 19429.51 | bwd_inner: 18789.37 | bwd_allreduce: 639.89 | step: 237.57
[2025-03-12 09:24:08][I][megatron/training_log:661]  iteration=       4/ 1271565 | consumed_samples=        1536 | consumed_tokens=     6291456 | elapsed_time_per_iteration_ms=27532.8 | learning_rate=1.25829e-08 | global_batch_size=  384 | lm loss=11.171650 | loss_scale=1.0 | grad_norm=10.696 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.947 | tokens_per_gpu_per_second_tgs=2380.292 | [LM]TFLOPs=98.19 | [DS]TFLOPs=94.36 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27272.97, 27273.83)
    optimizer ......................................: (236.54, 237.84)
[2025-03-12 09:24:36,410] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.74 | optimizer_gradients: 0.57 | optimizer_step: 1.05
[2025-03-12 09:24:36,410] [INFO] [logging.py:128:log_dist] [Rank 0] step=5, skipped=0, lr=[1.5728649341559417e-08, 1.5728649341559417e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:24:36,411] [INFO] [timer.py:264:stop] epoch=0/micro_step=5/global_step=5, RunningAvgSamplesPerSec=151.1343465428925, CurrSamplesPerSec=150.52825193289877, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:24:36,411] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7759.95 | bwd_microstep: 19445.18 | bwd_inner_microstep: 18803.95 | bwd_allreduce_microstep: 640.93 | step_microstep: 237.32
[2025-03-12 09:24:36,411] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7759.98 | bwd: 19445.17 | bwd_inner: 18804.01 | bwd_allreduce: 640.93 | step: 237.32
[2025-03-12 09:24:36][I][megatron/training_log:661]  iteration=       5/ 1271565 | consumed_samples=        1920 | consumed_tokens=     7864320 | elapsed_time_per_iteration_ms=27548.1 | learning_rate=1.57286e-08 | global_batch_size=  384 | lm loss=11.170979 | loss_scale=1.0 | grad_norm=10.837 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.939 | tokens_per_gpu_per_second_tgs=2378.962 | [LM]TFLOPs=98.14 | [DS]TFLOPs=94.31 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27288.93, 27289.64)
    optimizer ......................................: (236.47, 237.58)
[2025-03-12 09:25:03,946] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.06 | optimizer_gradients: 0.56 | optimizer_step: 1.02
[2025-03-12 09:25:03,946] [INFO] [logging.py:128:log_dist] [Rank 0] step=6, skipped=0, lr=[1.88743792098713e-08, 1.88743792098713e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:25:03,946] [INFO] [timer.py:264:stop] epoch=0/micro_step=6/global_step=6, RunningAvgSamplesPerSec=150.95998968613813, CurrSamplesPerSec=150.43926562465137, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:25:03,947] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7770.25 | bwd_microstep: 19422.09 | bwd_inner_microstep: 18777.17 | bwd_allreduce_microstep: 644.63 | step_microstep: 237.35
[2025-03-12 09:25:03,947] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7770.28 | bwd: 19422.09 | bwd_inner: 18777.22 | bwd_allreduce: 644.63 | step: 237.35
[2025-03-12 09:25:03][I][megatron/training_log:661]  iteration=       6/ 1271565 | consumed_samples=        2304 | consumed_tokens=     9437184 | elapsed_time_per_iteration_ms=27535.1 | learning_rate=1.88744e-08 | global_batch_size=  384 | lm loss=11.170456 | loss_scale=1.0 | grad_norm=10.977 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.946 | tokens_per_gpu_per_second_tgs=2380.093 | [LM]TFLOPs=98.19 | [DS]TFLOPs=94.35 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27275.75, 27276.84)
    optimizer ......................................: (236.54, 237.64)
[2025-03-12 09:25:31,504] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.49 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:25:31,504] [INFO] [logging.py:128:log_dist] [Rank 0] step=7, skipped=0, lr=[2.2020109078183185e-08, 2.2020109078183185e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:25:31,505] [INFO] [timer.py:264:stop] epoch=0/micro_step=7/global_step=7, RunningAvgSamplesPerSec=150.86228890358845, CurrSamplesPerSec=150.47268817650092, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:25:31,505] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7749.64 | bwd_microstep: 19466.08 | bwd_inner_microstep: 18826.43 | bwd_allreduce_microstep: 639.35 | step_microstep: 237.81
[2025-03-12 09:25:31,505] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7749.67 | bwd: 19466.08 | bwd_inner: 18826.49 | bwd_allreduce: 639.35 | step: 237.82
[2025-03-12 09:25:31][I][megatron/training_log:661]  iteration=       7/ 1271565 | consumed_samples=        2688 | consumed_tokens=    11010048 | elapsed_time_per_iteration_ms=27558.0 | learning_rate=2.20201e-08 | global_batch_size=  384 | lm loss=11.170864 | loss_scale=1.0 | grad_norm=10.770 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.934 | tokens_per_gpu_per_second_tgs=2378.112 | [LM]TFLOPs=98.10 | [DS]TFLOPs=94.27 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27298.44, 27299.20)
    optimizer ......................................: (236.96, 238.12)
[2025-03-12 09:25:59,059] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.21 | optimizer_gradients: 0.56 | optimizer_step: 1.04
[2025-03-12 09:25:59,060] [INFO] [logging.py:128:log_dist] [Rank 0] step=8, skipped=0, lr=[2.516583894649507e-08, 2.516583894649507e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:25:59,060] [INFO] [timer.py:264:stop] epoch=0/micro_step=8/global_step=8, RunningAvgSamplesPerSec=150.75436123929416, CurrSamplesPerSec=150.21697225860072, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:25:59,060] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7761.38 | bwd_microstep: 19449.75 | bwd_inner_microstep: 18804.92 | bwd_allreduce_microstep: 644.53 | step_microstep: 237.69
[2025-03-12 09:25:59,060] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7761.40 | bwd: 19449.74 | bwd_inner: 18804.98 | bwd_allreduce: 644.53 | step: 237.69
[2025-03-12 09:25:59][I][megatron/training_log:661]  iteration=       8/ 1271565 | consumed_samples=        3072 | consumed_tokens=    12582912 | elapsed_time_per_iteration_ms=27555.7 | learning_rate=2.51658e-08 | global_batch_size=  384 | lm loss=11.167645 | loss_scale=1.0 | grad_norm=10.765 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.935 | tokens_per_gpu_per_second_tgs=2378.312 | [LM]TFLOPs=98.11 | [DS]TFLOPs=94.28 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27295.04, 27295.93)
    optimizer ......................................: (236.43, 237.99)
[2025-03-12 09:26:26,601] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.18 | optimizer_gradients: 0.56 | optimizer_step: 1.07
[2025-03-12 09:26:26,602] [INFO] [logging.py:128:log_dist] [Rank 0] step=9, skipped=0, lr=[2.831156881480695e-08, 2.831156881480695e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:26:26,602] [INFO] [timer.py:264:stop] epoch=0/micro_step=9/global_step=9, RunningAvgSamplesPerSec=150.74043559820174, CurrSamplesPerSec=150.65687664050782, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:26:26,603] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7760.35 | bwd_microstep: 19434.29 | bwd_inner_microstep: 18790.78 | bwd_allreduce_microstep: 643.22 | step_microstep: 237.73
[2025-03-12 09:26:26,603] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7760.37 | bwd: 19434.29 | bwd_inner: 18790.83 | bwd_allreduce: 643.21 | step: 237.73
[2025-03-12 09:26:26][I][megatron/training_log:661]  iteration=       9/ 1271565 | consumed_samples=        3456 | consumed_tokens=    14155776 | elapsed_time_per_iteration_ms=27541.5 | learning_rate=2.83116e-08 | global_batch_size=  384 | lm loss=11.173370 | loss_scale=1.0 | grad_norm=10.662 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.943 | tokens_per_gpu_per_second_tgs=2379.535 | [LM]TFLOPs=98.16 | [DS]TFLOPs=94.33 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27281.87, 27282.46)
    optimizer ......................................: (236.88, 238.02)
[2025-03-12 09:26:54,144] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.17 | optimizer_gradients: 0.57 | optimizer_step: 1.04
[2025-03-12 09:26:54,145] [INFO] [logging.py:128:log_dist] [Rank 0] step=10, skipped=0, lr=[3.1457298683118834e-08, 3.1457298683118834e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:26:54,145] [INFO] [timer.py:264:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=150.69597672127665, CurrSamplesPerSec=150.38543847751936, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:26:54,146] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7759.82 | bwd_microstep: 19440.08 | bwd_inner_microstep: 18797.15 | bwd_allreduce_microstep: 642.62 | step_microstep: 237.56
[2025-03-12 09:26:54,146] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7759.84 | bwd: 19440.08 | bwd_inner: 18797.21 | bwd_allreduce: 642.62 | step: 237.56
[2025-03-12 09:26:54][I][megatron/training_log:661]  iteration=      10/ 1271565 | consumed_samples=        3840 | consumed_tokens=    15728640 | elapsed_time_per_iteration_ms=27543.2 | learning_rate=3.14573e-08 | global_batch_size=  384 | lm loss=11.172956 | loss_scale=1.0 | grad_norm=10.671 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.942 | tokens_per_gpu_per_second_tgs=2379.391 | [LM]TFLOPs=98.16 | [DS]TFLOPs=94.32 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27283.03, 27283.88)
    optimizer ......................................: (236.72, 237.88)
[2025-03-12 09:27:21,695] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.93 | optimizer_gradients: 0.54 | optimizer_step: 1.04
[2025-03-12 09:27:21,695] [INFO] [logging.py:128:log_dist] [Rank 0] step=11, skipped=0, lr=[3.4603028551430715e-08, 3.4603028551430715e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:27:21,696] [INFO] [timer.py:264:stop] epoch=0/micro_step=11/global_step=11, RunningAvgSamplesPerSec=150.7141854516556, CurrSamplesPerSec=150.85995459318923, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:27:21,696] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7762.67 | bwd_microstep: 19442.22 | bwd_inner_microstep: 18802.69 | bwd_allreduce_microstep: 639.24 | step_microstep: 237.45
[2025-03-12 09:27:21,696] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7762.69 | bwd: 19442.22 | bwd_inner: 18802.74 | bwd_allreduce: 639.23 | step: 237.45
[2025-03-12 09:27:21][I][megatron/training_log:661]  iteration=      11/ 1271565 | consumed_samples=        4224 | consumed_tokens=    17301504 | elapsed_time_per_iteration_ms=27565.3 | learning_rate=3.4603e-08 | global_batch_size=  384 | lm loss=11.170139 | loss_scale=1.0 | grad_norm=10.682 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.931 | tokens_per_gpu_per_second_tgs=2377.482 | [LM]TFLOPs=98.08 | [DS]TFLOPs=94.25 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27290.26, 27291.14)
    optimizer ......................................: (236.56, 237.73)
[2025-03-12 09:27:49,306] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.11 | optimizer_gradients: 0.53 | optimizer_step: 0.99
[2025-03-12 09:27:49,306] [INFO] [logging.py:128:log_dist] [Rank 0] step=12, skipped=0, lr=[3.77487584197426e-08, 3.77487584197426e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:27:49,307] [INFO] [timer.py:264:stop] epoch=0/micro_step=12/global_step=12, RunningAvgSamplesPerSec=150.676789018258, CurrSamplesPerSec=150.34099551614207, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:27:49,307] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7761.82 | bwd_microstep: 19490.49 | bwd_inner_microstep: 18847.88 | bwd_allreduce_microstep: 642.30 | step_microstep: 237.51
[2025-03-12 09:27:49,307] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7761.85 | bwd: 19490.48 | bwd_inner: 18847.94 | bwd_allreduce: 642.30 | step: 237.51
[2025-03-12 09:27:49][I][megatron/training_log:661]  iteration=      12/ 1271565 | consumed_samples=        4608 | consumed_tokens=    18874368 | elapsed_time_per_iteration_ms=27595.0 | learning_rate=3.77488e-08 | global_batch_size=  384 | lm loss=11.173476 | loss_scale=1.0 | grad_norm=10.517 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.916 | tokens_per_gpu_per_second_tgs=2374.922 | [LM]TFLOPs=97.97 | [DS]TFLOPs=94.15 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27335.32, 27335.85)
    optimizer ......................................: (236.66, 237.80)
[2025-03-12 09:28:16,853] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.72 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:28:16,854] [INFO] [logging.py:128:log_dist] [Rank 0] step=13, skipped=0, lr=[4.0894488288054484e-08, 4.0894488288054484e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:28:16,854] [INFO] [timer.py:264:stop] epoch=0/micro_step=13/global_step=13, RunningAvgSamplesPerSec=150.68934899689748, CurrSamplesPerSec=150.81500481310513, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:28:16,854] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7766.62 | bwd_microstep: 19433.64 | bwd_inner_microstep: 18790.03 | bwd_allreduce_microstep: 643.32 | step_microstep: 237.20
[2025-03-12 09:28:16,854] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7766.64 | bwd: 19433.64 | bwd_inner: 18790.09 | bwd_allreduce: 643.31 | step: 237.20
[2025-03-12 09:28:16][I][megatron/training_log:661]  iteration=      13/ 1271565 | consumed_samples=        4992 | consumed_tokens=    20447232 | elapsed_time_per_iteration_ms=27547.4 | learning_rate=4.08945e-08 | global_batch_size=  384 | lm loss=11.174082 | loss_scale=1.0 | grad_norm=10.397 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.940 | tokens_per_gpu_per_second_tgs=2379.025 | [LM]TFLOPs=98.14 | [DS]TFLOPs=94.31 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27286.80, 27287.63)
    optimizer ......................................: (236.32, 237.52)
[2025-03-12 09:28:44,390] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.24 | optimizer_gradients: 0.56 | optimizer_step: 1.06
[2025-03-12 09:28:44,390] [INFO] [logging.py:128:log_dist] [Rank 0] step=14, skipped=0, lr=[4.404021815636637e-08, 4.404021815636637e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:28:44,391] [INFO] [timer.py:264:stop] epoch=0/micro_step=14/global_step=14, RunningAvgSamplesPerSec=150.6799710642357, CurrSamplesPerSec=150.57683174507028, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:28:44,391] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7778.28 | bwd_microstep: 19412.22 | bwd_inner_microstep: 18773.44 | bwd_allreduce_microstep: 638.49 | step_microstep: 237.73
[2025-03-12 09:28:44,391] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7778.30 | bwd: 19412.21 | bwd_inner: 18773.50 | bwd_allreduce: 638.48 | step: 237.73
[2025-03-12 09:28:44][I][megatron/training_log:661]  iteration=      14/ 1271565 | consumed_samples=        5376 | consumed_tokens=    22020096 | elapsed_time_per_iteration_ms=27536.5 | learning_rate=4.40402e-08 | global_batch_size=  384 | lm loss=11.169671 | loss_scale=1.0 | grad_norm=10.867 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.945 | tokens_per_gpu_per_second_tgs=2379.965 | [LM]TFLOPs=98.18 | [DS]TFLOPs=94.35 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27275.91, 27276.62)
    optimizer ......................................: (236.68, 238.04)
[2025-03-12 09:29:11,975] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.14 | optimizer_gradients: 0.57 | optimizer_step: 1.05
[2025-03-12 09:29:11,975] [INFO] [logging.py:128:log_dist] [Rank 0] step=15, skipped=0, lr=[4.718594802467826e-08, 4.718594802467826e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:29:11,976] [INFO] [timer.py:264:stop] epoch=0/micro_step=15/global_step=15, RunningAvgSamplesPerSec=150.6481067913074, CurrSamplesPerSec=150.26672523308596, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:29:11,976] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7771.81 | bwd_microstep: 19468.89 | bwd_inner_microstep: 18826.52 | bwd_allreduce_microstep: 642.07 | step_microstep: 237.50
[2025-03-12 09:29:11,976] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7771.83 | bwd: 19468.89 | bwd_inner: 18826.58 | bwd_allreduce: 642.06 | step: 237.50
[2025-03-12 09:29:11][I][megatron/training_log:661]  iteration=      15/ 1271565 | consumed_samples=        5760 | consumed_tokens=    23592960 | elapsed_time_per_iteration_ms=27583.6 | learning_rate=4.71859e-08 | global_batch_size=  384 | lm loss=11.166926 | loss_scale=1.0 | grad_norm=11.188 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.921 | tokens_per_gpu_per_second_tgs=2375.900 | [LM]TFLOPs=98.01 | [DS]TFLOPs=94.19 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27324.02, 27324.72)
    optimizer ......................................: (236.43, 237.81)
[2025-03-12 09:29:39,564] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.35 | optimizer_gradients: 0.53 | optimizer_step: 1.00
[2025-03-12 09:29:39,565] [INFO] [logging.py:128:log_dist] [Rank 0] step=16, skipped=0, lr=[5.033167789299014e-08, 5.033167789299014e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:29:39,565] [INFO] [timer.py:264:stop] epoch=0/micro_step=16/global_step=16, RunningAvgSamplesPerSec=150.60937559393847, CurrSamplesPerSec=150.1076176110364, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:29:39,565] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7768.12 | bwd_microstep: 19474.25 | bwd_inner_microstep: 18832.07 | bwd_allreduce_microstep: 641.89 | step_microstep: 237.89
[2025-03-12 09:29:39,565] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7768.15 | bwd: 19474.25 | bwd_inner: 18832.12 | bwd_allreduce: 641.88 | step: 237.89
[2025-03-12 09:29:39][I][megatron/training_log:661]  iteration=      16/ 1271565 | consumed_samples=        6144 | consumed_tokens=    25165824 | elapsed_time_per_iteration_ms=27589.6 | learning_rate=5.03317e-08 | global_batch_size=  384 | lm loss=11.168621 | loss_scale=1.0 | grad_norm=10.912 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.918 | tokens_per_gpu_per_second_tgs=2375.385 | [LM]TFLOPs=97.99 | [DS]TFLOPs=94.17 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27330.04, 27330.75)
    optimizer ......................................: (236.86, 238.15)
[2025-03-12 09:30:07,158] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.10 | optimizer_gradients: 0.57 | optimizer_step: 1.06
[2025-03-12 09:30:07,158] [INFO] [logging.py:128:log_dist] [Rank 0] step=17, skipped=0, lr=[5.347740776130202e-08, 5.347740776130202e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:30:07,159] [INFO] [timer.py:264:stop] epoch=0/micro_step=17/global_step=17, RunningAvgSamplesPerSec=150.58897119256196, CurrSamplesPerSec=150.30383016000565, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:30:07,159] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7767.37 | bwd_microstep: 19483.23 | bwd_inner_microstep: 18841.54 | bwd_allreduce_microstep: 641.38 | step_microstep: 237.67
[2025-03-12 09:30:07,159] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7767.39 | bwd: 19483.23 | bwd_inner: 18841.61 | bwd_allreduce: 641.37 | step: 237.67
[2025-03-12 09:30:07][I][megatron/training_log:661]  iteration=      17/ 1271565 | consumed_samples=        6528 | consumed_tokens=    26738688 | elapsed_time_per_iteration_ms=27593.2 | learning_rate=5.34774e-08 | global_batch_size=  384 | lm loss=11.169818 | loss_scale=1.0 | grad_norm=10.784 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.916 | tokens_per_gpu_per_second_tgs=2375.081 | [LM]TFLOPs=97.98 | [DS]TFLOPs=94.15 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27333.77, 27334.72)
    optimizer ......................................: (236.76, 237.94)
[2025-03-12 09:30:34,747] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.41 | optimizer_gradients: 0.56 | optimizer_step: 1.05
[2025-03-12 09:30:34,747] [INFO] [logging.py:128:log_dist] [Rank 0] step=18, skipped=0, lr=[5.66231376296139e-08, 5.66231376296139e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:30:34,748] [INFO] [timer.py:264:stop] epoch=0/micro_step=18/global_step=18, RunningAvgSamplesPerSec=150.56977407087533, CurrSamplesPerSec=150.28234465103105, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:30:34,748] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7754.05 | bwd_microstep: 19489.14 | bwd_inner_microstep: 18845.99 | bwd_allreduce_microstep: 642.85 | step_microstep: 237.92
[2025-03-12 09:30:34,748] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7754.07 | bwd: 19489.14 | bwd_inner: 18846.05 | bwd_allreduce: 642.85 | step: 237.92
[2025-03-12 09:30:34][I][megatron/training_log:661]  iteration=      18/ 1271565 | consumed_samples=        6912 | consumed_tokens=    28311552 | elapsed_time_per_iteration_ms=27588.7 | learning_rate=5.66231e-08 | global_batch_size=  384 | lm loss=11.166717 | loss_scale=1.0 | grad_norm=11.337 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.919 | tokens_per_gpu_per_second_tgs=2375.470 | [LM]TFLOPs=98.00 | [DS]TFLOPs=94.17 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27329.15, 27329.85)
    optimizer ......................................: (236.72, 238.20)
[2025-03-12 09:31:02,335] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.06 | optimizer_gradients: 0.54 | optimizer_step: 1.02
[2025-03-12 09:31:02,335] [INFO] [logging.py:128:log_dist] [Rank 0] step=19, skipped=0, lr=[5.976886749792578e-08, 5.976886749792578e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:31:02,335] [INFO] [timer.py:264:stop] epoch=0/micro_step=19/global_step=19, RunningAvgSamplesPerSec=150.5800384194339, CurrSamplesPerSec=150.74439935155524, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:31:02,336] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7761.39 | bwd_microstep: 19482.84 | bwd_inner_microstep: 18840.61 | bwd_allreduce_microstep: 641.94 | step_microstep: 237.38
[2025-03-12 09:31:02,336] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7761.41 | bwd: 19482.84 | bwd_inner: 18840.66 | bwd_allreduce: 641.93 | step: 237.38
[2025-03-12 09:31:02][I][megatron/training_log:661]  iteration=      19/ 1271565 | consumed_samples=        7296 | consumed_tokens=    29884416 | elapsed_time_per_iteration_ms=27587.7 | learning_rate=5.97689e-08 | global_batch_size=  384 | lm loss=11.164600 | loss_scale=1.0 | grad_norm=11.097 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.919 | tokens_per_gpu_per_second_tgs=2375.552 | [LM]TFLOPs=98.00 | [DS]TFLOPs=94.17 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27327.71, 27328.55)
    optimizer ......................................: (236.51, 237.67)
[2025-03-12 09:31:29,923] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.25 | optimizer_gradients: 0.54 | optimizer_step: 1.02
[2025-03-12 09:31:29,923] [INFO] [logging.py:128:log_dist] [Rank 0] step=20, skipped=0, lr=[6.291459736623767e-08, 6.291459736623767e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:31:29,924] [INFO] [timer.py:264:stop] epoch=0/micro_step=20/global_step=20, RunningAvgSamplesPerSec=150.54739404105132, CurrSamplesPerSec=149.99453863031658, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:31:29,924] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7760.33 | bwd_microstep: 19481.08 | bwd_inner_microstep: 18836.25 | bwd_allreduce_microstep: 644.53 | step_microstep: 237.80
[2025-03-12 09:31:29,924] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7760.35 | bwd: 19481.08 | bwd_inner: 18836.31 | bwd_allreduce: 644.53 | step: 237.80
[2025-03-12 09:31:29][I][megatron/training_log:661]  iteration=      20/ 1271565 | consumed_samples=        7680 | consumed_tokens=    31457280 | elapsed_time_per_iteration_ms=27588.0 | learning_rate=6.29146e-08 | global_batch_size=  384 | lm loss=11.165082 | loss_scale=1.0 | grad_norm=11.526 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.919 | tokens_per_gpu_per_second_tgs=2375.524 | [LM]TFLOPs=98.00 | [DS]TFLOPs=94.17 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27328.02, 27328.77)
    optimizer ......................................: (236.66, 238.08)
[2025-03-12 09:31:57,493] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.62 | optimizer_gradients: 0.54 | optimizer_step: 0.99
[2025-03-12 09:31:57,493] [INFO] [logging.py:128:log_dist] [Rank 0] step=21, skipped=0, lr=[6.606032723454956e-08, 6.606032723454956e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:31:57,494] [INFO] [timer.py:264:stop] epoch=0/micro_step=21/global_step=21, RunningAvgSamplesPerSec=150.53150601956966, CurrSamplesPerSec=150.24603520544807, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:31:57,494] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7763.84 | bwd_microstep: 19463.13 | bwd_inner_microstep: 18816.92 | bwd_allreduce_microstep: 645.90 | step_microstep: 237.87
[2025-03-12 09:31:57,494] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7763.86 | bwd: 19463.12 | bwd_inner: 18816.98 | bwd_allreduce: 645.90 | step: 237.87
[2025-03-12 09:31:57][I][megatron/training_log:661]  iteration=      21/ 1271565 | consumed_samples=        8064 | consumed_tokens=    33030144 | elapsed_time_per_iteration_ms=27569.9 | learning_rate=6.60603e-08 | global_batch_size=  384 | lm loss=11.169722 | loss_scale=1.0 | grad_norm=10.965 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.928 | tokens_per_gpu_per_second_tgs=2377.081 | [LM]TFLOPs=98.06 | [DS]TFLOPs=94.23 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27310.04, 27310.77)
    optimizer ......................................: (236.83, 238.14)
[2025-03-12 09:32:25,050] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.10 | optimizer_gradients: 0.53 | optimizer_step: 1.05
[2025-03-12 09:32:25,050] [INFO] [logging.py:128:log_dist] [Rank 0] step=22, skipped=0, lr=[6.920605710286143e-08, 6.920605710286143e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:32:25,051] [INFO] [timer.py:264:stop] epoch=0/micro_step=22/global_step=22, RunningAvgSamplesPerSec=150.53416433279884, CurrSamplesPerSec=150.5846310776518, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:32:25,051] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7767.07 | bwd_microstep: 19446.17 | bwd_inner_microstep: 18806.38 | bwd_allreduce_microstep: 639.49 | step_microstep: 237.76
[2025-03-12 09:32:25,051] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7767.09 | bwd: 19446.16 | bwd_inner: 18806.44 | bwd_allreduce: 639.49 | step: 237.76
[2025-03-12 09:32:25][I][megatron/training_log:661]  iteration=      22/ 1271565 | consumed_samples=        8448 | consumed_tokens=    34603008 | elapsed_time_per_iteration_ms=27556.4 | learning_rate=6.92061e-08 | global_batch_size=  384 | lm loss=11.159341 | loss_scale=1.0 | grad_norm=11.150 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.935 | tokens_per_gpu_per_second_tgs=2378.247 | [LM]TFLOPs=98.11 | [DS]TFLOPs=94.28 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27296.38, 27297.25)
    optimizer ......................................: (236.85, 238.05)
[2025-03-12 09:32:54,137] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.32 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:32:54,137] [INFO] [logging.py:128:log_dist] [Rank 0] step=23, skipped=0, lr=[7.235178697117333e-08, 7.235178697117333e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:32:54,137] [INFO] [timer.py:264:stop] epoch=0/micro_step=23/global_step=23, RunningAvgSamplesPerSec=150.55238451198528, CurrSamplesPerSec=150.91765726183553, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:32:54,138] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7751.73 | bwd_microstep: 20989.64 | bwd_inner_microstep: 20349.86 | bwd_allreduce_microstep: 639.49 | step_microstep: 237.70
[2025-03-12 09:32:54,138] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7751.75 | bwd: 20989.64 | bwd_inner: 20349.92 | bwd_allreduce: 639.49 | step: 237.70
[2025-03-12 09:32:54][I][megatron/training_log:661]  iteration=      23/ 1271565 | consumed_samples=        8832 | consumed_tokens=    36175872 | elapsed_time_per_iteration_ms=29086.3 | learning_rate=7.23518e-08 | global_batch_size=  384 | lm loss=11.165020 | loss_scale=1.0 | grad_norm=10.614 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.202 | tokens_per_gpu_per_second_tgs=2253.157 | [LM]TFLOPs=92.95 | [DS]TFLOPs=89.32 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (28826.48, 28827.37)
    optimizer ......................................: (236.83, 237.98)
[2025-03-12 09:33:21,697] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.10 | optimizer_gradients: 0.54 | optimizer_step: 1.04
[2025-03-12 09:33:21,698] [INFO] [logging.py:128:log_dist] [Rank 0] step=24, skipped=0, lr=[7.54975168394852e-08, 7.54975168394852e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:33:21,698] [INFO] [timer.py:264:stop] epoch=0/micro_step=24/global_step=24, RunningAvgSamplesPerSec=150.55342149821294, CurrSamplesPerSec=150.57514246540285, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:33:21,699] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7765.60 | bwd_microstep: 19453.20 | bwd_inner_microstep: 18811.14 | bwd_allreduce_microstep: 641.76 | step_microstep: 237.55
[2025-03-12 09:33:21,699] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7765.62 | bwd: 19453.19 | bwd_inner: 18811.20 | bwd_allreduce: 641.75 | step: 237.55
[2025-03-12 09:33:21][I][megatron/training_log:661]  iteration=      24/ 1271565 | consumed_samples=        9216 | consumed_tokens=    37748736 | elapsed_time_per_iteration_ms=27560.8 | learning_rate=7.54975e-08 | global_batch_size=  384 | lm loss=11.163930 | loss_scale=1.0 | grad_norm=10.657 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.933 | tokens_per_gpu_per_second_tgs=2377.873 | [LM]TFLOPs=98.09 | [DS]TFLOPs=94.26 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27301.38, 27302.27)
    optimizer ......................................: (236.59, 237.84)
[2025-03-12 09:33:49,344] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.87 | optimizer_gradients: 0.53 | optimizer_step: 1.02
[2025-03-12 09:33:49,344] [INFO] [logging.py:128:log_dist] [Rank 0] step=25, skipped=0, lr=[7.864324670779709e-08, 7.864324670779709e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:33:49,345] [INFO] [timer.py:264:stop] epoch=0/micro_step=25/global_step=25, RunningAvgSamplesPerSec=150.5513872935777, CurrSamplesPerSec=150.50658970475837, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:33:49,345] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7746.74 | bwd_microstep: 19555.25 | bwd_inner_microstep: 18912.86 | bwd_allreduce_microstep: 642.10 | step_microstep: 237.29
[2025-03-12 09:33:49,345] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7746.77 | bwd: 19555.25 | bwd_inner: 18912.91 | bwd_allreduce: 642.10 | step: 237.29
[2025-03-12 09:33:49][I][megatron/training_log:661]  iteration=      25/ 1271565 | consumed_samples=        9600 | consumed_tokens=    39321600 | elapsed_time_per_iteration_ms=27646.4 | learning_rate=7.86432e-08 | global_batch_size=  384 | lm loss=11.166022 | loss_scale=1.0 | grad_norm=12.064 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.890 | tokens_per_gpu_per_second_tgs=2370.505 | [LM]TFLOPs=97.79 | [DS]TFLOPs=93.97 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27386.73, 27387.53)
    optimizer ......................................: (236.45, 237.60)
[2025-03-12 09:34:16,879] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.01 | optimizer_gradients: 0.53 | optimizer_step: 1.00
[2025-03-12 09:34:16,880] [INFO] [logging.py:128:log_dist] [Rank 0] step=26, skipped=0, lr=[8.178897657610897e-08, 8.178897657610897e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:34:16,880] [INFO] [timer.py:264:stop] epoch=0/micro_step=26/global_step=26, RunningAvgSamplesPerSec=150.55682657522908, CurrSamplesPerSec=150.68197949258283, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:34:16,880] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7769.19 | bwd_microstep: 19422.44 | bwd_inner_microstep: 18781.92 | bwd_allreduce_microstep: 640.22 | step_microstep: 237.35
[2025-03-12 09:34:16,880] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7769.21 | bwd: 19422.43 | bwd_inner: 18781.97 | bwd_allreduce: 640.21 | step: 237.35
[2025-03-12 09:34:16][I][megatron/training_log:661]  iteration=      26/ 1271565 | consumed_samples=        9984 | consumed_tokens=    40894464 | elapsed_time_per_iteration_ms=27534.8 | learning_rate=8.1789e-08 | global_batch_size=  384 | lm loss=11.161508 | loss_scale=1.0 | grad_norm=10.619 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.946 | tokens_per_gpu_per_second_tgs=2380.116 | [LM]TFLOPs=98.19 | [DS]TFLOPs=94.35 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27275.39, 27276.09)
    optimizer ......................................: (236.12, 237.61)
[2025-03-12 09:34:44,409] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.64 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:34:44,410] [INFO] [logging.py:128:log_dist] [Rank 0] step=27, skipped=0, lr=[8.493470644442084e-08, 8.493470644442084e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:34:44,410] [INFO] [timer.py:264:stop] epoch=0/micro_step=27/global_step=27, RunningAvgSamplesPerSec=150.56386797728126, CurrSamplesPerSec=150.73300027261766, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:34:44,410] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7759.57 | bwd_microstep: 19426.60 | bwd_inner_microstep: 18783.88 | bwd_allreduce_microstep: 642.43 | step_microstep: 236.83
[2025-03-12 09:34:44,410] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7759.59 | bwd: 19426.59 | bwd_inner: 18783.93 | bwd_allreduce: 642.43 | step: 236.83
[2025-03-12 09:34:44][I][megatron/training_log:661]  iteration=      27/ 1271565 | consumed_samples=       10368 | consumed_tokens=    42467328 | elapsed_time_per_iteration_ms=27529.4 | learning_rate=8.49347e-08 | global_batch_size=  384 | lm loss=11.158463 | loss_scale=1.0 | grad_norm=11.492 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.949 | tokens_per_gpu_per_second_tgs=2380.586 | [LM]TFLOPs=98.21 | [DS]TFLOPs=94.37 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27271.02, 27271.82)
    optimizer ......................................: (236.02, 237.10)
[2025-03-12 09:35:11,920] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.09 | optimizer_gradients: 0.53 | optimizer_step: 1.01
[2025-03-12 09:35:11,920] [INFO] [logging.py:128:log_dist] [Rank 0] step=28, skipped=0, lr=[8.808043631273274e-08, 8.808043631273274e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:35:11,921] [INFO] [timer.py:264:stop] epoch=0/micro_step=28/global_step=28, RunningAvgSamplesPerSec=150.57626867573913, CurrSamplesPerSec=150.88689209106104, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:35:11,921] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7769.62 | bwd_microstep: 19398.55 | bwd_inner_microstep: 18760.25 | bwd_allreduce_microstep: 638.01 | step_microstep: 237.54
[2025-03-12 09:35:11,921] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7769.64 | bwd: 19398.55 | bwd_inner: 18760.30 | bwd_allreduce: 638.00 | step: 237.54
[2025-03-12 09:35:11][I][megatron/training_log:661]  iteration=      28/ 1271565 | consumed_samples=       10752 | consumed_tokens=    44040192 | elapsed_time_per_iteration_ms=27510.1 | learning_rate=8.80804e-08 | global_batch_size=  384 | lm loss=11.159657 | loss_scale=1.0 | grad_norm=10.657 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.959 | tokens_per_gpu_per_second_tgs=2382.251 | [LM]TFLOPs=98.28 | [DS]TFLOPs=94.44 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27250.35, 27251.12)
    optimizer ......................................: (236.62, 237.80)
[2025-03-12 09:35:39,490] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.00 | optimizer_gradients: 0.53 | optimizer_step: 1.01
[2025-03-12 09:35:39,491] [INFO] [logging.py:128:log_dist] [Rank 0] step=29, skipped=0, lr=[9.122616618104463e-08, 9.122616618104463e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:35:39,491] [INFO] [timer.py:264:stop] epoch=0/micro_step=29/global_step=29, RunningAvgSamplesPerSec=150.57088463013994, CurrSamplesPerSec=150.43097553247807, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:35:39,491] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7760.55 | bwd_microstep: 19464.20 | bwd_inner_microstep: 18818.97 | bwd_allreduce_microstep: 644.93 | step_microstep: 237.36
[2025-03-12 09:35:39,492] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7760.58 | bwd: 19464.20 | bwd_inner: 18819.03 | bwd_allreduce: 644.93 | step: 237.36
[2025-03-12 09:35:39][I][megatron/training_log:661]  iteration=      29/ 1271565 | consumed_samples=       11136 | consumed_tokens=    45613056 | elapsed_time_per_iteration_ms=27570.7 | learning_rate=9.12262e-08 | global_batch_size=  384 | lm loss=11.151192 | loss_scale=1.0 | grad_norm=11.017 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.928 | tokens_per_gpu_per_second_tgs=2377.019 | [LM]TFLOPs=98.06 | [DS]TFLOPs=94.23 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27311.48, 27312.39)
    optimizer ......................................: (236.29, 237.66)
[2025-03-12 09:36:07,065] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.17 | optimizer_gradients: 0.56 | optimizer_step: 1.03
[2025-03-12 09:36:07,066] [INFO] [logging.py:128:log_dist] [Rank 0] step=30, skipped=0, lr=[9.437189604935652e-08, 9.437189604935652e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:36:07,066] [INFO] [timer.py:264:stop] epoch=0/micro_step=30/global_step=30, RunningAvgSamplesPerSec=150.55450385621103, CurrSamplesPerSec=150.11350758631528, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:36:07,066] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7758.68 | bwd_microstep: 19473.38 | bwd_inner_microstep: 18831.63 | bwd_allreduce_microstep: 641.45 | step_microstep: 237.62
[2025-03-12 09:36:07,066] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7758.71 | bwd: 19473.38 | bwd_inner: 18831.69 | bwd_allreduce: 641.44 | step: 237.63
[2025-03-12 09:36:07][I][megatron/training_log:661]  iteration=      30/ 1271565 | consumed_samples=       11520 | consumed_tokens=    47185920 | elapsed_time_per_iteration_ms=27574.8 | learning_rate=9.43719e-08 | global_batch_size=  384 | lm loss=11.150348 | loss_scale=1.0 | grad_norm=10.968 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.926 | tokens_per_gpu_per_second_tgs=2376.663 | [LM]TFLOPs=98.04 | [DS]TFLOPs=94.22 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27315.36, 27316.23)
    optimizer ......................................: (236.50, 237.91)
[2025-03-12 09:36:34,618] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.93 | optimizer_gradients: 0.52 | optimizer_step: 1.00
[2025-03-12 09:36:34,619] [INFO] [logging.py:128:log_dist] [Rank 0] step=31, skipped=0, lr=[9.751762591766839e-08, 9.751762591766839e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:36:34,619] [INFO] [timer.py:264:stop] epoch=0/micro_step=31/global_step=31, RunningAvgSamplesPerSec=150.56813897779932, CurrSamplesPerSec=150.95086831394593, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:36:34,619] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7766.39 | bwd_microstep: 19441.07 | bwd_inner_microstep: 18804.55 | bwd_allreduce_microstep: 636.22 | step_microstep: 237.30
[2025-03-12 09:36:34,619] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7766.41 | bwd: 19441.07 | bwd_inner: 18804.61 | bwd_allreduce: 636.22 | step: 237.30
[2025-03-12 09:36:34][I][megatron/training_log:661]  iteration=      31/ 1271565 | consumed_samples=       11904 | consumed_tokens=    48758784 | elapsed_time_per_iteration_ms=27552.4 | learning_rate=9.75176e-08 | global_batch_size=  384 | lm loss=11.149038 | loss_scale=1.0 | grad_norm=10.982 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.937 | tokens_per_gpu_per_second_tgs=2378.593 | [LM]TFLOPs=98.12 | [DS]TFLOPs=94.29 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27293.34, 27294.19)
    optimizer ......................................: (236.49, 237.58)
[2025-03-12 09:37:02,122] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.68 | optimizer_gradients: 0.55 | optimizer_step: 1.06
[2025-03-12 09:37:02,123] [INFO] [logging.py:128:log_dist] [Rank 0] step=32, skipped=0, lr=[1.0066335578598028e-07, 1.0066335578598028e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:37:02,123] [INFO] [timer.py:264:stop] epoch=0/micro_step=32/global_step=32, RunningAvgSamplesPerSec=150.5804191636691, CurrSamplesPerSec=150.93735864820079, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:37:02,123] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7759.48 | bwd_microstep: 19401.22 | bwd_inner_microstep: 18762.84 | bwd_allreduce_microstep: 638.09 | step_microstep: 237.15
[2025-03-12 09:37:02,123] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7759.50 | bwd: 19401.22 | bwd_inner: 18762.90 | bwd_allreduce: 638.08 | step: 237.15
[2025-03-12 09:37:02][I][megatron/training_log:661]  iteration=      32/ 1271565 | consumed_samples=       12288 | consumed_tokens=    50331648 | elapsed_time_per_iteration_ms=27503.5 | learning_rate=1.00663e-07 | global_batch_size=  384 | lm loss=11.146956 | loss_scale=1.0 | grad_norm=10.936 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.962 | tokens_per_gpu_per_second_tgs=2382.821 | [LM]TFLOPs=98.30 | [DS]TFLOPs=94.46 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27243.61, 27244.42)
    optimizer ......................................: (236.08, 237.44)
[2025-03-12 09:37:29,631] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.02 | optimizer_gradients: 0.56 | optimizer_step: 1.06
[2025-03-12 09:37:29,631] [INFO] [logging.py:128:log_dist] [Rank 0] step=33, skipped=0, lr=[1.0380908565429217e-07, 1.0380908565429217e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:37:29,632] [INFO] [timer.py:264:stop] epoch=0/micro_step=33/global_step=33, RunningAvgSamplesPerSec=150.5951183632239, CurrSamplesPerSec=151.0373733107677, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:37:29,632] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7769.67 | bwd_microstep: 19394.21 | bwd_inner_microstep: 18758.87 | bwd_allreduce_microstep: 635.06 | step_microstep: 237.55
[2025-03-12 09:37:29,632] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7769.69 | bwd: 19394.21 | bwd_inner: 18758.92 | bwd_allreduce: 635.06 | step: 237.55
[2025-03-12 09:37:29][I][megatron/training_log:661]  iteration=      33/ 1271565 | consumed_samples=       12672 | consumed_tokens=    51904512 | elapsed_time_per_iteration_ms=27508.8 | learning_rate=1.03809e-07 | global_batch_size=  384 | lm loss=11.143302 | loss_scale=1.0 | grad_norm=10.686 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.959 | tokens_per_gpu_per_second_tgs=2382.365 | [LM]TFLOPs=98.28 | [DS]TFLOPs=94.44 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27248.63, 27249.49)
    optimizer ......................................: (236.31, 237.86)
[2025-03-12 09:37:57,143] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 220.05 | optimizer_gradients: 0.57 | optimizer_step: 1.04
[2025-03-12 09:37:57,143] [INFO] [logging.py:128:log_dist] [Rank 0] step=34, skipped=0, lr=[1.0695481552260404e-07, 1.0695481552260404e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:37:57,144] [INFO] [timer.py:264:stop] epoch=0/micro_step=34/global_step=34, RunningAvgSamplesPerSec=150.6027666009015, CurrSamplesPerSec=150.84018864625585, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:37:57,144] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7766.24 | bwd_microstep: 19404.88 | bwd_inner_microstep: 18762.95 | bwd_allreduce_microstep: 641.63 | step_microstep: 234.39
[2025-03-12 09:37:57,144] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7766.26 | bwd: 19404.88 | bwd_inner: 18763.01 | bwd_allreduce: 641.62 | step: 234.39
[2025-03-12 09:37:57][I][megatron/training_log:661]  iteration=      34/ 1271565 | consumed_samples=       13056 | consumed_tokens=    53477376 | elapsed_time_per_iteration_ms=27511.3 | learning_rate=1.06955e-07 | global_batch_size=  384 | lm loss=11.142344 | loss_scale=1.0 | grad_norm=11.046 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.958 | tokens_per_gpu_per_second_tgs=2382.147 | [LM]TFLOPs=98.27 | [DS]TFLOPs=94.43 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27254.80, 27255.69)
    optimizer ......................................: (233.51, 234.69)
[2025-03-12 09:38:24,648] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.91 | optimizer_gradients: 0.53 | optimizer_step: 1.02
[2025-03-12 09:38:24,649] [INFO] [logging.py:128:log_dist] [Rank 0] step=35, skipped=0, lr=[1.1010054539091593e-07, 1.1010054539091593e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:38:24,649] [INFO] [timer.py:264:stop] epoch=0/micro_step=35/global_step=35, RunningAvgSamplesPerSec=150.61823196981342, CurrSamplesPerSec=151.11474690757643, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:38:24,649] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7769.37 | bwd_microstep: 19387.54 | bwd_inner_microstep: 18753.48 | bwd_allreduce_microstep: 633.77 | step_microstep: 237.49
[2025-03-12 09:38:24,649] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7769.39 | bwd: 19387.54 | bwd_inner: 18753.53 | bwd_allreduce: 633.77 | step: 237.49
[2025-03-12 09:38:24][I][megatron/training_log:661]  iteration=      35/ 1271565 | consumed_samples=       13440 | consumed_tokens=    55050240 | elapsed_time_per_iteration_ms=27505.0 | learning_rate=1.10101e-07 | global_batch_size=  384 | lm loss=11.141614 | loss_scale=1.0 | grad_norm=11.859 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.961 | tokens_per_gpu_per_second_tgs=2382.697 | [LM]TFLOPs=98.29 | [DS]TFLOPs=94.46 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27244.77, 27245.73)
    optimizer ......................................: (236.63, 237.77)
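(The `step=...` lines also show the optimizer is still in its linear LR warmup: the per-step increment is constant, so `lr(step) = step * delta`. The peak LR and warmup length aren't visible in this excerpt; a quick check of the slope:)

```python
# LR values copied from the step=34..36 log lines above.
lr = {
    34: 1.0695481552260404e-07,
    35: 1.1010054539091593e-07,
    36: 1.132462752592278e-07,
}
delta = lr[35] - lr[34]
assert abs((lr[36] - lr[35]) - delta) < 1e-18  # constant per-step increment
print(delta)       # ~3.1457e-09 per step
print(34 * delta)  # reproduces lr at step 34 (to float precision),
                   # i.e. the warmup is linear from zero
```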
[2025-03-12 09:38:52,136] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.04 | optimizer_gradients: 0.56 | optimizer_step: 1.09
[2025-03-12 09:38:52,136] [INFO] [logging.py:128:log_dist] [Rank 0] step=36, skipped=0, lr=[1.132462752592278e-07, 1.132462752592278e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:38:52,137] [INFO] [timer.py:264:stop] epoch=0/micro_step=36/global_step=36, RunningAvgSamplesPerSec=150.63801235253072, CurrSamplesPerSec=151.29363269838998, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:38:52,137] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7765.64 | bwd_microstep: 19377.51 | bwd_inner_microstep: 18741.31 | bwd_allreduce_microstep: 635.90 | step_microstep: 237.72
[2025-03-12 09:38:52,137] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7765.66 | bwd: 19377.51 | bwd_inner: 18741.36 | bwd_allreduce: 635.90 | step: 237.72
[2025-03-12 09:38:52][I][megatron/training_log:661]  iteration=      36/ 1271565 | consumed_samples=       13824 | consumed_tokens=    56623104 | elapsed_time_per_iteration_ms=27487.5 | learning_rate=1.13246e-07 | global_batch_size=  384 | lm loss=11.140266 | loss_scale=1.0 | grad_norm=10.772 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.970 | tokens_per_gpu_per_second_tgs=2384.208 | [LM]TFLOPs=98.36 | [DS]TFLOPs=94.52 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27227.59, 27228.51)
    optimizer ......................................: (236.76, 238.00)
[2025-03-12 09:39:19,642] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.83 | optimizer_gradients: 0.56 | optimizer_step: 1.05
[2025-03-12 09:39:19,642] [INFO] [logging.py:128:log_dist] [Rank 0] step=37, skipped=0, lr=[1.1639200512753969e-07, 1.1639200512753969e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:39:19,643] [INFO] [timer.py:264:stop] epoch=0/micro_step=37/global_step=37, RunningAvgSamplesPerSec=150.63988297133875, CurrSamplesPerSec=150.70345252071348, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:39:19,643] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7754.35 | bwd_microstep: 19404.82 | bwd_inner_microstep: 18763.16 | bwd_allreduce_microstep: 641.36 | step_microstep: 237.32
[2025-03-12 09:39:19,643] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7754.37 | bwd: 19404.82 | bwd_inner: 18763.21 | bwd_allreduce: 641.36 | step: 237.32
[2025-03-12 09:39:19][I][megatron/training_log:661]  iteration=      37/ 1271565 | consumed_samples=       14208 | consumed_tokens=    58195968 | elapsed_time_per_iteration_ms=27506.0 | learning_rate=1.16392e-07 | global_batch_size=  384 | lm loss=11.133783 | loss_scale=1.0 | grad_norm=10.816 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.961 | tokens_per_gpu_per_second_tgs=2382.610 | [LM]TFLOPs=98.29 | [DS]TFLOPs=94.45 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27245.36, 27246.19)
    optimizer ......................................: (236.49, 237.61)
[2025-03-12 09:39:47,196] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.76 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:39:47,196] [INFO] [logging.py:128:log_dist] [Rank 0] step=38, skipped=0, lr=[1.1953773499585156e-07, 1.1953773499585156e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:39:47,197] [INFO] [timer.py:264:stop] epoch=0/micro_step=38/global_step=38, RunningAvgSamplesPerSec=150.6428305064565, CurrSamplesPerSec=150.74600777623198, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:39:47,197] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7775.64 | bwd_microstep: 19434.79 | bwd_inner_microstep: 18794.63 | bwd_allreduce_microstep: 639.86 | step_microstep: 237.35
[2025-03-12 09:39:47,197] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7775.66 | bwd: 19434.78 | bwd_inner: 18794.68 | bwd_allreduce: 639.86 | step: 237.35
[2025-03-12 09:39:47][I][megatron/training_log:661]  iteration=      38/ 1271565 | consumed_samples=       14592 | consumed_tokens=    59768832 | elapsed_time_per_iteration_ms=27553.8 | learning_rate=1.19538e-07 | global_batch_size=  384 | lm loss=11.129514 | loss_scale=1.0 | grad_norm=10.590 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.936 | tokens_per_gpu_per_second_tgs=2378.474 | [LM]TFLOPs=98.12 | [DS]TFLOPs=94.29 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27293.95, 27294.88)
    optimizer ......................................: (236.41, 237.63)
[2025-03-12 09:40:18,214] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.99 | optimizer_gradients: 0.54 | optimizer_step: 1.02
[2025-03-12 09:40:18,214] [INFO] [logging.py:128:log_dist] [Rank 0] step=39, skipped=0, lr=[1.2268346486416345e-07, 1.2268346486416345e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:40:18,215] [INFO] [timer.py:264:stop] epoch=0/micro_step=39/global_step=39, RunningAvgSamplesPerSec=150.65532960496606, CurrSamplesPerSec=151.10662320823855, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:40:18,215] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7785.34 | bwd_microstep: 22885.13 | bwd_inner_microstep: 22249.75 | bwd_allreduce_microstep: 635.08 | step_microstep: 237.48
[2025-03-12 09:40:18,215] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7785.36 | bwd: 22885.12 | bwd_inner: 22249.80 | bwd_allreduce: 635.08 | step: 237.48
[2025-03-12 09:40:18][I][megatron/training_log:661]  iteration=      39/ 1271565 | consumed_samples=       14976 | consumed_tokens=    61341696 | elapsed_time_per_iteration_ms=31017.4 | learning_rate=1.22683e-07 | global_batch_size=  384 | lm loss=11.113354 | loss_scale=1.0 | grad_norm=11.221 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=12.380 | tokens_per_gpu_per_second_tgs=2112.881 | [LM]TFLOPs=87.16 | [DS]TFLOPs=83.76 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (30757.46, 30758.31)
    optimizer ......................................: (236.27, 237.76)
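(Note the occasional slow iteration — 39 here, and 44 below — is entirely a longer backward pass (`bwd_inner_microstep`); the allreduce and optimizer times are unchanged. Decomposing the logged numbers, with ballpark "typical" values taken from the surrounding iterations:)

```python
# Iteration 39 vs. a typical iteration (ballpark values from the surrounding log).
typical = {"bwd_ms": 19400.0, "iter_ms": 27510.0}
it39 = {"bwd_ms": 22885.13, "iter_ms": 31017.4}  # bwd_microstep / elapsed_time

print(it39["bwd_ms"] - typical["bwd_ms"])    # ~3485 ms extra in backward
print(it39["iter_ms"] - typical["iter_ms"])  # ~3507 ms extra overall
                                             # -> the dip to ~87 TFLOPs
```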
[2025-03-12 09:40:45,794] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.58 | optimizer_gradients: 0.57 | optimizer_step: 1.10
[2025-03-12 09:40:45,795] [INFO] [logging.py:128:log_dist] [Rank 0] step=40, skipped=0, lr=[1.2582919473247534e-07, 1.2582919473247534e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:40:45,795] [INFO] [timer.py:264:stop] epoch=0/micro_step=40/global_step=40, RunningAvgSamplesPerSec=150.6476453041463, CurrSamplesPerSec=150.36381733094183, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:40:45,795] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7772.24 | bwd_microstep: 19444.79 | bwd_inner_microstep: 18805.02 | bwd_allreduce_microstep: 639.47 | step_microstep: 237.05
[2025-03-12 09:40:45,796] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7772.26 | bwd: 19444.78 | bwd_inner: 18805.08 | bwd_allreduce: 639.46 | step: 237.05
[2025-03-12 09:40:45][I][megatron/training_log:661]  iteration=      40/ 1271565 | consumed_samples=       15360 | consumed_tokens=    62914560 | elapsed_time_per_iteration_ms=27580.3 | learning_rate=1.25829e-07 | global_batch_size=  384 | lm loss=11.107403 | loss_scale=1.0 | grad_norm=11.039 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.923 | tokens_per_gpu_per_second_tgs=2376.188 | [LM]TFLOPs=98.03 | [DS]TFLOPs=94.20 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27320.59, 27321.29)
    optimizer ......................................: (236.20, 237.34)
[2025-03-12 09:41:13,375] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.55 | optimizer_gradients: 0.56 | optimizer_step: 1.04
[2025-03-12 09:41:13,376] [INFO] [logging.py:128:log_dist] [Rank 0] step=41, skipped=0, lr=[1.2897492460078723e-07, 1.2897492460078723e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:41:13,376] [INFO] [timer.py:264:stop] epoch=0/micro_step=41/global_step=41, RunningAvgSamplesPerSec=150.64290998760035, CurrSamplesPerSec=150.46312932855187, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:41:13,376] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7763.44 | bwd_microstep: 19475.17 | bwd_inner_microstep: 18835.26 | bwd_allreduce_microstep: 639.61 | step_microstep: 236.88
[2025-03-12 09:41:13,376] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7763.47 | bwd: 19475.17 | bwd_inner: 18835.32 | bwd_allreduce: 639.60 | step: 236.88
[2025-03-12 09:41:13][I][megatron/training_log:661]  iteration=      41/ 1271565 | consumed_samples=       15744 | consumed_tokens=    64487424 | elapsed_time_per_iteration_ms=27580.4 | learning_rate=1.28975e-07 | global_batch_size=  384 | lm loss=11.099026 | loss_scale=1.0 | grad_norm=10.892 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.923 | tokens_per_gpu_per_second_tgs=2376.183 | [LM]TFLOPs=98.02 | [DS]TFLOPs=94.20 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27321.46, 27322.51)
    optimizer ......................................: (236.00, 237.15)
[2025-03-12 09:41:40,895] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.66 | optimizer_gradients: 0.52 | optimizer_step: 0.99
[2025-03-12 09:41:40,896] [INFO] [logging.py:128:log_dist] [Rank 0] step=42, skipped=0, lr=[1.321206544690991e-07, 1.321206544690991e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:41:40,896] [INFO] [timer.py:264:stop] epoch=0/micro_step=42/global_step=42, RunningAvgSamplesPerSec=150.64652635998266, CurrSamplesPerSec=150.78764123135574, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:41:40,896] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7757.05 | bwd_microstep: 19396.21 | bwd_inner_microstep: 18756.71 | bwd_allreduce_microstep: 639.21 | step_microstep: 236.85
[2025-03-12 09:41:40,896] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7757.07 | bwd: 19396.21 | bwd_inner: 18756.76 | bwd_allreduce: 639.21 | step: 236.85
[2025-03-12 09:41:40][I][megatron/training_log:661]  iteration=      42/ 1271565 | consumed_samples=       16128 | consumed_tokens=    66060288 | elapsed_time_per_iteration_ms=27519.3 | learning_rate=1.32121e-07 | global_batch_size=  384 | lm loss=11.096643 | loss_scale=1.0 | grad_norm=10.890 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.954 | tokens_per_gpu_per_second_tgs=2381.455 | [LM]TFLOPs=98.24 | [DS]TFLOPs=94.41 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27259.42, 27260.27)
    optimizer ......................................: (235.79, 237.12)
[2025-03-12 09:42:08,435] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.21 | optimizer_gradients: 0.56 | optimizer_step: 1.04
[2025-03-12 09:42:08,436] [INFO] [logging.py:128:log_dist] [Rank 0] step=43, skipped=0, lr=[1.3526638433741097e-07, 1.3526638433741097e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:42:08,436] [INFO] [timer.py:264:stop] epoch=0/micro_step=43/global_step=43, RunningAvgSamplesPerSec=150.65472223210463, CurrSamplesPerSec=150.98323061286504, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:42:08,436] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7768.23 | bwd_microstep: 19412.97 | bwd_inner_microstep: 18778.12 | bwd_allreduce_microstep: 634.56 | step_microstep: 236.67
[2025-03-12 09:42:08,436] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7768.25 | bwd: 19412.97 | bwd_inner: 18778.17 | bwd_allreduce: 634.55 | step: 236.67
[2025-03-12 09:42:08][I][megatron/training_log:661]  iteration=      43/ 1271565 | consumed_samples=       16512 | consumed_tokens=    67633152 | elapsed_time_per_iteration_ms=27540.6 | learning_rate=1.35266e-07 | global_batch_size=  384 | lm loss=11.086030 | loss_scale=1.0 | grad_norm=10.834 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.943 | tokens_per_gpu_per_second_tgs=2379.615 | [LM]TFLOPs=98.17 | [DS]TFLOPs=94.33 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27281.60, 27282.47)
    optimizer ......................................: (235.86, 236.94)
[2025-03-12 09:42:40,156] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.27 | optimizer_gradients: 0.56 | optimizer_step: 1.05
[2025-03-12 09:42:40,156] [INFO] [logging.py:128:log_dist] [Rank 0] step=44, skipped=0, lr=[1.3841211420572286e-07, 1.3841211420572286e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:42:40,156] [INFO] [timer.py:264:stop] epoch=0/micro_step=44/global_step=44, RunningAvgSamplesPerSec=150.65684951614153, CurrSamplesPerSec=150.7440607402074, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:42:40,157] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7770.04 | bwd_microstep: 23589.33 | bwd_inner_microstep: 22952.42 | bwd_allreduce_microstep: 636.62 | step_microstep: 237.68
[2025-03-12 09:42:40,157] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7770.06 | bwd: 23589.33 | bwd_inner: 22952.47 | bwd_allreduce: 636.61 | step: 237.68
[2025-03-12 09:42:40][I][megatron/training_log:661]  iteration=      44/ 1271565 | consumed_samples=       16896 | consumed_tokens=    69206016 | elapsed_time_per_iteration_ms=31719.6 | learning_rate=1.38412e-07 | global_batch_size=  384 | lm loss=11.081297 | loss_scale=1.0 | grad_norm=10.845 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=12.106 | tokens_per_gpu_per_second_tgs=2066.106 | [LM]TFLOPs=85.23 | [DS]TFLOPs=81.91 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (31460.12, 31461.08)
    optimizer ......................................: (236.39, 237.94)
[2025-03-12 09:43:07,653] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.11 | optimizer_gradients: 0.54 | optimizer_step: 1.01
[2025-03-12 09:43:07,653] [INFO] [logging.py:128:log_dist] [Rank 0] step=45, skipped=0, lr=[1.4155784407403475e-07, 1.4155784407403475e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:43:07,654] [INFO] [timer.py:264:stop] epoch=0/micro_step=45/global_step=45, RunningAvgSamplesPerSec=150.67099479124266, CurrSamplesPerSec=151.26744481924644, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:43:07,654] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7753.86 | bwd_microstep: 19400.94 | bwd_inner_microstep: 18767.76 | bwd_allreduce_microstep: 632.88 | step_microstep: 237.56
[2025-03-12 09:43:07,654] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7753.88 | bwd: 19400.94 | bwd_inner: 18767.82 | bwd_allreduce: 632.87 | step: 237.56
[2025-03-12 09:43:07][I][megatron/training_log:661]  iteration=      45/ 1271565 | consumed_samples=       17280 | consumed_tokens=    70778880 | elapsed_time_per_iteration_ms=27497.0 | learning_rate=1.41558e-07 | global_batch_size=  384 | lm loss=11.083328 | loss_scale=1.0 | grad_norm=10.560 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.965 | tokens_per_gpu_per_second_tgs=2383.391 | [LM]TFLOPs=98.32 | [DS]TFLOPs=94.48 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27237.77, 27238.60)
    optimizer ......................................: (236.64, 237.83)
[2025-03-12 09:43:35,158] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.73 | optimizer_gradients: 0.53 | optimizer_step: 1.00
[2025-03-12 09:43:35,158] [INFO] [logging.py:128:log_dist] [Rank 0] step=46, skipped=0, lr=[1.4470357394234666e-07, 1.4470357394234666e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:43:35,159] [INFO] [timer.py:264:stop] epoch=0/micro_step=46/global_step=46, RunningAvgSamplesPerSec=150.68493372147117, CurrSamplesPerSec=151.28669764056133, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:43:35,159] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7776.96 | bwd_microstep: 19387.18 | bwd_inner_microstep: 18754.51 | bwd_allreduce_microstep: 632.37 | step_microstep: 236.98
[2025-03-12 09:43:35,159] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7776.99 | bwd: 19387.17 | bwd_inner: 18754.57 | bwd_allreduce: 632.36 | step: 236.98
[2025-03-12 09:43:35][I][megatron/training_log:661]  iteration=      46/ 1271565 | consumed_samples=       17664 | consumed_tokens=    72351744 | elapsed_time_per_iteration_ms=27504.6 | learning_rate=1.44704e-07 | global_batch_size=  384 | lm loss=11.064159 | loss_scale=1.0 | grad_norm=11.739 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.961 | tokens_per_gpu_per_second_tgs=2382.729 | [LM]TFLOPs=98.29 | [DS]TFLOPs=94.46 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27246.27, 27247.13)
    optimizer ......................................: (236.16, 237.25)
[2025-03-12 09:44:02,703] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.02 | optimizer_gradients: 0.53 | optimizer_step: 0.98
[2025-03-12 09:44:02,704] [INFO] [logging.py:128:log_dist] [Rank 0] step=47, skipped=0, lr=[1.4784930381065852e-07, 1.4784930381065852e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:44:02,704] [INFO] [timer.py:264:stop] epoch=0/micro_step=47/global_step=47, RunningAvgSamplesPerSec=150.6950169635293, CurrSamplesPerSec=151.1399600384914, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:44:02,704] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7778.26 | bwd_microstep: 19423.03 | bwd_inner_microstep: 18789.28 | bwd_allreduce_microstep: 633.45 | step_microstep: 237.42
[2025-03-12 09:44:02,704] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7778.28 | bwd: 19423.03 | bwd_inner: 18789.33 | bwd_allreduce: 633.45 | step: 237.42
[2025-03-12 09:44:02][I][megatron/training_log:661]  iteration=      47/ 1271565 | consumed_samples=       18048 | consumed_tokens=    73924608 | elapsed_time_per_iteration_ms=27545.2 | learning_rate=1.47849e-07 | global_batch_size=  384 | lm loss=11.060505 | loss_scale=1.0 | grad_norm=11.227 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.941 | tokens_per_gpu_per_second_tgs=2379.220 | [LM]TFLOPs=98.15 | [DS]TFLOPs=94.32 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27285.29, 27286.13)
    optimizer ......................................: (236.22, 237.69)
[2025-03-12 09:44:30,218] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.47 | optimizer_gradients: 0.53 | optimizer_step: 0.98
[2025-03-12 09:44:30,219] [INFO] [logging.py:128:log_dist] [Rank 0] step=48, skipped=0, lr=[1.509950336789704e-07, 1.509950336789704e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:44:30,219] [INFO] [timer.py:264:stop] epoch=0/micro_step=48/global_step=48, RunningAvgSamplesPerSec=150.70708793339045, CurrSamplesPerSec=151.25223074254686, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:44:30,219] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7767.44 | bwd_microstep: 19402.55 | bwd_inner_microstep: 18770.56 | bwd_allreduce_microstep: 631.70 | step_microstep: 237.82
[2025-03-12 09:44:30,220] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7767.46 | bwd: 19402.55 | bwd_inner: 18770.61 | bwd_allreduce: 631.70 | step: 237.83
[2025-03-12 09:44:30][I][megatron/training_log:661]  iteration=      48/ 1271565 | consumed_samples=       18432 | consumed_tokens=    75497472 | elapsed_time_per_iteration_ms=27515.5 | learning_rate=1.50995e-07 | global_batch_size=  384 | lm loss=11.051532 | loss_scale=1.0 | grad_norm=11.153 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.956 | tokens_per_gpu_per_second_tgs=2381.785 | [LM]TFLOPs=98.26 | [DS]TFLOPs=94.42 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27254.93, 27255.75)
    optimizer ......................................: (236.69, 238.10)
[2025-03-12 09:44:57,775] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.24 | optimizer_gradients: 0.53 | optimizer_step: 1.02
[2025-03-12 09:44:57,775] [INFO] [logging.py:128:log_dist] [Rank 0] step=49, skipped=0, lr=[1.541407635472823e-07, 1.541407635472823e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:44:57,776] [INFO] [timer.py:264:stop] epoch=0/micro_step=49/global_step=49, RunningAvgSamplesPerSec=150.71090194884374, CurrSamplesPerSec=150.8864962974408, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:44:57,776] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7784.89 | bwd_microstep: 19426.83 | bwd_inner_microstep: 18790.67 | bwd_allreduce_microstep: 635.86 | step_microstep: 237.48
[2025-03-12 09:44:57,776] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7784.91 | bwd: 19426.82 | bwd_inner: 18790.72 | bwd_allreduce: 635.85 | step: 237.48
[2025-03-12 09:44:57][I][megatron/training_log:661]  iteration=      49/ 1271565 | consumed_samples=       18816 | consumed_tokens=    77070336 | elapsed_time_per_iteration_ms=27556.0 | learning_rate=1.54141e-07 | global_batch_size=  384 | lm loss=11.055607 | loss_scale=1.0 | grad_norm=11.085 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.935 | tokens_per_gpu_per_second_tgs=2378.280 | [LM]TFLOPs=98.11 | [DS]TFLOPs=94.28 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27296.79, 27297.61)
    optimizer ......................................: (236.40, 237.75)
[2025-03-12 09:45:25,336] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.64 | optimizer_gradients: 0.53 | optimizer_step: 1.00
[2025-03-12 09:45:25,336] [INFO] [logging.py:128:log_dist] [Rank 0] step=50, skipped=0, lr=[1.5728649341559419e-07, 1.5728649341559419e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:45:25,337] [INFO] [timer.py:264:stop] epoch=0/micro_step=50/global_step=50, RunningAvgSamplesPerSec=150.71373983157494, CurrSamplesPerSec=150.84718172369494, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:45:25,337] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7769.20 | bwd_microstep: 19451.29 | bwd_inner_microstep: 18816.92 | bwd_allreduce_microstep: 634.07 | step_microstep: 236.97
[2025-03-12 09:45:25,337] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7769.22 | bwd: 19451.29 | bwd_inner: 18816.98 | bwd_allreduce: 634.06 | step: 236.97
[2025-03-12 09:45:25][I][megatron/training_log:661]  iteration=      50/ 1271565 | consumed_samples=       19200 | consumed_tokens=    78643200 | elapsed_time_per_iteration_ms=27560.6 | learning_rate=1.57286e-07 | global_batch_size=  384 | lm loss=11.036057 | loss_scale=1.0 | grad_norm=11.555 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.933 | tokens_per_gpu_per_second_tgs=2377.889 | [LM]TFLOPs=98.10 | [DS]TFLOPs=94.27 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27301.97, 27302.88)
    optimizer ......................................: (236.15, 237.23)
[2025-03-12 09:45:25][I][megatron/checkpointing:589] Saving lr_state_dict to checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/lr_state_dict.yaml
[2025-03-12 09:45:25][I][megatron/utils:368] saving checkpoint at iteration      50 to checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
[2025-03-12 09:45:25,373] [INFO] [logging.py:128:log_dist] [Rank 0] [Torch] Checkpoint global_step50 is about to be saved!
[2025-03-12 09:45:25,383] [INFO] [logging.py:128:log_dist] [Rank 0] Saving model checkpoint: checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/mp_rank_00_model_states.pt
[2025-03-12 09:45:25,384] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/mp_rank_00_model_states.pt...
[2025-03-12 09:45:36,631] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/mp_rank_00_model_states.pt.
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,656] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,656] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,656] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:36,655] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt...
[2025-03-12 09:45:40,037] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,037] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,038] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,074] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,074] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,074] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,094] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,094] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,095] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,098] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,099] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,099] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,106] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,107] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,107] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,107] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,108] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,108] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,108] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,108] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,108] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,111] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,114] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,115] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,115] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,118] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,118] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,118] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,120] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,120] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,120] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,120] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,120] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,121] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,123] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,124] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,124] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,134] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,134] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,134] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,138] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,138] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,138] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,143] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,143] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,143] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,145] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,145] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,146] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,147] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,147] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,159] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,159] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,160] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,162] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,162] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,162] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,180] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,181] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,181] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,181] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,181] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,181] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,283] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,283] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,283] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,356] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,356] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,356] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40,359] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt.
[2025-03-12 09:45:40,359] [INFO] [engine.py:3640:_save_zero_checkpoint] zero checkpoint saved checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash/global_step50/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
[2025-03-12 09:45:40,359] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!
[2025-03-12 09:45:40][I][megatron/utils:368]   successfully saved checkpoint at iteration      50 to checkpoints/ws24_ds_stage1_nl32_hs4096_mb1_seq4096_gb384_sp1_pp1_tp1_bf16_optadamw_lr_lwf_flash
[2025-03-12 09:45:40][I][megatron/utils:368] Checkpoint Save GB: 83.064, GB/Sec: 5.53, Latency(second): 15.011
(min, max) time across ranks (ms):
    save-checkpoint ................................: (15010.94, 15011.01)
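(The checkpoint summary line is internally consistent — the reported bandwidth is just size over wall time:)

```python
save_gb, latency_s = 83.064, 15.011  # from the "Checkpoint Save GB" line above
print(save_gb / latency_s)           # ~5.53 GB/s, matching "GB/Sec: 5.53"
```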
[2025-03-12 09:45:40][I][ezpz/dist:125] `save_checkpoint_and_time`((50, [DeepSpeedEngine(
  (module): GPTModel(
    (language_model): TransformerLanguageModel(
      (embedding): Embedding(
        (word_embeddings): VocabParallelEmbedding()
        (embedding_dropout): Dropout(p=0.0, inplace=False)
      )
      (rotary_pos_emb): RotaryEmbedding()
      (encoder): ParallelTransformer(
        (layers): ModuleList(
          (0-31): 32 x ParallelTransformerLayer(
            (input_layernorm): RMSNorm()
            (self_attention): ParallelAttention(
              (query_key_value): ColumnParallelLinear()
              (core_attention_flash): FlashSelfAttention()
              (dense): RowParallelLinear()
            )
            (post_attention_layernorm): RMSNorm()
            (mlp): ParallelMLP(
              (dense_h_to_4h): ColumnParallelLinear()
              (dense_4h_to_h): RowParallelLinear()
            )
          )
        )
        (final_layernorm): RMSNorm()
      )
      (output_layer): ColumnParallelLinear()
    )
  )
)], <deepspeed.runtime.zero.stage_1_and_2.DeepSpeedZeroOptimizer object at 0x152e647463b0>, <megatron.optimizer_param_scheduler.OptimizerParamScheduler object at 0x152e64716bc0>)) took: dt=15.0156s
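(From the module tree above you can back out a rough parameter count. A sketch, assuming a 32k vocab — the vocab size isn't shown anywhere in this log, so that number is purely illustrative:)

```python
# Rough parameter count for the printed architecture: 32 layers, hidden 4096,
# standard 4h MLP (dense_h_to_4h / dense_4h_to_h), untied output layer.
# ASSUMPTION: vocab = 32_000 is hypothetical; it is not shown in the log.
h, n_layers, vocab = 4096, 32, 32_000

per_layer = 4 * h * h + 8 * h * h              # attention (QKV + dense) + MLP
params = n_layers * per_layer + 2 * vocab * h  # + word embeddings + output layer
print(f"~{params / 1e9:.2f}B params")          # ~6.70B

# Loosely consistent with the logged numbers above:
print(f"bf16 weights   ~{params * 2 / 1e9:.1f} GB")   # vs MemAllocated=13.82GB
print(f"fp32 Adam state ~{params * 12 / 1e9:.1f} GB") # 4B master + 4B m + 4B v,
#   in the same ballpark as "Checkpoint Save GB: 83.064"
```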
[2025-03-12 09:46:07,899] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.88 | optimizer_gradients: 0.53 | optimizer_step: 1.00
[2025-03-12 09:46:07,899] [INFO] [logging.py:128:log_dist] [Rank 0] step=51, skipped=0, lr=[1.6043222328390607e-07, 1.6043222328390607e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:46:07,900] [INFO] [timer.py:264:stop] epoch=0/micro_step=51/global_step=51, RunningAvgSamplesPerSec=150.72006367000233, CurrSamplesPerSec=151.0241738655975, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:46:07,900] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7764.78 | bwd_microstep: 19433.86 | bwd_inner_microstep: 18794.45 | bwd_allreduce_microstep: 639.12 | step_microstep: 237.23
[2025-03-12 09:46:07,900] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7764.80 | bwd: 19433.86 | bwd_inner: 18794.50 | bwd_allreduce: 639.11 | step: 237.23
[2025-03-12 09:46:07][I][megatron/training_log:661]  iteration=      51/ 1271565 | consumed_samples=       19584 | consumed_tokens=    80216064 | elapsed_time_per_iteration_ms=42562.8 | learning_rate=1.60432e-07 | global_batch_size=  384 | lm loss=11.035048 | loss_scale=1.0 | grad_norm=11.677 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=9.022 | tokens_per_gpu_per_second_tgs=1539.747 | [LM]TFLOPs=63.52 | [DS]TFLOPs=61.04 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27283.22, 27284.14)
    optimizer ......................................: (236.35, 237.48)
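(The one-off throughput dip at iteration 51 — ~1540 TGS vs the usual ~2380 — is just the checkpoint save above being folded into `elapsed_time_per_iteration_ms`:)

```python
fwd_bwd_ms = 27283.22  # forward-backward, iteration 51 (ranks block above)
opt_ms = 237.48        # optimizer
ckpt_ms = 15010.94     # save-checkpoint latency
print(fwd_bwd_ms + opt_ms + ckpt_ms)  # ~42531 ms; logged elapsed is 42562.8
```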
[2025-03-12 09:46:35,470] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.11 | optimizer_gradients: 0.56 | optimizer_step: 1.02
[2025-03-12 09:46:35,470] [INFO] [logging.py:128:log_dist] [Rank 0] step=52, skipped=0, lr=[1.6357795315221793e-07, 1.6357795315221793e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:46:35,470] [INFO] [timer.py:264:stop] epoch=0/micro_step=52/global_step=52, RunningAvgSamplesPerSec=150.71736368657093, CurrSamplesPerSec=150.58512384240427, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:46:35,471] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7767.80 | bwd_microstep: 19448.09 | bwd_inner_microstep: 18805.60 | bwd_allreduce_microstep: 642.19 | step_microstep: 237.50
[2025-03-12 09:46:35,471] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7767.82 | bwd: 19448.09 | bwd_inner: 18805.66 | bwd_allreduce: 642.18 | step: 237.50
[2025-03-12 09:46:35][I][megatron/training_log:661]  iteration=      52/ 1271565 | consumed_samples=       19968 | consumed_tokens=    81788928 | elapsed_time_per_iteration_ms=27570.6 | learning_rate=1.63578e-07 | global_batch_size=  384 | lm loss=11.030777 | loss_scale=1.0 | grad_norm=11.075 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.928 | tokens_per_gpu_per_second_tgs=2377.024 | [LM]TFLOPs=98.06 | [DS]TFLOPs=94.23 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27311.23, 27311.94)
    optimizer ......................................: (236.33, 237.78)
[2025-03-12 09:47:02,999] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.64 | optimizer_gradients: 0.56 | optimizer_step: 1.02
[2025-03-12 09:47:02,999] [INFO] [logging.py:128:log_dist] [Rank 0] step=53, skipped=0, lr=[1.6672368302052982e-07, 1.6672368302052982e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:47:03,000] [INFO] [timer.py:264:stop] epoch=0/micro_step=53/global_step=53, RunningAvgSamplesPerSec=150.72290826800395, CurrSamplesPerSec=151.00059905306884, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:47:03,000] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7763.91 | bwd_microstep: 19398.25 | bwd_inner_microstep: 18760.66 | bwd_allreduce_microstep: 637.29 | step_microstep: 237.00
[2025-03-12 09:47:03,000] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7763.93 | bwd: 19398.24 | bwd_inner: 18760.71 | bwd_allreduce: 637.28 | step: 237.00
[2025-03-12 09:47:03][I][megatron/training_log:661]  iteration=      53/ 1271565 | consumed_samples=       20352 | consumed_tokens=    83361792 | elapsed_time_per_iteration_ms=27544.1 | learning_rate=1.66724e-07 | global_batch_size=  384 | lm loss=11.005310 | loss_scale=1.0 | grad_norm=11.063 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.941 | tokens_per_gpu_per_second_tgs=2379.315 | [LM]TFLOPs=98.15 | [DS]TFLOPs=94.32 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27270.48, 27271.12)
    optimizer ......................................: (236.09, 237.29)
[2025-03-12 09:47:30,531] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.01 | optimizer_gradients: 0.53 | optimizer_step: 1.03
[2025-03-12 09:47:30,531] [INFO] [logging.py:128:log_dist] [Rank 0] step=54, skipped=0, lr=[1.6986941288884168e-07, 1.6986941288884168e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:47:30,532] [INFO] [timer.py:264:stop] epoch=0/micro_step=54/global_step=54, RunningAvgSamplesPerSec=150.72117384244626, CurrSamplesPerSec=150.63271194928672, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:47:30,532] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7772.02 | bwd_microstep: 19385.58 | bwd_inner_microstep: 18743.53 | bwd_allreduce_microstep: 641.75 | step_microstep: 237.28
[2025-03-12 09:47:30,532] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7772.04 | bwd: 19385.58 | bwd_inner: 18743.59 | bwd_allreduce: 641.74 | step: 237.28
[2025-03-12 09:47:30][I][megatron/training_log:661]  iteration=      54/ 1271565 | consumed_samples=       20736 | consumed_tokens=    84934656 | elapsed_time_per_iteration_ms=27517.0 | learning_rate=1.69869e-07 | global_batch_size=  384 | lm loss=10.987799 | loss_scale=1.0 | grad_norm=11.460 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.955 | tokens_per_gpu_per_second_tgs=2381.653 | [LM]TFLOPs=98.25 | [DS]TFLOPs=94.41 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27257.15, 27258.15)
    optimizer ......................................: (236.14, 237.55)
[2025-03-12 09:47:58,101] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.37 | optimizer_gradients: 0.53 | optimizer_step: 1.01
[2025-03-12 09:47:58,102] [INFO] [logging.py:128:log_dist] [Rank 0] step=55, skipped=0, lr=[1.730151427571536e-07, 1.730151427571536e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:47:58,102] [INFO] [timer.py:264:stop] epoch=0/micro_step=55/global_step=55, RunningAvgSamplesPerSec=150.71704256819055, CurrSamplesPerSec=150.50246896079437, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:47:58,102] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7766.86 | bwd_microstep: 19441.42 | bwd_inner_microstep: 18802.36 | bwd_allreduce_microstep: 638.76 | step_microstep: 237.69
[2025-03-12 09:47:58,102] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7766.88 | bwd: 19441.42 | bwd_inner: 18802.42 | bwd_allreduce: 638.75 | step: 237.69
[2025-03-12 09:47:58][I][megatron/training_log:661]  iteration=      55/ 1271565 | consumed_samples=       21120 | consumed_tokens=    86507520 | elapsed_time_per_iteration_ms=27569.8 | learning_rate=1.73015e-07 | global_batch_size=  384 | lm loss=10.956585 | loss_scale=1.0 | grad_norm=12.528 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.928 | tokens_per_gpu_per_second_tgs=2377.090 | [LM]TFLOPs=98.06 | [DS]TFLOPs=94.23 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27310.30, 27311.15)
    optimizer ......................................: (236.76, 237.96)
[2025-03-12 09:48:25,592] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.78 | optimizer_gradients: 0.53 | optimizer_step: 1.04
[2025-03-12 09:48:25,592] [INFO] [logging.py:128:log_dist] [Rank 0] step=56, skipped=0, lr=[1.7616087262546548e-07, 1.7616087262546548e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:48:25,593] [INFO] [timer.py:264:stop] epoch=0/micro_step=56/global_step=56, RunningAvgSamplesPerSec=150.71751857754913, CurrSamplesPerSec=150.74269220150842, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:48:25,593] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7746.50 | bwd_microstep: 19399.74 | bwd_inner_microstep: 18758.68 | bwd_allreduce_microstep: 640.77 | step_microstep: 237.31
[2025-03-12 09:48:25,593] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7746.52 | bwd: 19399.73 | bwd_inner: 18758.73 | bwd_allreduce: 640.77 | step: 237.31
[2025-03-12 09:48:25][I][megatron/training_log:661]  iteration=      56/ 1271565 | consumed_samples=       21504 | consumed_tokens=    88080384 | elapsed_time_per_iteration_ms=27490.6 | learning_rate=1.76161e-07 | global_batch_size=  384 | lm loss=10.941004 | loss_scale=1.0 | grad_norm=11.555 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.968 | tokens_per_gpu_per_second_tgs=2383.944 | [LM]TFLOPs=98.35 | [DS]TFLOPs=94.51 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27231.20, 27231.96)
    optimizer ......................................: (235.97, 237.59)
[2025-03-12 09:48:53,163] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.03 | optimizer_gradients: 0.53 | optimizer_step: 0.98
[2025-03-12 09:48:53,163] [INFO] [logging.py:128:log_dist] [Rank 0] step=57, skipped=0, lr=[1.7930660249377737e-07, 1.7930660249377737e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:48:53,164] [INFO] [timer.py:264:stop] epoch=0/micro_step=57/global_step=57, RunningAvgSamplesPerSec=150.71533374439005, CurrSamplesPerSec=150.59738768407385, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:48:53,164] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7786.85 | bwd_microstep: 19439.95 | bwd_inner_microstep: 18801.56 | bwd_allreduce_microstep: 638.08 | step_microstep: 237.27
[2025-03-12 09:48:53,164] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7786.87 | bwd: 19439.94 | bwd_inner: 18801.62 | bwd_allreduce: 638.07 | step: 237.27
[2025-03-12 09:48:53][I][megatron/training_log:661]  iteration=      57/ 1271565 | consumed_samples=       21888 | consumed_tokens=    89653248 | elapsed_time_per_iteration_ms=27570.8 | learning_rate=1.79307e-07 | global_batch_size=  384 | lm loss=10.909065 | loss_scale=1.0 | grad_norm=12.329 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.928 | tokens_per_gpu_per_second_tgs=2377.011 | [LM]TFLOPs=98.06 | [DS]TFLOPs=94.23 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27310.97, 27312.03)
    optimizer ......................................: (236.29, 237.54)
[2025-03-12 09:49:20,721] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.22 | optimizer_gradients: 0.53 | optimizer_step: 1.01
[2025-03-12 09:49:20,721] [INFO] [logging.py:128:log_dist] [Rank 0] step=58, skipped=0, lr=[1.8245233236208926e-07, 1.8245233236208926e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:49:20,722] [INFO] [timer.py:264:stop] epoch=0/micro_step=58/global_step=58, RunningAvgSamplesPerSec=150.718896873452, CurrSamplesPerSec=150.91506945042445, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:49:20,722] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7763.22 | bwd_microstep: 19449.94 | bwd_inner_microstep: 18814.93 | bwd_allreduce_microstep: 634.72 | step_microstep: 237.46
[2025-03-12 09:49:20,722] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7763.24 | bwd: 19449.94 | bwd_inner: 18814.98 | bwd_allreduce: 634.71 | step: 237.47
[2025-03-12 09:49:20][I][megatron/training_log:661]  iteration=      58/ 1271565 | consumed_samples=       22272 | consumed_tokens=    91226112 | elapsed_time_per_iteration_ms=27557.6 | learning_rate=1.82452e-07 | global_batch_size=  384 | lm loss=10.902085 | loss_scale=1.0 | grad_norm=12.222 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.934 | tokens_per_gpu_per_second_tgs=2378.145 | [LM]TFLOPs=98.11 | [DS]TFLOPs=94.28 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27297.85, 27298.71)
    optimizer ......................................: (236.37, 237.74)
[2025-03-12 09:49:48,273] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 223.24 | optimizer_gradients: 0.52 | optimizer_step: 1.02
[2025-03-12 09:49:48,274] [INFO] [logging.py:128:log_dist] [Rank 0] step=59, skipped=0, lr=[1.8559806223040112e-07, 1.8559806223040112e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:49:48,274] [INFO] [timer.py:264:stop] epoch=0/micro_step=59/global_step=59, RunningAvgSamplesPerSec=150.71482314914178, CurrSamplesPerSec=150.48698654380541, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:49:48,274] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7761.09 | bwd_microstep: 19449.90 | bwd_inner_microstep: 18812.11 | bwd_allreduce_microstep: 637.50 | step_microstep: 237.58
[2025-03-12 09:49:48,275] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7761.11 | bwd: 19449.90 | bwd_inner: 18812.17 | bwd_allreduce: 637.49 | step: 237.58
[2025-03-12 09:49:48][I][megatron/training_log:661]  iteration=      59/ 1271565 | consumed_samples=       22656 | consumed_tokens=    92798976 | elapsed_time_per_iteration_ms=27552.8 | learning_rate=1.85598e-07 | global_batch_size=  384 | lm loss=10.873564 | loss_scale=1.0 | grad_norm=12.056 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.937 | tokens_per_gpu_per_second_tgs=2378.561 | [LM]TFLOPs=98.12 | [DS]TFLOPs=94.29 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27293.18, 27293.90)
    optimizer ......................................: (236.58, 237.85)
[2025-03-12 09:50:15,809] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | optimizer_allgather: 222.77 | optimizer_gradients: 0.55 | optimizer_step: 1.04
[2025-03-12 09:50:15,809] [INFO] [logging.py:128:log_dist] [Rank 0] step=60, skipped=0, lr=[1.8874379209871303e-07, 1.8874379209871303e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2025-03-12 09:50:15,810] [INFO] [timer.py:264:stop] epoch=0/micro_step=60/global_step=60, RunningAvgSamplesPerSec=150.722766862357, CurrSamplesPerSec=151.17688735437054, MemAllocated=13.82GB, MaxMemAllocated=45.85GB
[2025-03-12 09:50:15,810] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd_microstep: 7770.25 | bwd_microstep: 19423.63 | bwd_inner_microstep: 18790.14 | bwd_allreduce_microstep: 633.19 | step_microstep: 237.08
[2025-03-12 09:50:15,810] [INFO] [logging.py:128:log_dist] [Rank 0] time (ms) | fwd: 7770.28 | bwd: 19423.63 | bwd_inner: 18790.20 | bwd_allreduce: 633.18 | step: 237.08
[2025-03-12 09:50:15][I][megatron/training_log:661]  iteration=      60/ 1271565 | consumed_samples=       23040 | consumed_tokens=    94371840 | elapsed_time_per_iteration_ms=27535.1 | learning_rate=1.88744e-07 | global_batch_size=  384 | lm loss=10.851992 | loss_scale=1.0 | grad_norm=12.578 | actual_seqlen= 4096 | number_of_skipped_iterations=  0 | number_of_nan_iterations=  0 | samples_per_second=13.946 | tokens_per_gpu_per_second_tgs=2380.086 | [LM]TFLOPs=98.19 | [DS]TFLOPs=94.35 |
(min, max) time across ranks (ms):
    forward-backward ...............................: (27276.42, 27277.17)
    optimizer ......................................: (236.00, 237.35)
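
As a sanity check, the throughput fields on each `megatron/training_log` line are internally consistent and can be re-derived from the other values on the same line. Below is a minimal sketch (plain Python, no dependencies) that recomputes iteration 60's numbers; the variable names are mine, and the GPU count of 24 is an assumption inferred from the logged samples/sec-to-TGS ratio rather than read off the log itself:

```python
# Re-derive the throughput fields reported by megatron/training_log
# from the other values on the same log line (iteration 60 above).

global_batch_size = 384       # global_batch_size (samples per step)
elapsed_ms = 27535.1          # elapsed_time_per_iteration_ms
seq_len = 4096                # actual_seqlen (tokens per sample)
n_gpus = 24                   # assumed; consistent with the logged ratios
consumed_samples = 23040      # consumed_samples at iteration 60

samples_per_second = global_batch_size / (elapsed_ms / 1000.0)
tokens_per_gpu_per_second = samples_per_second * seq_len / n_gpus

print(f"samples/sec          = {samples_per_second:.3f}")          # logged: 13.946
print(f"tokens/GPU/sec (TGS) = {tokens_per_gpu_per_second:.3f}")   # logged: 2380.086
print(f"consumed_tokens      = {consumed_samples * seq_len}")      # logged: 94371840
```

All three recomputed values match the logged `samples_per_second=13.946`, `tokens_per_gpu_per_second_tgs=2380.086`, and `consumed_tokens=94371840` to within rounding.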