Rearrange repo, add docs, update results #17

Open
wants to merge 44 commits into base: main

Commits (44)
4e442ee
moved systems
kessel Oct 8, 2021
b8d6f08
added everything to Helmholtz AI folder
kessel Oct 8, 2021
46d7320
modified for submission
kessel Oct 8, 2021
25076b8
fixed logs
kessel Oct 8, 2021
7587bff
added missing systems
kessel Oct 8, 2021
0e31c9b
added docs
kessel Oct 8, 2021
598632e
added docs
kessel Oct 8, 2021
52c3475
added docs
kessel Oct 8, 2021
598aed8
Update cosmoflow results
janEbert Oct 8, 2021
922ce2d
Filter updated results
janEbert Oct 8, 2021
a0d48ba
Add 10-instance weak scaling script
janEbert Oct 8, 2021
f196d14
Clean up whitespace
janEbert Oct 8, 2021
45f1573
Fix typos
janEbert Oct 8, 2021
9dc8dc1
Clean up whitespace
janEbert Oct 8, 2021
cf67c2d
Fix typos
janEbert Oct 8, 2021
bb770bb
Fix dissimilarities from DeepCAM README
janEbert Oct 8, 2021
2466258
Adjust DeepCAM code paths
janEbert Nov 8, 2021
62b025e
Adjust cached scratch location
janEbert May 10, 2022
8448eab
Adjust CosmoFlow code paths
janEbert May 10, 2022
230b0de
Adjust project name
janEbert May 10, 2022
6bf1010
Remove erroneous path separator
janEbert May 10, 2022
b46862a
Move run scripts to separate directory
janEbert May 10, 2022
72434b7
Always create output directories
janEbert May 10, 2022
28a6fba
Automatically format output directory
janEbert May 10, 2022
d227831
Fix account name
janEbert May 10, 2022
201aa77
Update HDF5 conversion scripts
janEbert May 10, 2022
0223609
Use same number of GPUs as tasks per node
janEbert May 18, 2022
eea680f
Refactor IME conditionals
janEbert May 18, 2022
f4ebedf
Adjust paths
janEbert May 18, 2022
fd555e6
Add optional IME usage
janEbert May 18, 2022
152c6e2
Adjust data paths
janEbert May 18, 2022
2c25cce
Allow running with invalid configuration
janEbert May 18, 2022
e58aec3
Increase number of nodes required for prestaging
janEbert May 19, 2022
46927c9
Set different seed automatically
janEbert May 19, 2022
e8ec724
Do not switch preshuffling if using HDF5
janEbert May 19, 2022
b70a69a
Make sure data is in cache
janEbert May 19, 2022
f1426d8
total final updates for 2021
coquelin77 May 19, 2022
18c0a17
Execute container runscript
janEbert May 19, 2022
3bc5916
Merge pull request #5 from janEbert/main
kessel May 19, 2022
cdd514b
Rename Singularity -> Apptainer
janEbert May 24, 2022
3861325
Adjust directories to be user-specific
janEbert May 24, 2022
448bbe0
Automatically select JUWELS Booster partition
janEbert May 25, 2022
6ff0f2d
Print IME usage
janEbert May 25, 2022
cec0cf9
Do not use HDF5 file locking
janEbert May 25, 2022
27 changes: 27 additions & 0 deletions HelmholtzAI/benchmarks/implementations/cosmoflow/README.md
@@ -0,0 +1,27 @@
# How To Run

This repo holds the code and configs for running on the JUWELS Booster as well as on HoreKa at KIT.
To run a training, use the `start_training_run.sh` script from within the `run_scripts` directory.
It triggers the other scripts in this order: `start_**_training.sh`, then `run_and_time.sh`, where `**`
is the training system (`jb` for JUWELS Booster, `horeka` for HoreKa).

Example usage:
```bash
./start_training_run.sh --system booster --nodes 34 --time 01:00:00 --config "config_file_path"
```
On JUWELS Booster, a job can also be started by submitting e.g. `run_cosmoflow_256x4x1.sbatch`. This queues a
job with 256 nodes, 4 GPUs per node, and a local batch size of 1.
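For example, from the directory containing the batch scripts (a sketch; adjust the file name to the configuration you want):
```bash
sbatch run_cosmoflow_256x4x1.sbatch
```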

# Repo Structure

This implementation is based on NVIDIA's implementation and their containers. We use Apptainer with an image
that is almost identical to NVIDIA's, except that we added the packages h5py and pmi to the containers and then
converted them to `.sif` files for Apptainer. We copied NVIDIA's Python code for the benchmark out of the
container and adjusted it to our needs; the modified code is found in the directory `cosmoflow`.
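For illustration only, converting a locally available Docker image into such a `.sif` file could look like the following; the image tag below is a placeholder, not the actual NVIDIA benchmark image:
```bash
# Placeholder tag; substitute the NVIDIA MLPerf container with h5py and pmi added.
apptainer build cosmoflow.sif docker-daemon://mlperf-cosmoflow:latest
```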

The directory `hdf5_io` contains our conversion to HDF5; see below.

# HDF5 IO
We use HDF5 to store the input data. The conversion is implemented in `convert_cosmoflow_to_hdf5.py`, which creates
one large `.h5` file each for training and validation.
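As a quick sanity check, the converted files can be opened directly with h5py; the dataset names `data` and `label` match what `convert_cosmoflow_to_hdf5.py` (shown further below) creates, and the path is only an example:
```python
# Minimal inspection sketch; point the path at your converted output directory.
import h5py

with h5py.File('/p/scratch/hai_mlperf/cosmoflow/train.h5', 'r') as f:
    print(f['data'].shape, f['data'].dtype)    # one entry per training sample
    print(f['label'].shape, f['label'].dtype)  # corresponding regression targets
```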
@@ -47,6 +47,13 @@ def stage_files(
        read_chunk_size: int,
) -> Tuple[List[str], List[str], Callable]:
    number_of_nodes = dist_desc.size // dist_desc.local_size // shard_mult
    if number_of_nodes == 0:
        print(
            'WARNING: Number of nodes is too low for the data split. '
            'Either discard this result or use it only for debugging.'
        )
        number_of_nodes = 1

    current_node = dist_desc.rank // dist_desc.local_size // shard_mult
    files_per_node = len(data_filenames) // number_of_nodes
    assert (
@@ -151,6 +158,13 @@ def __init__(
        if shard_type == 'local':
            number_of_nodes = \
                dist_desc.size // dist_desc.local_size // shard_mult
            if number_of_nodes == 0:
                print(
                    'WARNING: Number of nodes is too low for the data split. '
                    'Either discard this result or use it only for debugging.'
                )
                number_of_nodes = 1

            current_node = dist_desc.rank // dist_desc.local_size // shard_mult

            if self.preshuffle:
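The clamp above guards against the node count rounding down to zero on small allocations. A standalone illustration of the arithmetic with made-up numbers (not part of the benchmark code):
```python
# Hypothetical values: 1 node, 4 local ranks, shard multiplier 8.
size, local_size, shard_mult = 4, 4, 8
number_of_nodes = size // local_size // shard_mult
print(number_of_nodes)  # 0: the integer division rounds down to zero
if number_of_nodes == 0:
    # Fall back to 1 so the later `len(data_filenames) // number_of_nodes`
    # does not divide by zero; such a run is only useful for debugging.
    number_of_nodes = 1
print(512 // number_of_nodes)  # e.g. all 512 files end up on the one node
```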
@@ -0,0 +1,179 @@
import io
import os
import time

import h5py
from mpi4py import MPI
import numpy as np
import tarfile


def load_numpy(tar_file, tar_member):
    with tar_file.extractfile(tar_member) as f:
        return np.load(io.BytesIO(f.read()))


# We do _not_ want to preshuffle! CosmoFlow is very sensitive to data
# order/distribution.
preshuffle = False
seed = 42

project_name = 'hai_mlperf'
# Try to use CSCRATCH, fall back to SCRATCH.
using_cscratch = ('CSCRATCH_' + project_name) in os.environ
project_dir = os.getenv(
    'CSCRATCH_' + project_name,
    os.getenv('SCRATCH_' + project_name),
)

# Read file
tar_file_name = os.path.join(
project_dir, "cosmoUniverse_2019_05_4parE_tf_v2_numpy.tar")
out_dir = os.path.join(project_dir, "cosmoflow")
if MPI.COMM_WORLD.rank == 0:
    os.makedirs(out_dir, exist_ok=True)

np.random.seed(seed)

print("this is the h5py file we are using:", h5py.__file__)

for data_subset in ['train', 'validation']:
    # Write files
    hdf5_file_name = os.path.join(out_dir, f"{data_subset}.h5")
    files_file_name = os.path.join(out_dir, f"{data_subset}.h5.files")

    if using_cscratch and MPI.COMM_WORLD.rank == 0:
        # Initialize IME cache.
        os.system('ime-ctl --prestage ' + tar_file_name)
        # os.system('ime-ctl --prestage ' + hdf5_file_name)
        # os.system('ime-ctl --prestage ' + files_file_name)
    # Make sure IME cache is initialized before continuing.
    MPI.COMM_WORLD.Barrier()

    with tarfile.open(tar_file_name, 'r') as tar_f:
        start_time = time.perf_counter()
        files = [
            n
            for n in tar_f.getmembers()
            if n.name.startswith(
                f'cosmoUniverse_2019_05_4parE_tf_v2_numpy/{data_subset}')
        ]
        print(f'{MPI.COMM_WORLD.rank}: reading members took',
              time.perf_counter() - start_time, 'seconds')

        data_files = list(filter(lambda x: x.name.endswith("data.npy"), files))
        data_files.sort(key=lambda x: x.name)
        data_files = np.array(data_files)
        label_files = list(filter(
            lambda x: x.name.endswith("label.npy"), files))
        label_files.sort(key=lambda x: x.name)
        label_files = np.array(label_files)
        if preshuffle:
            perm = np.random.permutation(len(label_files))
            data_files = data_files[perm]
            label_files = label_files[perm]

        no_shards = MPI.COMM_WORLD.size
        data_files_filtered = data_files[:]
        label_files_filtered = label_files[:]
        data_files_shards = []
        label_files_shards = []
        for i in range(no_shards):
            shard_size = int(np.ceil(len(data_files_filtered)/no_shards))
            start = i * shard_size
            end = min((i + 1) * shard_size, len(data_files_filtered))
            data_files_shards.append(data_files_filtered[start:end])
            label_files_shards.append(label_files_filtered[start:end])

        start_entries = np.cumsum([len(x) for x in data_files_shards])
        start_entries = ([0] + list(start_entries))[:-1]

        first_data = load_numpy(tar_f, data_files[0])
        data_shape = first_data.shape
        data_dtype = first_data.dtype
        first_label = load_numpy(tar_f, label_files[0])
        label_shape = first_label.shape
        label_dtype = first_label.dtype
        all_data_shape = (len(data_files_filtered),) + data_shape
        all_label_shape = (len(data_files_filtered),) + label_shape

        def write_to_h5_file(
                data_files,
                label_files,
                tfname,
                start_entry,
                all_data_shape,
                all_label_shape,
                tell_progress=10,
        ):
            if MPI.COMM_WORLD.rank == 0 and os.path.isfile(tfname):
                # Refuse to overwrite an existing output file.
                print("file exists")
                exit(1)
            MPI.COMM_WORLD.Barrier()
            print("creating file")
            with h5py.File(
                tfname,
                'w',
                driver='mpio',
                comm=MPI.COMM_WORLD,
                libver='latest',
            ) as fi:
                print("creating dset")
                dset = fi.create_dataset(
                    'data', all_data_shape, dtype=data_dtype)
                print("creating lset")
                lset = fi.create_dataset(
                    'label', all_label_shape, dtype=label_dtype)

                start = time.time()
                for i, (f, l) in enumerate(zip(data_files, label_files)):
                    data = load_numpy(tar_f, f)
                    label = load_numpy(tar_f, l)
                    dset[start_entry + i] = data
                    lset[start_entry + i] = label
                    now = time.time()
                    time_remaining = len(data_files) * (now-start) / (i + 1)
                    if i % tell_progress == 0:
                        print(i, time_remaining/60, f.name, l.name)
            fi.close()

        my_data_files = [f for f in data_files_shards[MPI.COMM_WORLD.rank]]
        my_label_files = [f for f in label_files_shards[MPI.COMM_WORLD.rank]]
        start_entry = start_entries[MPI.COMM_WORLD.rank]

        all_data_files = np.concatenate(MPI.COMM_WORLD.allgather(
            [m.name for m in my_data_files]))

        print(
            len(np.unique(all_data_files)),
            "unique files, and",
            len(all_data_files),
            "total files.",
        )
        if len(np.unique(all_data_files)) != len(all_data_files):
            print("There is an error with the file distribution")

        if MPI.COMM_WORLD.rank == 0:
            with open(files_file_name, "w") as g:
                g.write("\n".join(all_data_files) + '\n')

        write_start_time = time.perf_counter()
        write_to_h5_file(
            [f for f in my_data_files],
            [f for f in my_label_files],
            hdf5_file_name,
            start_entry,
            all_data_shape,
            all_label_shape,
        )

        print('finished', data_subset, 'on', MPI.COMM_WORLD.rank, 'after',
              time.perf_counter() - write_start_time, 'seconds')

    MPI.COMM_WORLD.Barrier()
    if using_cscratch and MPI.COMM_WORLD.rank == 0:
        # Synchronize file system with IME cache.
        os.system('ime-ctl --sync ' + hdf5_file_name)
        os.system('ime-ctl --sync ' + files_file_name)
    MPI.COMM_WORLD.Barrier()
@@ -0,0 +1,13 @@
#!/usr/bin/env bash

#SBATCH --nodes=4
#SBATCH --tasks-per-node=8
#SBATCH --gres=gpu:0
#SBATCH --time=23:59:59
#SBATCH --partition=booster
#SBATCH --account=hai_mlperf

ml purge
ml Stages/2022 GCC/11.2.0 OpenMPI/4.1.2 Python/3.9.6 HDF5/1.12.1 mpi4py/3.1.3 h5py/3.5.0

srun python convert_cosmoflow_to_hdf5.py
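Assuming the batch script above is saved as `convert_cosmoflow_to_hdf5.sbatch` (a hypothetical name), the conversion is submitted with:
```bash
sbatch convert_cosmoflow_to_hdf5.sbatch
```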
@@ -31,6 +31,9 @@ DATA_SHARD_MULTIPLIER=${DATA_SHARD_MULTIPLIER:-1}

APPLY_LOG_TRANSFORM=${APPLY_LOG_TRANSFORM:-"1"}
APPLY_SHUFFLE=${APPLY_SHUFFLE:-"1"}
# If `USE_H5=1`, setting this to 1 has a large performance impact
# (either only on the staging part or on the whole run depending on
# `APPLY_PRESTAGE`).
APPLY_PRESHUFFLE=${APPLY_PRESHUFFLE:-"1"}
APPLY_PRESTAGE=${APPLY_PRESTAGE:-"1"}
NUM_TRAINING_SAMPLES=${NUM_TRAINING_SAMPLES:-"-1"}
@@ -40,15 +43,10 @@ INSTANCES=${INSTANCES:-1}
USE_H5=${USE_H5:-"1"}
READ_CHUNK_SIZE=${READ_CHUNK_SIZE:-"32"}

# Our HDF5 data is already pre-shuffled. If `USE_H5=1`, setting this
# to 1 has a large performance impact (either only on the staging part
# or on the whole run depending on `APPLY_PRESTAGE`).
APPLY_PRESHUFFLE=$(if [ "$USE_H5" -ge 1 ]; then echo 0; else echo 1; fi)

# Only apply prestaging when we have enough nodes to be able to
# support the memory requirements.
export APPLY_PRESTAGE=$(
if [ "$(($SLURM_NNODES / $INSTANCES))" -ge 64 ]; then
if [ "$(($SLURM_NNODES / $INSTANCES))" -gt 64 ]; then
echo 1
else
echo 0
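For reference, these defaults can be overridden from the environment when launching a run; the sketch below assumes the launcher scripts pass the environment through to this config, and the values are purely illustrative:
```bash
USE_H5=1 APPLY_PRESHUFFLE=0 READ_CHUNK_SIZE=64 \
    ./start_training_run.sh --system booster --nodes 128 --time 01:00:00 \
    --config ../cosmoflow/configs/config_DGXA100_64x8x1_reference.sh
```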
@@ -0,0 +1,10 @@
#!/bin/bash

NODES=128
RESULTS_ROOT="/p/scratch/hai_mlperf/cosmoflow/results/$NODES"
mkdir -p "$RESULTS_ROOT"
export SEED="$(date +%s)"

./start_training_run.sh --system booster --nodes "$NODES" \
--time 01:00:00 \
--config ../cosmoflow/configs/config_DGXA100_64x8x1_reference.sh
@@ -0,0 +1,10 @@
#!/bin/bash

NODES=256
RESULTS_ROOT="/p/scratch/hai_mlperf/cosmoflow/results/$NODES"
mkdir -p "$RESULTS_ROOT"
export SEED="$(date +%s)"

./start_training_run.sh --system booster --nodes "$NODES" \
--time 01:00:00 \
--config ../cosmoflow/configs/config_DGXA100_128x8x1.sh
@@ -0,0 +1,10 @@
#!/bin/bash

NODES=64
RESULTS_ROOT="/p/scratch/hai_mlperf/cosmoflow/results/$NODES"
mkdir -p "$RESULTS_ROOT"
export SEED="$(date +%s)"

./start_training_run.sh --system booster --nodes "$NODES" \
--time 01:00:00 \
--config ../cosmoflow/configs/config_DGXA100_32x8x2_handrolled.sh
@@ -0,0 +1,12 @@
#!/bin/bash

export INSTANCES=10
NODES_PER_INSTANCE=64
NODES=$((INSTANCES * NODES_PER_INSTANCE))
RESULTS_ROOT="/p/scratch/hai_mlperf/cosmoflow/results_weak/$INSTANCES/$NODES"
mkdir -p "$RESULTS_ROOT"
export SEED="$(date +%s)"

./start_training_run.sh --system booster \
--nodes "$((INSTANCES * NODES_PER_INSTANCE))" \
--time 02:00:00 \
--config ../cosmoflow/configs/config_DGXA100_32x8x2_handrolled.sh
@@ -2,10 +2,11 @@

export INSTANCES=2
NODES_PER_INSTANCE=64
RESULTS_ROOT="/p/scratch/jb_benchmark/cosmoflow/results_weak/$INSTANCES/$NODES"
RESULTS_ROOT="/p/scratch/hai_mlperf/cosmoflow/results_weak/$INSTANCES/$NODES"
mkdir -p "$RESULTS_ROOT"
export SEED="$(date +%s)"

./start_training_run.sh --system booster \
--nodes "$((INSTANCES * NODES_PER_INSTANCE))" \
--time 02:00:00 \
--config cosmoflow/configs/config_DGXA100_32x8x2_handrolled.sh
--config ../cosmoflow/configs/config_DGXA100_32x8x2_handrolled.sh