What's Changed

Add docker image with CPU only dependencies by @johnugeorge in #8
Add dlio fixes by @johnugeorge in #10
Fixed issues related to checkpointing and profiling by @zhenghh04 in #13
Config parameters fixes by @johnugeorge in #11
Fixing folder number for evaluation by @johnugeorge in #14
fixed checkpoint issues by @zhenghh04 in #16
Adding PR unit tests for testing different data format and fixing issues for reading png and jpeg with pytorch data folder. by @zhenghh04 in #17
A bunch of minor fixes by @zhenghh04 in #18
Minor fixes by @zhenghh04 in #22
Add ckpting to UNET3D workload, remove old prefetch param by @lhovon in #23
Minor modification of configuration options to remove some confusion by @zhenghh04 in #25
Adding Storage interface for supporting multiple storage backends by @johnugeorge in #20
Code Fixes by @johnugeorge in #26
Add the UNET3D sleep time for V100 32GB batch size 4 by @lhovon in #29
Minor config changes by @johnugeorge in #31
Make hydra config folder configurable by @johnugeorge in #32
Mlperf storage v0.5 by @zhenghh04 in #33
Changes to support segregation of data loader and reader by @hariharan-devarajan in #37
Added application-level profile support for DLIO by @hariharan-devarajan in #39
Multithreading issue with TensorFlow and PyTorch dataloader by @hariharan-devarajan in #44
bug fix to free memory once file is completely read by @hariharan-devarajan in #51
Pull changes from mlperf_storage_v0.5.1 by @zhenghh04 in #52
Improved tracing utility added preprocessing support by @zhenghh04 in #53
Trace improvement. by @hariharan-devarajan in #48
Moved resize image to config by @zhenghh04 in #55
instead of using direct methods using enter and exit. by @hariharan-devarajan in #54
Reorganizing output files by @zhenghh04 in #56
Generator fixed random seed by @zhenghh04 in #58
Merging branch mlperf_storage_v0.5.1 by @zhenghh04 in #57
fixing mistakes in calculating total number of steps by @zhenghh04 in #59
Mlperf storage v0.5.1 by @zhenghh04 in #60
Added support for Dali data loader by @hariharan-devarajan in #49
Changed datatype to be np.uint8 universally in the call by @zhenghh04 in #61
Adding support for training on a subset of dataset by @zhenghh04 in #63
DLIO profiler integration by @hariharan-devarajan in #62
Added Support Power9PC by @hariharan-devarajan in #65
Update unet3d.yaml to correct the sample size for unet3d by @zhenghh04 in #68
For X86 and AMD machines, we can create a pip based dlio installations by @hariharan-devarajan in #66
Added validation to check enough core available for reading by @hariharan-devarajan in #73
Added custom plugin code for custom data loader and reader. by @hariharan-devarajan in #74
Changes required within DLIO Benchmark for creating a pip wheel by @hariharan-devarajan in #77
Update bert.yaml to be consistent with mlperf storage by @zhenghh04 in #79
Fixing subfolder issues and added subset tests by @zhenghh04 in #82
Documentation: Instructions to compile and run on Lassen machine. by @OlgaKogiou in #85
Changes to improve documentation by @hariharan-devarajan in #89
Fixed dali data loader execution. by @hariharan-devarajan in #91
Enhancing Dali data loader support by @zhenghh04 in #94
Fixing Dali Data loader Parallelism and Pipelining. by @hariharan-devarajan in #93
Update typo which gives issue for pytorch 1.3.1 by @hariharan-devarajan in #103
Added documentation for the JPEG generator issue by @kaushikvelusamy in #100
Workloads by @zhenghh04 in #97
Added Info logging for profiler and removed unnecessary bracket calls. by @hariharan-devarajan in #104
Fix the data dir path by @hariharan-devarajan in #108
Making DLIO Profiler default for dlio_benchmark. by @hariharan-devarajan in #111
Adding dlp logger. by @hariharan-devarajan in #109
Workloads by @zhenghh04 in #112
fixed readthedoc build issue by @zhenghh04 in #115
fix Docker file to use venv. by @hariharan-devarajan in #119
Switch dlio_profiler to use pypi instead of github by @hariharan-devarajan in #120
Added force install for profiler for avoiding caching issues by @hariharan-devarajan in #123
Update README.md by @venkat-1 in #121
torch checkpoint creation should use storage class methods by @krehm in #126
Reducing Github actions time by @zhenghh04 in #128
Create output_folder using os.makedirs() by @krehm in #124
Adding Native Dali Data Loader support for TFRecord, Images, and NPZ files by @zhenghh04 in #118
Add support for pytorch spawn and forkserver multiprocessing_context by @krehm in #129
Reopen dlio.log in non-fork reader_threads child processes by @krehm in #130
added checkpointing to support LLMs by @hariharan-devarajan in #114
added dlp for spawned workers pytorch by @hariharan-devarajan in #136
Fix MPI finalization. by @hariharan-devarajan in #139
Adding dlio_profiler to requirements.txt by @johnugeorge in #144
Fix dataloader initialization to only happen once. Not on every epoch. by @hariharan-devarajan in #143
Fix random sampling pytorch non-determinism. by @hariharan-devarajan in #145
Fixed printing for DLIO output. by @hariharan-devarajan in #142
Doc changes to fix DLIO profiler and remove IOStat by @hariharan-devarajan in #146
Support for custom checkpointing. by @hariharan-devarajan in #137
Feature/parallel io generator by @hariharan-devarajan in #148
fix random bugs and printing by @hariharan-devarajan in #147
Release for v2.0 by @zhenghh04 in #113
Fix requirements file by @johnugeorge in #150
fixed sample distribution bugs by @zhenghh04 in #152
Fix sample shuffling by @hariharan-devarajan in #154
Optimization to sample distribution by @TheAssembler1 in https://github.com/argonne-lcf/dlio_benchmark/pull...

DLIO v1.0 Release Notes

We are excited to announce the release of DLIO 1.0! It brings many new features and enhancements compared to the previous 0.0.1 version:

DLIO is now configured through YAML files in the Hydra.cc framework. The configuration options are organized hierarchically into model, framework, workflow, dataset, train, evaluation, checkpoint, and profiling sections. A set of YAML files for some workloads is included.
Data loader support enhancements:
- Added a data loader layer above the data format layer so that users can choose the data loader and the data format independently.
- Added PyTorch data loader support. The PyTorch data loader is fully supported for datasets with one sample per file.
- Enhanced the TensorFlow tf.data loader to support generic file formats beyond the tfrecord format (for generic data formats, only the one-sample-per-file case is currently supported).
New dataset support:
- Added support for png and jpeg formats.
- Added support for multiple subfolders for training and validation datasets.
- Added support for generating a validation dataset.
Profiling and logging:
- Added support for iostat profiling.
- Added detailed logging information.
Added support for validation.
Added a post-processing Python script.
Added unit tests and GitHub Actions tests.
Added user and developer documentation on github.io: https://argonne-lcf.github.io/dlio_benchmark
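
The hierarchical configuration and the independent choice of data loader and data format described above can be pictured with a small sketch. It is an illustration only, not DLIO's implementation: the top-level section names come from these notes, while the nested keys (`format`, `loader`), the reader and loader functions, and the registries are hypothetical placeholders.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Top-level section names taken from the release notes. Everything nested
# under them in this sketch is a made-up placeholder, not a real DLIO option.
SECTIONS = ("model", "framework", "workflow", "dataset",
            "train", "evaluation", "checkpoint", "profiling")

Reader = Callable[[str], str]
Loader = Callable[[Iterable[str], Reader], Iterator[str]]


def read_png(path: str) -> str:
    """Hypothetical format reader: one file yields one sample."""
    return f"png:{path}"


def read_jpeg(path: str) -> str:
    """Hypothetical format reader: one file yields one sample."""
    return f"jpeg:{path}"


def one_by_one(files: Iterable[str], reader: Reader) -> Iterator[str]:
    """Hypothetical loader: reads the files in the order given."""
    for name in files:
        yield reader(name)


def reversed_order(files: Iterable[str], reader: Reader) -> Iterator[str]:
    """Hypothetical loader: reads the files in reverse order."""
    for name in reversed(list(files)):
        yield reader(name)


# Separate registries keep the two choices independent of each other.
READERS: Dict[str, Reader] = {"png": read_png, "jpeg": read_jpeg}
LOADERS: Dict[str, Loader] = {"one_by_one": one_by_one,
                              "reversed_order": reversed_order}


def unknown_sections(config: dict) -> List[str]:
    """Return top-level keys that are not among the listed sections."""
    return sorted(set(config) - set(SECTIONS))


def load_samples(config: dict, files: Iterable[str]) -> List[str]:
    """Pick the reader and the loader separately, then combine them."""
    reader = READERS[config["dataset"]["format"]]    # hypothetical key
    loader = LOADERS[config["framework"]["loader"]]  # hypothetical key
    return list(loader(files, reader))


if __name__ == "__main__":
    config = {"dataset": {"format": "png"},
              "framework": {"loader": "one_by_one"}}
    print(unknown_sections(config))
    print(load_samples(config, ["a", "b"]))
```

Because the reader and the loader are looked up in separate registries, any loader in this sketch can be combined with any format without changing either one, which is the property the data loader layer is described as providing.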

Releases: argonne-lcf/dlio_benchmark

Release v2.0.0
