Releases: argonne-lcf/dlio_benchmark
Release v2.0.0
What's Changed
- Add docker image with CPU only dependencies by @johnugeorge in #8
- Add dlio fixes by @johnugeorge in #10
- Fixed issues related to checkpointing and profiling by @zhenghh04 in #13
- Config parameters fixes by @johnugeorge in #11
- Fixing folder number for evaluation by @johnugeorge in #14
- fixed checkpoint issues by @zhenghh04 in #16
- Adding PR unit tests for testing different data format and fixing issues for reading png and jpeg with pytorch data folder. by @zhenghh04 in #17
- A bunch of minor fixes by @zhenghh04 in #18
- Minor fixes by @zhenghh04 in #22
- Add ckpting to UNET3D workload, remove old prefetch param by @lhovon in #23
- Minor modification of configuration options to remove some confusion by @zhenghh04 in #25
- Adding Storage interface for supporting multiple storage backends by @johnugeorge in #20
- Code Fixes by @johnugeorge in #26
- Add the UNET3D sleep time for V100 32GB batch size 4 by @lhovon in #29
- Minor config changes by @johnugeorge in #31
- Make hydra config folder configurable by @johnugeorge in #32
- Mlperf storage v0.5 by @zhenghh04 in #33
- Changes to support segregation of data loader and reader by @hariharan-devarajan in #37
- Added application-level profile support for DLIO by @hariharan-devarajan in #39
- Multithreading issue with TensorFlow and PyTorch dataloader by @hariharan-devarajan in #44
- bug fix to free memory once file is completely read by @hariharan-devarajan in #51
- Pull changes from mlperf_storage_v0.5.1 by @zhenghh04 in #52
- Improved tracing utility added preprocessing support by @zhenghh04 in #53
- Trace improvement. by @hariharan-devarajan in #48
- Moved resize image to config by @zhenghh04 in #55
- instead of using direct methods using enter and exit. by @hariharan-devarajan in #54
- Reorganizing output files by @zhenghh04 in #56
- Generator fixed random seed by @zhenghh04 in #58
- Merging branch mlperf_storage_v0.5.1 by @zhenghh04 in #57
- fixing mistakes in calculating total number of steps by @zhenghh04 in #59
- Mlperf storage v0.5.1 by @zhenghh04 in #60
- Added support for Dali data loader by @hariharan-devarajan in #49
- Changed datatype to be np.uint8 universally in the call by @zhenghh04 in #61
- Adding support for training on a subset of dataset by @zhenghh04 in #63
- DLIO profiler integration by @hariharan-devarajan in #62
- Added Support Power9PC by @hariharan-devarajan in #65
- Update unet3d.yaml to correct the sample size for unet3d by @zhenghh04 in #68
- For X86 and AMD machines, we can create a pip based dlio installations by @hariharan-devarajan in #66
- Added validation to check enough core available for reading by @hariharan-devarajan in #73
- Added custom plugin code for custom data loader and reader. by @hariharan-devarajan in #74
- Changes required within DLIO Benchmark for creating a pip wheel by @hariharan-devarajan in #77
- Update bert.yaml to be consistent with mlperf storage by @zhenghh04 in #79
- Fixing subfolder issues and added subset tests by @zhenghh04 in #82
- Documentation: Instructions to compile and run on Lassen machine. by @OlgaKogiou in #85
- Changes to improve documentation by @hariharan-devarajan in #89
- Fixed dali data loader execution. by @hariharan-devarajan in #91
- Enhancing Dali data loader support by @zhenghh04 in #94
- Fixing Dali Data loader Parallelism and Pipelining. by @hariharan-devarajan in #93
- Update typo which gives issue for pytorch 1.3.1 by @hariharan-devarajan in #103
- Added documentation for the JPEG generator issue by @kaushikvelusamy in #100
- Workloads by @zhenghh04 in #97
- Added Info logging for profiler and removed unnecessary bracket calls. by @hariharan-devarajan in #104
- Fix the data dir path by @hariharan-devarajan in #108
- Making DLIO Profiler default for dlio_benchmark. by @hariharan-devarajan in #111
- Adding dlp logger. by @hariharan-devarajan in #109
- Workloads by @zhenghh04 in #112
- fixed readthedoc build issue by @zhenghh04 in #115
- fix Docker file to use venv. by @hariharan-devarajan in #119
- Switch dlio_profiler to use pypi instead of github by @hariharan-devarajan in #120
- Added force install for profiler for avoiding caching issues by @hariharan-devarajan in #123
- Update README.md by @venkat-1 in #121
- torch checkpoint creation should use storage class methods by @krehm in #126
- Reducing Github actions time by @zhenghh04 in #128
- Create output_folder using os.makedirs() by @krehm in #124
- Adding Native Dali Data Loader support for TFRecord, Images, and NPZ files by @zhenghh04 in #118
- Add support for pytorch spawn and forkserver multiprocessing_context by @krehm in #129
- Reopen dlio.log in non-fork reader_threads child processes by @krehm in #130
- added checkpointing to support LLMs by @hariharan-devarajan in #114
- added dlp for spawned workers pytorch by @hariharan-devarajan in #136
- Fix MPI finalization. by @hariharan-devarajan in #139
- Adding dlio_profiler to requirements.txt by @johnugeorge in #144
- Fix dataloader initialization to only happen once. Not on every epoch. by @hariharan-devarajan in #143
- Fix random sampling pytorch non-determinism. by @hariharan-devarajan in #145
- Fixed printing for DLIO output. by @hariharan-devarajan in #142
- Doc changes to fix DLIO profiler and remove IOStat by @hariharan-devarajan in #146
- Support for custom checkpointing. by @hariharan-devarajan in #137
- Feature/parallel io generator by @hariharan-devarajan in #148
- fix random bugs and printing by @hariharan-devarajan in #147
- Release for v2.0 by @zhenghh04 in #113
- Fix requirements file by @johnugeorge in #150
- fixed sample distribution bugs by @zhenghh04 in #152
- Fix sample shuffling by @hariharan-devarajan in #154
- Optimization to sample distribution by @TheAssembler1 in https://github.com/argonne-lcf/dlio_benchmark/pull...
DLIO v1.1
This release includes the following changes and enhancements:
- Added support for S3 storage
- Updated config files for MLPerf Storage workloads: UNet3D and Bert.
- Changes to configuration options:
  - Added variability support for sample size and for training and validation computation time.
  - Changed the shuffling and prefetching settings.
  - Moved batch_size and batch_size_eval to the reader section.
This release corresponds to the MLPerf Storage v0.5 prerelease: https://github.com/mlcommons/storage/releases/tag/v0.5-rc0
DLIO v1.0
DLIO v1.0 Release Notes
We are excited to announce the release of DLIO 1.0! It brings many new features and enhancements compared to the previous 0.0.1 version:
- YAML-based configuration through the Hydra (hydra.cc) framework. The configuration options are organized hierarchically into the sections model, framework, workflow, dataset, train, evaluation, checkpoint, and profiling. A set of YAML files for some workloads is included.
- Data loader support enhancement:
  - Added a data loader layer above the data format so that users can choose the data loader and the data format independently.
  - Added PyTorch data loader support. The PyTorch data loader is fully supported for datasets with one sample per file.
  - Enhanced the TensorFlow tf.data loader to support generic file formats beyond tfrecord (generic formats currently support only the one-sample-per-file case).
- New dataset support
  - Added support for png and jpeg formats
  - Supporting multiple subfolders for training and validation datasets.
  - Supporting generating validation dataset
- Profiling and logging
  - Added support for iostat profiling
  - Added detailed logging info
- Added support for validation.
- Added post processing python script
- Added unit tests and GitHub Actions tests.
- User and developer documentation on github.io: https://argonne-lcf.github.io/dlio_benchmark
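
The hierarchical configuration described in the v1.0 notes can be pictured with a small Python sketch. It is only an illustration, not DLIO's actual schema or loading code: the top-level section names come from the list above, but every key and value inside a section, as well as the helper names parse_value and apply_override, are hypothetical. The override strings imitate the section.key=value form that Hydra accepts on the command line.

```python
from copy import deepcopy

# Top-level section names are the ones listed in the v1.0 notes.
# Every key and value inside a section is a hypothetical placeholder.
DEFAULT_CONFIG = {
    "model": {"name": "example_model"},
    "framework": {"name": "pytorch"},
    "workflow": {"generate_data": False, "train": True},
    "dataset": {"format": "png", "num_files_train": 8},
    "train": {"epochs": 1},
    "evaluation": {"enabled": False},
    "checkpoint": {"enabled": False},
    "profiling": {"enabled": False},
}


def parse_value(text):
    """Convert an override value to bool or int when possible, else keep the string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(text)
    except ValueError:
        return text


def apply_override(config, override):
    """Return a copy of config with one 'section.key=value' override applied."""
    path, _, raw_value = override.partition("=")
    section, _, key = path.partition(".")
    if section not in config or not key:
        raise KeyError(f"unknown section or missing key in override: {override!r}")
    updated = deepcopy(config)  # leave the defaults untouched
    updated[section][key] = parse_value(raw_value)
    return updated


if __name__ == "__main__":
    cfg = DEFAULT_CONFIG
    for item in ("train.epochs=5", "dataset.format=jpeg", "checkpoint.enabled=true"):
        cfg = apply_override(cfg, item)
    for section, options in cfg.items():
        print(section, options)
```

Grouping options by section keeps each override short and unambiguous, which is the main practical benefit of a hierarchical layout over a flat list of parameters.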