From 01283ab2b2e04af6ad5b8f540c066b5959de551a Mon Sep 17 00:00:00 2001
From: Huihuo Zheng
Date: Tue, 11 Jun 2024 12:08:56 -0500
Subject: [PATCH] Mlperf storage v1.0 (#206)

* Bring v1.0 to the most recent commit (#202)

* Request changes from MLPerf Storage (#199)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* Fixed potential insufficient samples due to num_files is not divisible by comm.size (#200)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* recovered back dlio_profiler

* fixed potential not enough samples

* Update tf_reader.py

* Mlperf requests (#201)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* fixed issue with dlio_profiler

* bring back dlio_profiler_py

* sync up (#205)

* Request changes from MLPerf Storage (#199)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* Fixed potential insufficient samples due to num_files is not divisible by comm.size (#200)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* recovered back dlio_profiler

* fixed potential not enough samples

* Update tf_reader.py

* Mlperf requests (#201)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* fixed issue with dlio_profiler

* bring back dlio_profiler_py

* Bring v1.0 to the most recent commit (#202) (#203)

* Request changes from MLPerf Storage (#199)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* Fixed potential insufficient samples due to num_files is not divisible by comm.size (#200)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* recovered back dlio_profiler

* fixed potential not enough samples

* Update tf_reader.py

* Mlperf requests (#201)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* fixed issue with dlio_profiler

* bring back dlio_profiler_py

* Fix requirements file (#204)

Signed-off-by: Johnu George

---------

Signed-off-by: Johnu George
Co-authored-by: Johnu George

* barrier in the beginning

---------

Signed-off-by: Johnu George
Co-authored-by: Johnu George
---
 dlio_benchmark/main.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/dlio_benchmark/main.py b/dlio_benchmark/main.py
index 8669b41c..0d56bb20 100644
--- a/dlio_benchmark/main.py
+++ b/dlio_benchmark/main.py
@@ -16,7 +16,7 @@
 """
 import os
 import math
-import hydra
+
 import logging
 from time import time, sleep
 import json
@@ -49,7 +49,10 @@ from dlio_benchmark.utils.utility import Profile, PerfTrace
 
 dlp = Profile(MODULE_DLIO_BENCHMARK)
 
-
+from mpi4py import MPI
+# To make sure the output folder is the same in all the nodes. We have to do this.
+MPI.COMM_WORLD.Barrier()
+import hydra
 class DLIOBenchmark(object):
     """
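
Note on the second hunk: it places `MPI.COMM_WORLD.Barrier()` ahead of `import hydra` so that, per the in-code comment, every node ends up with the same output folder. The sketch below illustrates only the barrier property this relies on: no participant proceeds until all have arrived. It is not the patch's code. It uses Python threads as stand-ins for MPI ranks (an assumption, so that it runs without an MPI launcher), and `run_ranks` and `rank_main` are hypothetical names that do not appear in `dlio_benchmark`.

```python
import threading


def run_ranks(world_size):
    """Simulate `world_size` ranks that all wait at a barrier before continuing.

    Returns a dict mapping each simulated rank to the number of ranks it
    observed as "arrived" immediately after leaving the barrier.
    """
    barrier = threading.Barrier(world_size)
    arrived = []      # ranks that have reached the barrier
    seen_after = {}   # rank -> arrivals observed after the barrier
    lock = threading.Lock()

    def rank_main(rank):
        # Hypothetical per-rank body: announce arrival, then wait for everyone.
        with lock:
            arrived.append(rank)
        # Thread-based stand-in for MPI.COMM_WORLD.Barrier() in the patch.
        barrier.wait()
        # Past this point every rank has arrived, so all ranks start the
        # next step (in the patch: importing hydra) from a common point.
        with lock:
            seen_after[rank] = len(arrived)

    threads = [
        threading.Thread(target=rank_main, args=(rank,))
        for rank in range(world_size)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen_after


if __name__ == "__main__":
    print(run_ranks(4))
```

Each simulated rank records its arrival before waiting, so the count it reads after the barrier always equals the world size. Under `mpirun`, the patch relies on the analogous guarantee from `MPI.COMM_WORLD.Barrier()`.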