Load Data
The basic utility that this tutorial references takes an hdf5 file and converts it into a Python dictionary, with the values converted into numpy structures for easy plotting and further use.

Warning: if not called properly, these functions will try to load the entire hdf5 file into memory. If you are dealing with an extremely large file, this could consume your entire computer's RAM.
The main way to use this utility is to import the `load_data` function:

`from data_logger_bag.load_h5_dataset import load_data`
You can also run `python load_h5_dataset.py <Path to h5 file>` to see an example of the data being loaded. Warning: this will load the entire file. Line 211 has an `import pdb; pdb.set_trace()` that stops execution after the data has been loaded into the variable `data`. You can then peruse the `data` variable, which should be a Python dictionary. See external tutorials about Python dictionaries if you are unfamiliar with the data structure.
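Since the loaded `data` object is a nested dictionary, a small helper makes it easy to see what was loaded. The sketch below is illustrative only; the sample dictionary is hypothetical, and the real keys and depth depend on your bag file:

```python
def dict_paths(d, prefix=""):
    """Collect slash-separated key paths through a nested dictionary."""
    paths = []
    for key, value in d.items():
        path = prefix + "/" + str(key) if prefix else str(key)
        if isinstance(value, dict):
            paths.extend(dict_paths(value, path))
        else:
            paths.append(path)
    return paths

# Hypothetical structure; a real file would have task/skill directories
# with numpy arrays as the leaf values.
data = {"defaultTask": {"defaultSkill": {"timestamps": [0.0, 0.1, 0.2]}}}
print(dict_paths(data))  # one path per leaf value
```

Running a helper like this from the pdb prompt is a quick way to survey a large file without printing every array.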
To load data, the file that needs to be imported or called is `load_h5_dataset.py`. The file has a few specific functions, but the one called to load the data is `load_data`.
`load_data` takes several arguments:

`def load_data(input_filename, output_filename, save_to_file, directories=None, max_level=None)`
- `input_filename`: the hdf5 file; it expects the full path to the file, and the filename must end in `.h5`.
- `output_filename`: an optional value naming a `.pkl` file to which the loaded data will be written in Python dictionary format. Unfortunately, this value needs to be set to something for the function to run even when not saving, so passing an empty string is perfectly fine.
- `save_to_file`: a boolean flag controlling whether to write the output pkl file. True will write, False will not.
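Because the optional output is a pickled dictionary, a saved run can later be reloaded without touching the original hdf5 file. A minimal stdlib sketch of that round trip (the dictionary contents here are hypothetical stand-ins for what `load_data` would return):

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the dictionary load_data would return.
data = {"defaultTask": {"defaultSkill": {"force": [1.0, 2.0]}}}

path = os.path.join(tempfile.mkdtemp(), "run.pkl")
with open(path, "wb") as f:
    pickle.dump(data, f)   # roughly what save_to_file=True does with output_filename

with open(path, "rb") as f:
    reloaded = pickle.load(f)

assert reloaded == data    # the pkl preserves the full dictionary
```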
There are optional inputs that can be used to speed up the loading process on the h5 files.
- `load_directories`: an array of strings naming the directories the loader should search for (passed as the `directories` keyword). For example, `directories=["defaultTask", "defaultSkill"]` would search for and load only the data hierarchies in those folders. This reduces memory usage.
- `max_level`: a number giving the maximum level of directories to load. This reduces time spent looking for the directories specified in `load_directories`. If you know that `defaultTask` and `defaultSkill` are located in the top two levels, then `max_level` can be set to 2. (TODO: verify the exact number for this... if it is x or x+1)
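Conceptually, the two options prune the search: only subtrees whose names appear in the directory list are kept, and the search stops descending past the depth limit. The rough pure-Python sketch below illustrates that idea over plain dictionaries; it is not the library's actual implementation, and it assumes the top level counts as level 1 (which the TODO above notes is unverified for the real function):

```python
def prune(tree, keep, max_level, level=1):
    """Keep only subtrees named in `keep`, searching at most `max_level` levels deep."""
    result = {}
    for name, child in tree.items():
        if name in keep:
            result[name] = child  # matched: keep this whole subtree
        elif isinstance(child, dict) and level < max_level:
            sub = prune(child, keep, max_level, level + 1)
            if sub:
                result[name] = sub
    return result

# Hypothetical hierarchy reusing the example directory names.
tree = {
    "run1": {"defaultTask": {"x": 1}},
    "junk": {"deep": {"defaultSkill": {"y": 2}}},
}
# With max_level=2, "defaultSkill" under "junk/deep" is too deep to be found.
print(prune(tree, ["defaultTask", "defaultSkill"], max_level=2))
```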