I really love this package! I'm trying to build a way to save complex dataclasses to files and load them back, and this is a great find!

I wonder if I'm missing a way to make saving and loading NumPy arrays faster (compared to `np.save`/`np.load`). Here is what I came up with:
```python
import time
from dataclasses import dataclass

import numpy as np
from mashumaro.mixins.msgpack import DataClassMessagePackMixin
from mashumaro.types import SerializationStrategy

# Generate a random 100000x1000 integer array
big_array = np.random.randint(0, 100, size=(100000, 1000))

# ======= Numpy =======
# Time direct numpy save
start_time = time.time()
np.save("big_array.npy", big_array)
np_save_time = time.time() - start_time
# Time direct numpy load
start_time = time.time()
loaded_np_array = np.load("big_array.npy")
np_load_time = time.time() - start_time

# ======= Mashumaro =======
class NumpySerializationStrategy(SerializationStrategy, use_annotations=True):
    # Store the raw buffer together with the shape and dtype needed to rebuild the array
    def serialize(self, value: np.ndarray) -> tuple[bytes, tuple[int, ...], str]:
        return value.data, value.shape, str(value.dtype)

    def deserialize(self, value: tuple[bytes, tuple[int, ...], str]) -> np.ndarray:
        return np.frombuffer(value[0], dtype=np.dtype(value[2])).reshape(value[1])

# Create a class with the Mashumaro mixin for comparison
@dataclass
class MashumaroArrayParams(DataClassMessagePackMixin):
    array: np.ndarray

    class Config:
        serialization_strategy = {np.ndarray: NumpySerializationStrategy()}

# Time Mashumaro msgpack save
mashumaro_params = MashumaroArrayParams(array=big_array)
start_time = time.time()
with open("mashumaro_params.msgpack", "wb") as f:
    f.write(mashumaro_params.to_msgpack())
mashumaro_save_time = time.time() - start_time
# Time Mashumaro msgpack load
start_time = time.time()
with open("mashumaro_params.msgpack", "rb") as f:
    loaded_mashumaro_params = MashumaroArrayParams.from_msgpack(f.read())
mashumaro_load_time = time.time() - start_time

# ======= Binary =======
# Time direct binary save
start_time = time.time()
with open("array.bin", "wb") as f:
    f.write(big_array.data)  # Write the raw memory buffer
binary_save_time = time.time() - start_time
# Time direct binary load
start_time = time.time()
with open("array.bin", "rb") as f:
    binary_data = f.read()
# Create the array from the binary data with the same shape and dtype
loaded_binary_array = np.frombuffer(binary_data, dtype=big_array.dtype).reshape(big_array.shape)
binary_load_time = time.time() - start_time

# Print results
print("\nTiming Results (seconds):")
print(f"{'Operation':<25}{'Time':>10}")
print("-" * 35)
print(f"{'Numpy direct save':<25}{np_save_time:>10.4f}")
print(f"{'Numpy direct load':<25}{np_load_time:>10.4f}")
print(f"{'Mashumaro msgpack save':<25}{mashumaro_save_time:>10.4f}")
print(f"{'Mashumaro msgpack load':<25}{mashumaro_load_time:>10.4f}")
print(f"{'Binary direct save':<25}{binary_save_time:>10.4f}")
print(f"{'Binary direct load':<25}{binary_load_time:>10.4f}")

# Verify that all loaded arrays equal the original
assert np.array_equal(big_array, loaded_np_array)
assert np.array_equal(big_array, loaded_binary_array)
assert np.array_equal(big_array, loaded_mashumaro_params.array)
```
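For comparison, here is a minimal sketch of the same [buffer, shape, dtype] round trip using plain NumPy without mashumaro. It makes two properties of this encoding easy to check. First, `np.frombuffer` over immutable `bytes` yields a read-only array, unlike the result of `np.load`. Second, `value.data` of a non-C-contiguous array (e.g. a transposed view) is not a flat C-ordered buffer, so calling `np.ascontiguousarray` first is the safer choice. The helper names `serialize_array` and `deserialize_array` are hypothetical. `bytes(...)` stands in for the buffer msgpack would hand back on load, and the sketch assumes plain numeric dtypes.

```python
import numpy as np


def serialize_array(value: np.ndarray) -> tuple[memoryview, tuple[int, ...], str]:
    # Hypothetical helper mirroring NumpySerializationStrategy.serialize.
    # ascontiguousarray is a no-op for C-contiguous input and copies otherwise,
    # so the buffer is always laid out in C (row-major) order.
    contiguous = np.ascontiguousarray(value)
    # dtype.str (e.g. '<i4') records byte order explicitly; str(dtype) gives 'int32'
    return contiguous.data, contiguous.shape, contiguous.dtype.str


def deserialize_array(value) -> np.ndarray:
    # Hypothetical helper mirroring NumpySerializationStrategy.deserialize.
    data, shape, dtype = value
    # frombuffer does not copy; the result is read-only when data is immutable bytes
    return np.frombuffer(data, dtype=np.dtype(dtype)).reshape(shape)


if __name__ == "__main__":
    original = np.arange(12, dtype=np.int32).reshape(3, 4).T  # transposed view: not C-contiguous
    data, shape, dtype = serialize_array(original)
    # bytes(...) simulates the immutable buffer a msgpack load would return
    restored = deserialize_array((bytes(data), shape, dtype))
    print(np.array_equal(restored, original), restored.flags.writeable)
```

If the loaded array needs to be writable, calling `.copy()` on the result of `deserialize` gives a writable array at the cost of one extra copy.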
```
Timing Results (seconds):
Operation                      Time
-----------------------------------
Numpy direct save            0.2591
Numpy direct load            0.1184
Mashumaro msgpack save       0.3928
Mashumaro msgpack load       0.1920
Binary direct save           0.2616
Binary direct load           0.1183
```
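These timings come from a single `time.time()` measurement per operation. Each load also runs right after the matching save, so the file is likely still in the OS page cache. Small differences between runs may therefore be noise. Below is a minimal sketch of a steadier measurement using the standard library's `timeit.repeat`, which uses `time.perf_counter` by default, taking the fastest of several runs. `best_of` is a hypothetical helper name and is not part of mashumaro or NumPy.

```python
import timeit
from typing import Callable


def best_of(fn: Callable[[], object], repeat: int = 5) -> float:
    """Hypothetical helper: call fn once per trial, `repeat` times, and return
    the fastest wall-clock time in seconds."""
    # number=1 runs fn exactly once per trial; timeit.repeat returns one float per trial
    timings = timeit.repeat(fn, number=1, repeat=repeat)
    # The minimum is the run least disturbed by other activity on the machine
    return min(timings)


if __name__ == "__main__":
    print(f"{best_of(lambda: sum(range(100_000))):.6f} s")
```

For the benchmark above, this could be used as, for example, `best_of(lambda: np.save("big_array.npy", big_array))`. Keep in mind that the minimum over repeated file reads reflects a warm cache, which may not match a cold first load.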
YoniChechik changed the title from "question aboud nump saveing and loading" to "question aboud numpy saveing and loading" on Feb 15, 2025.