Question about NumPy saving and loading #282

Open
YoniChechik opened this issue Feb 14, 2025 · 0 comments

@YoniChechik

I really love this package! I'm trying to build a way to save and load complex dataclasses to and from files, and this is a great find!

I wonder if I'm missing a way to make NumPy saving and loading faster (compared to np.save/np.load). Here is what I came up with:

import time
from dataclasses import dataclass

import numpy as np
from mashumaro.mixins.msgpack import DataClassMessagePackMixin
from mashumaro.types import SerializationStrategy

# Generate a random 100000x1000 integer array
big_array = np.random.randint(0, 100, size=(100000, 1000))

# ======= Numpy =======
# Time direct numpy save
start_time = time.time()
np.save("big_array.npy", big_array)
np_save_time = time.time() - start_time

# Time direct numpy load
start_time = time.time()
loaded_np_array = np.load("big_array.npy")
np_load_time = time.time() - start_time

# ======= Mashumaro =======


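class NumpySerializationStrategy(SerializationStrategy):
    # use_annotations is left off: the list returned by serialize() is handed to
    # msgpack as-is, and msgpack packs the memoryview (value.data) as binary data.
    def serialize(self, value: np.ndarray) -> list:
        # [raw buffer, shape, dtype string]; value.data assumes a C-contiguous array
        return [value.data, value.shape, str(value.dtype)]

    def deserialize(self, value: list) -> np.ndarray:
        # After msgpack decoding: [bytes, shape as a list, dtype string]
        return np.frombuffer(value[0], dtype=np.dtype(value[2])).reshape(value[1])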


# Create a class with Mashumaro mixin for comparison
@dataclass
class MashumaroArrayParams(DataClassMessagePackMixin):
    array: np.ndarray

    class Config:
        serialization_strategy = {np.ndarray: NumpySerializationStrategy()}


# Time Mashumaro msgpack save
mashumaro_params = MashumaroArrayParams(array=big_array)
start_time = time.time()
with open("mashumaro_params.msgpack", "wb") as f:
    f.write(mashumaro_params.to_msgpack())
mashumaro_save_time = time.time() - start_time

# Time Mashumaro msgpack load
start_time = time.time()
with open("mashumaro_params.msgpack", "rb") as f:
    loaded_mashumaro_params = MashumaroArrayParams.from_msgpack(f.read())
mashumaro_load_time = time.time() - start_time

# ======= Binary =======

# Time direct binary save
start_time = time.time()
with open("array.bin", "wb") as f:
    f.write(big_array.data)  # Write raw memory buffer
binary_save_time = time.time() - start_time

# Time direct binary load
start_time = time.time()
with open("array.bin", "rb") as f:
    binary_data = f.read()
    # Create array from binary data with same shape and dtype
    loaded_binary_array = np.frombuffer(binary_data, dtype=big_array.dtype).reshape(big_array.shape)
binary_load_time = time.time() - start_time


# Print results
print("\nTiming Results (seconds):")
print(f"{'Operation':<25} {'Time':>10}")
print("-" * 35)
print(f"{'Numpy direct save':<25} {np_save_time:>10.4f}")
print(f"{'Numpy direct load':<25} {np_load_time:>10.4f}")

print(f"{'Mashumaro msgpack save':<25} {mashumaro_save_time:>10.4f}")
print(f"{'Mashumaro msgpack load':<25} {mashumaro_load_time:>10.4f}")
print(f"{'Binary direct save':<25} {binary_save_time:>10.4f}")
print(f"{'Binary direct load':<25} {binary_load_time:>10.4f}")

# Verify arrays are equal
assert np.array_equal(big_array, loaded_np_array)
assert np.array_equal(big_array, loaded_binary_array)
assert np.array_equal(big_array, loaded_mashumaro_params.array)

Timing Results (seconds):
Operation                       Time
-----------------------------------
Numpy direct save             0.2591
Numpy direct load             0.1184
Mashumaro msgpack save        0.3928
Mashumaro msgpack load        0.1920
Binary direct save            0.2616
Binary direct load            0.1183
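For comparison, the strategy above boils down to storing three things (the raw buffer, the shape and the dtype) and rebuilding the array with np.frombuffer(...).reshape(...). The sketch below shows that round trip with plain NumPy only, with no mashumaro or msgpack involved. The names encode_array and decode_array are hypothetical, and it uses tobytes() (one copy on save) instead of value.data so that it also handles non-contiguous views; treat it as an illustration of the technique, not as mashumaro's own behavior.

```python
import numpy as np


def encode_array(value: np.ndarray) -> tuple[bytes, tuple[int, ...], str]:
    """Hypothetical helper: split an array into (raw bytes, shape, dtype string)."""
    # tobytes() emits C-ordered bytes (copying if needed), so views and
    # Fortran-ordered arrays still round-trip with a plain reshape().
    # dtype.str (e.g. "<i4") spells out the byte order explicitly.
    return value.tobytes(), value.shape, value.dtype.str


def decode_array(payload: tuple[bytes, tuple[int, ...], str]) -> np.ndarray:
    """Hypothetical helper: rebuild the array without copying the bytes."""
    raw, shape, dtype_str = payload
    # frombuffer wraps the existing bytes object instead of copying it, so the
    # result is a read-only view; reshape() keeps that view.
    return np.frombuffer(raw, dtype=np.dtype(dtype_str)).reshape(shape)


original = np.arange(12, dtype=np.int32).reshape(3, 4)
payload = encode_array(original)
restored = decode_array(payload)

# A non-contiguous view (every other column) also round-trips.
view = original[:, ::2]
restored_view = decode_array(encode_array(view))

print(restored.shape, restored.dtype, restored.flags.writeable)  # prints: (3, 4) int32 False
```

One consequence worth knowing: the loaded array is read-only because it wraps an immutable bytes object, so call .copy() on it if you need to modify it in place. Using value.data instead of tobytes(), as the strategy does, skips the save-side copy but assumes the array is already C-contiguous.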
@YoniChechik changed the title from "question aboud nump saveing and loading" to "question aboud numpy saveing and loading" on Feb 15, 2025.