Use json to save metadata dataframe #354

Open
billbrod opened this issue Oct 23, 2024 · 0 comments


In #343, we're currently using pickle to save the metadata dataframe. This has the advantage of letting us save anything the user inserted into metadata (including arrays, functions, etc.), but it is brittle: pickle sometimes fails to load if the version of a dependency (e.g., pandas) differs between saving and loading.

Given that I don't think we need the full flexibility of pickle (why would the user insert a function as metadata?), we should instead use json to save the dataframe. That would require us to check that whatever metadata value the user sets is json-serializable. It looks like

import pandas as pd
df = pd.DataFrame({0: 1, 1: lambda x: x + 3}, index=[0])
df.to_json()

returns '{"0":{"0":1},"1":{"0":{}}}' (silently converting the lambda function to an empty JSON object rather than raising an error), so we'll need to come up with some other way to check whether the object is really json-serializable. We could check that df == pd.DataFrame(json.loads(df.to_json())), though this might be a bit slow.
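
One possible alternative to the round-trip comparison is to validate each value when the user sets it, using the standard-library `json` encoder, which raises on objects it cannot encode instead of silently emitting `{}`. This is only a sketch: `is_json_serializable` and `check_metadata_value` are hypothetical helper names, not existing functions in this project.

```python
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Return True if the standard-library json encoder accepts `value`."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        # TypeError: unsupported type (function, set, ...).
        # ValueError: e.g. a circular reference.
        return False
    return True


def check_metadata_value(key: str, value: Any) -> None:
    """Hypothetical guard to call before storing a value in metadata."""
    if not is_json_serializable(value):
        raise TypeError(
            f"Metadata value for {key!r} is not json-serializable: "
            f"{type(value).__name__}"
        )


check_metadata_value("subject", "sub-01")      # passes silently
print(is_json_serializable(lambda x: x + 3))   # False
```

Two caveats with this approach. The standard-library encoder is stricter than `to_json`: it also rejects NumPy integer scalars such as `np.int64`, so those would need converting to plain Python types first. And if we do go with the round-trip comparison instead, JSON object keys are always strings, so integer column and index labels come back as strings and would need converting before the two dataframes can be compared.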

Also, to_json forces ASCII by default (force_ascii=True), so if we want to support unicode characters in metadata, we will need to pass force_ascii=False.
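
To illustrate the difference, here is a small sketch comparing the default output with `force_ascii=False`. The column name and value are made up for the example.

```python
import json

import pandas as pd

# Made-up metadata containing a non-ASCII character.
df = pd.DataFrame({"name": ["café"]})

ascii_json = df.to_json()                     # default: force_ascii=True
unicode_json = df.to_json(force_ascii=False)  # keeps "é" as-is

# The default output escapes the non-ASCII character; the other keeps it.
print("é" in ascii_json, "é" in unicode_json)

# Both strings decode to the same data, so this only affects
# how readable the saved file is, not what can be stored.
assert json.loads(ascii_json) == json.loads(unicode_json)
```

Since both forms load back to the same values, the main reason to change the default would be to keep the saved file human-readable.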
