Showing 11 changed files with 772 additions and 109 deletions.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
3.12.2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -75,25 +75,12 @@ assert loaded.lookup == data.lookup | |
|
||
## Development | ||
|
||
Install poetry: | ||
``` | ||
curl -sSL https://install.python-poetry.org | python3 - | ||
``` | ||
|
||
Install [pyenv and its virtualenv plugin](https://github.com/pyenv/pyenv-virtualenv). Then: | ||
``` | ||
pyenv install 3.12.2 | ||
pyenv global 3.12.2 | ||
pyenv virtualenv 3.12.2 packio | ||
pyenv activate packio | ||
``` | ||
|
||
Install this package and its dependencies in your virtual env: | ||
``` | ||
poetry install --with dev | ||
``` | ||
|
||
Set up git hooks: | ||
Create and activate a virtual env for dev ops: | ||
``` | ||
git clone git@github.com:zkurtz/packio.git | |
cd packio | ||
pip install uv | ||
uv sync | ||
source .venv/bin/activate | ||
pre-commit install | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
Metadata-Version: 2.1 | ||
Name: packio | ||
Version: 0.0.4 | ||
Summary: IO for multiple python objects to/from a single file | ||
Author-email: Zach Kurtz <[email protected]> | ||
Project-URL: Source, https://github.com/zkurtz/packio | ||
Requires-Python: >=3.10 | ||
Description-Content-Type: text/markdown | ||
License-File: LICENSE | ||
|
||
# packio | ||
|
||
Packio allows you to use a single file to store and retrieve multiple python objects. A typical use case is to define IO methods on an instance of a class that contains multiple types of objects, such as a | ||
- dictionary | ||
- data frame | ||
- string | ||
- trained ML model (for example, lightgbm and xgboost each have built-in serialization methods for trained models) | ||
|
||
When a class contains several of these data types, or even multiple instances of the same data type, saving and loading the data associated with the class tends to become unwieldy: the user must either keep track of multiple file paths or fall back to pickle, which introduces other problems (see below). The goal of packio is to make it as easy as possible to write `save` and `load` methods for such a class while letting you keep using all of your favorite object-type-specific serializers (e.g. `to_parquet` for pandas, `json` for dictionaries, `pathlib.Path.write_text` for strings, etc). | |
|
||
|
||
## Why not pickle? | ||
|
||
The most common approach for serialization of such complex python objects is to use `pickle`. There are many reasons to dislike pickle. As summarized by Gemini, "Python's pickle module, while convenient, has drawbacks. It poses security risks due to potential code execution vulnerabilities when handling untrusted data. Compatibility issues arise because it's Python-specific and version-dependent. Maintaining pickle can be challenging due to refactoring difficulties and complex debugging." See also [Ben Frederickson](https://www.benfrederickson.com/dont-pickle-your-data/). | |
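The code-execution risk mentioned above can be illustrated with a few lines of standard-library Python. This is a minimal sketch, not part of packio: the `Payload` class is hypothetical, and the expression passed to `eval` is deliberately harmless, but an attacker could substitute any callable and arguments.

```python
import pickle


class Payload:
    """Hypothetical object that tells pickle to call `eval` when it is loaded."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7") instead of rebuilding a Payload.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes runs the embedded call; no Payload instance comes back.
result = pickle.loads(blob)
print(result)
```

Because the call happens inside `pickle.loads`, simply opening an untrusted pickle file is enough to run whatever it contains.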
|
||
## Example | ||
|
||
Here is a toy example of a data class with `save` and `from_file` methods powered by `packio`: | ||
|
||
``` | ||
from dataclasses import dataclass | ||
import json | ||
from pathlib import Path | ||
import pandas as pd | ||
from packio import Reader, Writer | ||
|
||
|
||
@dataclass | |
class MyData: | |
    """A simple data class for testing. | |
|
||
    Attributes: | |
        documentation: Description of what this class is all about. | |
        df: A data frame. | |
        lookup: A dictionary. | |
    """ | |
|
||
    documentation: str | |
    df: pd.DataFrame | |
    lookup: dict[str, int] | |
|
||
    def save(self, path: Path) -> None: | |
        """Save the data class to disk.""" | |
        with Writer(path) as writer: | |
            writer.file("documentation.txt").write_text(self.documentation) | |
            self.df.to_parquet(writer.file("df.parquet")) | |
            with writer.file("lookup.json").open("w") as f: | |
                json.dump(self.lookup, f) | |
|
||
    @classmethod | |
    def from_file(cls, path: Path) -> "MyData": | |
        """Load the data class from disk.""" | |
        with Reader(path) as reader: | |
            documentation = reader.file("documentation.txt").read_text() | |
            df = pd.read_parquet(reader.file("df.parquet")) | |
            with reader.file("lookup.json").open() as f: | |
                lookup = json.load(f) | |
        return cls(documentation=documentation, df=df, lookup=lookup) | |
|
||
|
||
# Create an instance of the class, save it, and re-load it as a new instance. | |
# `tmp_path` is assumed to be an existing directory (a `pathlib.Path`), e.g. pytest's `tmp_path` fixture. | |
data = MyData( | |
    documentation="This is an example.", | |
    df=pd.DataFrame({"a": [1, 2], "b": [3, 4]}), | |
    lookup={"a": 1, "b": 2}, | |
) | |
data.save(tmp_path / "data.mydata") | |
loaded = MyData.from_file(tmp_path / "data.mydata") | |
|
||
# Check that the new class instance matches the old one, at least in terms of its data attributes: | |
assert loaded.documentation == data.documentation | ||
pd.testing.assert_frame_equal(loaded.df, data.df) | ||
assert loaded.lookup == data.lookup | ||
``` | ||
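The diff does not show `packio/io.py`, so the mechanics behind `Writer` and `Reader` are not visible here. One way such a pair could work is to stage files in a temporary directory and bundle them into a single zip archive. The sketch below rests on that assumption; `SketchWriter` and `SketchReader` are hypothetical names and are not packio's actual implementation.

```python
import json
import tempfile
import zipfile
from pathlib import Path


class SketchWriter:
    """Hypothetical writer: stage files in a temp dir, zip them on exit."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def __enter__(self) -> "SketchWriter":
        self._tmp = tempfile.TemporaryDirectory()
        return self

    def file(self, name: str) -> Path:
        # Hand back a path inside the staging directory for any serializer to use.
        return Path(self._tmp.name) / name

    def __exit__(self, exc_type, exc, tb) -> None:
        try:
            if exc_type is None:
                # Bundle every staged file into one archive at the target path.
                with zipfile.ZipFile(self.path, "w") as archive:
                    for item in sorted(Path(self._tmp.name).iterdir()):
                        archive.write(item, arcname=item.name)
        finally:
            self._tmp.cleanup()


class SketchReader:
    """Hypothetical reader: unpack the archive into a temp dir for reading."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def __enter__(self) -> "SketchReader":
        self._tmp = tempfile.TemporaryDirectory()
        with zipfile.ZipFile(self.path) as archive:
            archive.extractall(self._tmp.name)
        return self

    def file(self, name: str) -> Path:
        return Path(self._tmp.name) / name

    def __exit__(self, exc_type, exc, tb) -> None:
        self._tmp.cleanup()


# Round trip: two objects of different types go into one file and come back out.
with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "bundle.zip"
    with SketchWriter(target) as writer:
        writer.file("note.txt").write_text("hello")
        with writer.file("lookup.json").open("w") as f:
            json.dump({"a": 1}, f)
    with SketchReader(target) as reader:
        note = reader.file("note.txt").read_text()
        lookup = json.loads(reader.file("lookup.json").read_text())
```

Handing out plain `Path` objects is what lets each object keep its own serializer: anything that can write to a path can take part in the bundle.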
|
||
## Development | ||
|
||
Create and activate a virtual env for dev ops: | ||
``` | ||
git clone git@github.com:zkurtz/packio.git | |
cd packio | ||
pip install uv | ||
uv sync | ||
source .venv/bin/activate | ||
pre-commit install | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
LICENSE | ||
README.md | ||
pyproject.toml | ||
packio/__init__.py | ||
packio/io.py | ||
packio.egg-info/PKG-INFO | ||
packio.egg-info/SOURCES.txt | ||
packio.egg-info/dependency_links.txt | ||
packio.egg-info/top_level.txt | ||
tests/test_packio.py |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
packio |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,42 +1,32 @@ | ||
[tool.poetry] | ||
[project] | ||
name = "packio" | ||
version = "0.0.2" | ||
version = "0.0.4" | ||
description = "IO for multiple python objects to/from a single file" | ||
authors = ["Zach Kurtz <[email protected]>"] | ||
authors = [{ name = "Zach Kurtz", email = "[email protected]" }] | ||
readme = "README.md" | ||
homepage = "https://github.com/zkurtz/packio" | ||
requires-python = ">=3.10" | ||
|
||
[dependency-groups] | ||
dev = [ | ||
"pre-commit >=3.8.0", | ||
"pyright >=1.1.378", | ||
"ruff >=0.6.3", | ||
"pytest >=8.3.2", | ||
"sphinx>=8.1.3", | ||
"sphinx-rtd-theme>=3.0.2", | ||
] | ||
|
||
[project.urls] | ||
Source = "https://github.com/zkurtz/packio" | ||
|
||
[build-system] | ||
requires = ["poetry-core>=1.0.0"] | ||
build-backend = "poetry.core.masonry.api" | ||
|
||
[tool.poetry.dependencies] | ||
python = "^3.10" | ||
|
||
[tool.poetry.group.dev.dependencies] | ||
ruff = "^0.6.3" | ||
pyright = "^1.1.378" | ||
pytest = "^8.3.2" | ||
pre-commit = "^3.8.0" | ||
black = "^24.8.0" | ||
build = "^1.2.1" | ||
twine = "^5.1.1" | ||
[tool.uv] | ||
package = true | ||
|
||
[tool.ruff] | ||
line-length = 120 | ||
|
||
[tool.ruff.lint] | ||
select = ["E", "F", "I"] | ||
ignore = [] | ||
|
||
[tool.pyright] | ||
include = ["packio"] | ||
include = ["packio", "tests"] | ||
|
||
[tool.pytest.ini_options] | ||
testpaths = ["tests"] | ||
|
||
[tool.black] | ||
line-length = 120 |