Merge pull request #3 from zkurtz/uv-migration
migrate to uv
zkurtz authored Nov 17, 2024
2 parents 8220f67 + f131ec9 commit 24ec8cd
Showing 11 changed files with 772 additions and 109 deletions.
29 changes: 8 additions & 21 deletions .github/workflows/ci.yml
@@ -9,32 +9,19 @@ on:
jobs:
build:
name: continuous-integration

runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v3

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.12.2'

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install poetry
poetry config virtualenvs.create false
poetry install --no-interaction --no-ansi
- name: Run unit tests with pytest
run: pytest

- name: Check code formatting with Black
run: black --check .
- uses: actions/setup-python@v4
- name: Set up uv
run: pip install uv

- name: Check code quality with Ruff
run: ruff check .
run: uv run ruff check .

- name: Check type hints with pyright
run: pyright
run: uv run pyright

- name: Run unit tests with pytest
run: uv run pytest
29 changes: 0 additions & 29 deletions .github/workflows/publish.yml

This file was deleted.

28 changes: 15 additions & 13 deletions .pre-commit-config.yaml
@@ -6,22 +6,24 @@ repos:
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files

- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.6.3
hooks:
- id: ruff
args: [ "--fix" ]

- repo: https://github.com/psf/black
rev: 24.8.0
hooks:
- id: black

- repo: local
hooks:
- id: pyright
name: pyright
name: type checking (pyright)
entry: pyright
language: system
types: [python]
- id: ruff-format
name: formatting (ruff)
entry: ruff
language: system
types: [python]
args: ['format']
- id: ruff-lint
name: linting (ruff)
entry: ruff
language: system
types: [python]
args: ['check', '--fix', '--force-exclude']

exclude: ^(docs/)
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.12.2
25 changes: 6 additions & 19 deletions README.md
@@ -75,25 +75,12 @@ assert loaded.lookup == data.lookup

## Development

Install poetry:
```
curl -sSL https://install.python-poetry.org | python3 -
```

Install [pyenv and its virtualenv plugin](https://github.com/pyenv/pyenv-virtualenv). Then:
```
pyenv install 3.12.2
pyenv global 3.12.2
pyenv virtualenv 3.12.2 packio
pyenv activate packio
```

Install this package and its dependencies in your virtual env:
```
poetry install --with dev
```

Set up git hooks:
Create and activate a virtual env for development, then set up the git hooks:
```
git clone git@github.com:zkurtz/packio.git
cd packio
pip install uv
uv sync
source .venv/bin/activate
pre-commit install
```
96 changes: 96 additions & 0 deletions packio.egg-info/PKG-INFO
@@ -0,0 +1,96 @@
Metadata-Version: 2.1
Name: packio
Version: 0.0.4
Summary: IO for multiple python objects to/from a single file
Author-email: Zach Kurtz <[email protected]>
Project-URL: Source, https://github.com/zkurtz/packio
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE

# packio

Packio allows you to use a single file to store and retrieve multiple python objects. A typical use case is to define IO methods on an instance of a class that contains multiple types of objects, such as a
- dictionary
- data frame
- string
- trained ML model (for example, lightgbm and xgboost each have built-in serialization methods for trained models)

When a class contains multiple of these data types, or even multiple instances of the same data type, saving and loading the data associated with a class tends to become unwieldy, requiring the user to either keep track of multiple file paths or fall back to using pickle, which introduces other problems (see below). The goal of packio is to make it as easy as possible to write `save` and `load` methods for such a class while allowing you to keep using all of your favorite object-type-specific serializers (e.g. `to_parquet` for pandas, `json` for dictionaries, `pathlib.Path.write_text` for strings, etc.).
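The core idea of one file holding several objects, each written with its own serializer, can be illustrated with the standard library's `zipfile`. This is only a minimal sketch under that assumption, not a description of how packio stores its data. `save_bundle` and `load_bundle` are hypothetical names, and the data frame is left out so the sketch needs nothing beyond the standard library.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def save_bundle(path: Path, documentation: str, lookup: dict[str, int]) -> None:
    """Hypothetical helper: write several objects into one zip archive, each with its own serializer."""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("documentation.txt", documentation)  # plain text for the string
        zf.writestr("lookup.json", json.dumps(lookup))  # JSON for the dictionary


def load_bundle(path: Path) -> tuple[str, dict[str, int]]:
    """Hypothetical helper: read the objects back out of the archive written by save_bundle."""
    with zipfile.ZipFile(path) as zf:
        documentation = zf.read("documentation.txt").decode("utf-8")
        lookup = json.loads(zf.read("lookup.json"))
    return documentation, lookup


# Round trip through a single file in a temporary directory:
with tempfile.TemporaryDirectory() as tmp_dir:
    bundle_path = Path(tmp_dir) / "data.bundle"
    save_bundle(bundle_path, "This is an example.", {"a": 1, "b": 2})
    documentation, lookup = load_bundle(bundle_path)

print(documentation, lookup)
```

Each member keeps a format that other tools can read on its own, which is the property the paragraph above is after; a real library would also need to handle binary members such as parquet files.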


## Why not pickle?

The most common approach for serialization of such complex python objects is to use `pickle`. There are many reasons to dislike pickle. As summarized by Gemini, "Python's pickle module, while convenient, has drawbacks. It poses security risks due to potential code execution vulnerabilities when handling untrusted data. Compatibility issues arise because it's Python-specific and version-dependent. Maintaining pickle can be challenging due to refactoring difficulties and complex debugging." See also [Ben Frederickson](https://www.benfrederickson.com/dont-pickle-your-data/).
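To make the code-execution risk concrete: a pickle can name any importable callable together with its arguments, and `pickle.loads` calls it while loading. The `Payload` class below is a hypothetical, harmless illustration whose pickle makes the loader evaluate `6 * 7` through the built-in `eval`; a hostile file could name something like `os.system` just as easily.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle tells the loader to call eval("6 * 7")."""

    def __reduce__(self):
        # pickle records (callable, args); pickle.loads will call callable(*args).
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The loading side does not need the Payload class: the pickle only references builtins.eval.
result = pickle.loads(blob)  # eval("6 * 7") runs here, during loading
print(result)  # 42
```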

## Example

Here is a toy example of a data class with `save` and `from_file` methods powered by `packio`:

```
from dataclasses import dataclass
import json
from pathlib import Path
import tempfile

import pandas as pd
from packio import Reader, Writer


@dataclass
class MyData:
    """A simple data class for testing.

    Attributes:
        documentation: Description of what this class is all about.
        df: A data frame.
        lookup: A dictionary.
    """

    documentation: str
    df: pd.DataFrame
    lookup: dict[str, int]

    def save(self, path: Path) -> None:
        """Save the data class to disk."""
        with Writer(path) as writer:
            writer.file("documentation.txt").write_text(self.documentation)
            self.df.to_parquet(writer.file("df.parquet"))
            with writer.file("lookup.json").open("w") as f:
                json.dump(self.lookup, f)

    @classmethod
    def from_file(cls, path: Path) -> "MyData":
        """Load the data class from disk."""
        with Reader(path) as reader:
            documentation = reader.file("documentation.txt").read_text()
            df = pd.read_parquet(reader.file("df.parquet"))
            with reader.file("lookup.json").open() as f:
                lookup = json.load(f)
        return cls(documentation=documentation, df=df, lookup=lookup)


# Create an instance of the class, save it, and re-load it as a new instance:
data = MyData(
    documentation="This is an example.",
    df=pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    lookup={"a": 1, "b": 2},
)
tmp_path = Path(tempfile.mkdtemp())  # any writable directory works here
data.save(tmp_path / "data.mydata")
loaded = MyData.from_file(tmp_path / "data.mydata")

# Check that the new class instance matches the old one, at least in terms of its data attributes:
assert loaded.documentation == data.documentation
pd.testing.assert_frame_equal(loaded.df, data.df)
assert loaded.lookup == data.lookup
```

## Development

Create and activate a virtual env for development, then set up the git hooks:
```
git clone git@github.com:zkurtz/packio.git
cd packio
pip install uv
uv sync
source .venv/bin/activate
pre-commit install
```
10 changes: 10 additions & 0 deletions packio.egg-info/SOURCES.txt
@@ -0,0 +1,10 @@
LICENSE
README.md
pyproject.toml
packio/__init__.py
packio/io.py
packio.egg-info/PKG-INFO
packio.egg-info/SOURCES.txt
packio.egg-info/dependency_links.txt
packio.egg-info/top_level.txt
tests/test_packio.py
Empty file.
1 change: 1 addition & 0 deletions packio.egg-info/top_level.txt
@@ -0,0 +1 @@
packio
44 changes: 17 additions & 27 deletions pyproject.toml
@@ -1,42 +1,32 @@
[tool.poetry]
[project]
name = "packio"
version = "0.0.2"
version = "0.0.4"
description = "IO for multiple python objects to/from a single file"
authors = ["Zach Kurtz <[email protected]>"]
authors = [{ name = "Zach Kurtz", email = "[email protected]" }]
readme = "README.md"
homepage = "https://github.com/zkurtz/packio"
requires-python = ">=3.10"

[dependency-groups]
dev = [
"pre-commit >=3.8.0",
"pyright >=1.1.378",
"ruff >=0.6.3",
"pytest >=8.3.2",
"sphinx>=8.1.3",
"sphinx-rtd-theme>=3.0.2",
]

[project.urls]
Source = "https://github.com/zkurtz/packio"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

[tool.poetry.dependencies]
python = "^3.10"

[tool.poetry.group.dev.dependencies]
ruff = "^0.6.3"
pyright = "^1.1.378"
pytest = "^8.3.2"
pre-commit = "^3.8.0"
black = "^24.8.0"
build = "^1.2.1"
twine = "^5.1.1"
[tool.uv]
package = true

[tool.ruff]
line-length = 120

[tool.ruff.lint]
select = ["E", "F", "I"]
ignore = []

[tool.pyright]
include = ["packio"]
include = ["packio", "tests"]

[tool.pytest.ini_options]
testpaths = ["tests"]

[tool.black]
line-length = 120