Skip to content

Commit

Permalink
expand docs
Browse files Browse the repository at this point in the history
  • Loading branch information
zkurtz committed Nov 28, 2024
1 parent 877b89a commit a7f6ba2
Show file tree
Hide file tree
Showing 2 changed files with 22 additions and 15 deletions.
32 changes: 18 additions & 14 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,26 +10,30 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.10', '3.12']
python-version:
- '3.10'
- '3.13'

steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Clone repo
uses: actions/checkout@v4

- name: Set the python version for UV
- name: Set the python version
run: echo "UV_PYTHON=${{ matrix.python-version }}" >> $GITHUB_ENV

- name: Set up uv
run: pip install uv
- name: Setup uv
uses: astral-sh/setup-uv@v3
with:
version: "0.5.4"

- name: Linting check
run: uv run ruff check

- name: Check code quality with Ruff
run: |
uv run ruff check
uv run ruff format --check
- name: Formatting check
run: uv run ruff format --check

- name: Check type hints with pyright
- name: Type checking
run: uv run pyright

- name: Run unit tests with pytest
- name: Unit tests
run: uv run pytest
5 changes: 4 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,13 @@ Packio allows you to use a single file to store and retrieve multiple python obj

When a class contains multiple of these data types, or even multiple instances of the same data type, saving and loading the data associated with a class tends to become unwieldy, requiring the user to either keep track multiple file paths or to fall back to using pickle, which introduces other problems (see below). The goal of packio is to make it as easy as possible to write `save` and `load` methods for such a class while allowing you to keep using all of your favorite object-type-specific serializers (i.e. `to_parquet` for pandas, `json` for dictionaries, `pathlib.Path.write_text` for strings, etc).

## Why a single file and not a directory?

In a word, *encapsulation*. Copy/move operations with a file are simpler than a directory, especially when it comes to moving data across platforms such as to/from the cloud. A file is also more tamper-resistant - it's typically harder to accidentally modify the contents of a file than it is for someone to add or remove files or subdirectories in a directory.

## Why not pickle?

The most common approach for serialization of such complex python objects is to use `pickle`. There are many reasons do dislike pickle. As summarized by Gemini, "Python's pickle module, while convenient, has drawbacks. It poses security risks due to potential code execution vulnerabilities when handling untrusted data. Compatibility issues arise because it's Python-specific and version-dependent. Maintaining pickle can be challenging due to refactoring difficulties and complex debugging." See also [Ben Frederickson](https://www.benfrederickson.com/dont-pickle-your-data/).
Although `pickle` may be the most common approach for serialization of complex python objects, there are strong reasons do dislike pickle. As summarized by Gemini, "Python's pickle module, while convenient, has drawbacks. It poses security risks due to potential code execution vulnerabilities when handling untrusted data. Compatibility issues arise because it's Python-specific and version-dependent. Maintaining pickle can be challenging due to refactoring difficulties and complex debugging." See also [Ben Frederickson](https://www.benfrederickson.com/dont-pickle-your-data/).

## Example

Expand Down

0 comments on commit a7f6ba2

Please sign in to comment.