Changing the package name and updating files #5

Merged · 15 commits · Jan 21, 2025
39 changes: 0 additions & 39 deletions .github/workflows/lint.yml

This file was deleted.

30 changes: 15 additions & 15 deletions .github/workflows/matchers/mypy.json
The change reindents the matcher file (15 lines touched each way); the resulting content:

@@ -1,18 +1,18 @@
{
  "problemMatcher": [
    {
      "owner": "mypy",
      "severity": "error",
      "pattern": [
        {
          "regexp": "^(\\S*):(\\d+):(\\d+): ([a-z]+): (.*)$",
          "file": 1,
          "line": 2,
          "column": 3,
          "severity": 4,
          "message": 5
        }
      ]
    }
  ]
}
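As a quick sanity check (not part of the PR), the matcher's `regexp` can be exercised against a typical mypy diagnostic line; the file path and message below are made up for illustration:

```python
import re

# Same pattern the matcher file registers for mypy output.
pattern = re.compile(r"^(\S*):(\d+):(\d+): ([a-z]+): (.*)$")

# Hypothetical mypy output line in --show-column-numbers format.
line = "synthius/model.py:42:13: error: Incompatible return value type"

m = pattern.match(line)
# Groups map to the matcher's file/line/column/severity/message fields.
print(m.group(1), m.group(2), m.group(3), m.group(4))
```

The numbered fields in the JSON (`"file": 1`, `"line": 2`, ...) refer to these capture groups, which is how GitHub turns each matched line into an inline annotation.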
70 changes: 70 additions & 0 deletions .github/workflows/mypy.yml
@@ -0,0 +1,70 @@
name: mypy

on:
  - pull_request

jobs:
  mypy:
    defaults:
      run:
        shell: bash
    strategy:
      fail-fast: true
      matrix:
        os: ["ubuntu-latest", "macos-latest"]
        python-version: ["3.10", "3.11"]
    runs-on: ${{ matrix.os }}
    steps:
      #----------------------------------------------
      #  check-out repo and set-up python
      #----------------------------------------------
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up python ${{ matrix.python-version }}
        id: setup-python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      #----------------------------------------------
      #  ----- install poetry -----
      #----------------------------------------------
      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true

      #----------------------------------------------
      #  install or use cached dependencies
      #----------------------------------------------
      - name: Load cached venv
        id: cached-poetry-dependencies
        uses: actions/cache@v3
        with:
          path: .venv
          key: venv-${{ runner.os }}-python-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}
      - name: Install dependencies
        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
        run: |
          poetry install --no-interaction --no-root --all-extras --without dev

      # always install current root package
      - name: Install library
        run: poetry install --no-interaction --all-extras --without dev

      #----------------------------------------------
      #  ----- setup matchers & run mypy -----
      #----------------------------------------------
      - name: Setup matchers
        run: |
          echo "::add-matcher::.github/workflows/matchers/mypy.json"
          echo "TERM: changing from $TERM -> xterm"
          export TERM=xterm
      - name: Run mypy
        # NOTE: tomli is sometimes missing, install it explicitly
        run: |
          source $VENV
          pip install tomli
          pip install "mypy>=1.7.0"
          mypy --show-column-numbers .
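The cache key above embeds `hashFiles('**/poetry.lock')`, so any change to the lock file produces a new key and invalidates the cached venv. A rough sketch of the idea (GitHub Actions uses its own hashing scheme; SHA-256 over a temporary file here is purely illustrative, and the runner/version values are made up):

```python
import hashlib
import tempfile
from pathlib import Path

# Simulate a poetry.lock file in a scratch directory.
lock_file = Path(tempfile.mkdtemp()) / "poetry.lock"
lock_file.write_text('[[package]]\nname = "pandas"\n')

# Digest of the lock file stands in for hashFiles('**/poetry.lock').
digest = hashlib.sha256(lock_file.read_bytes()).hexdigest()

# Assemble a key shaped like the workflow's: venv-<os>-python-<version>-<hash>.
cache_key = f"venv-Linux-python-3.10-{digest}"
print(cache_key)
```

Editing the lock file (adding or bumping a dependency) changes the digest, so the next run misses the cache and reinstalls from scratch.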
67 changes: 45 additions & 22 deletions .github/workflows/publish.yml
@@ -1,36 +1,59 @@
-name: Publish to PyPI
+# This workflow will upload a Python Package to PyPI when a release is created
+# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries
+
+# This workflow uses actions that are not certified by GitHub.
+# They are provided by a third-party and are governed by
+# separate terms of service, privacy policy, and support
+# documentation.
+
+name: Upload Python Package
 
 on:
-  push:
-    tags:
-      - 'v0.1.1'
+  release:
+    types: [published]
+
+permissions:
+  contents: read
 
 jobs:
-  publish:
+  pypi-publish:
+    name: Upload release to PyPI
     runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/project/synthius/
+    permissions:
+      id-token: write
+
     steps:
-      - name: Check out code
-        uses: actions/checkout@v3
-
-      - name: Set up Python
-        uses: actions/setup-python@v4
+      #----------------------------------------------
+      #  check-out repo and set-up python
+      #----------------------------------------------
+      - name: Checkout code
+        uses: actions/checkout@v4
+      - name: Set up Python 3.10
+        uses: actions/setup-python@v5
         with:
-          python-version: '3.10.10'
+          python-version: "3.10"
 
+      #----------------------------------------------
+      #  ----- install poetry -----
+      #----------------------------------------------
       - name: Install Poetry
-        run: curl -sSL https://install.python-poetry.org | python3 -
-
-      - name: Configure Poetry
-        run: poetry config virtualenvs.create false
+        uses: snok/install-poetry@v1
 
+      #----------------------------------------------
+      #  install dependencies and build packages
+      #----------------------------------------------
       - name: Install dependencies
-        run: poetry install
-
-      - name: Build package
+        run: poetry install --sync --no-interaction
+      - name: Package project
         run: poetry build
 
-      - name: Publish to PyPI
-        env:
-          POETRY_PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
-        run: poetry publish --username __token__ --password ${{ secrets.PYPI_TOKEN }}
+      #----------------------------------------------
+      #  upload to PyPI
+      #----------------------------------------------
+      - name: Publish release distributions to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          packages-dir: dist/
37 changes: 37 additions & 0 deletions .github/workflows/ruff.yml
@@ -0,0 +1,37 @@
name: Ruff

on:
  - pull_request

jobs:
  Ruff:
    defaults:
      run:
        shell: bash
    strategy:
      fail-fast: true
      matrix:
        os: ["ubuntu-latest", "macos-latest"]
        python-version: ["3.10", "3.11"]
    runs-on: ${{ matrix.os }}
    steps:
      #----------------------------------------------
      #  check-out repo and set-up python
      #----------------------------------------------
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up python ${{ matrix.python-version }}
        id: setup-python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      #----------------------------------------------
      #  ----- run ruff -----
      #----------------------------------------------
      - name: Lint and check format with ruff
        uses: astral-sh/ruff-action@v3
        with:
          version-file: "pyproject.toml"
      - run: ruff check --output-format=github
      - run: ruff format --check
10 changes: 1 addition & 9 deletions .gitignore
@@ -6,9 +6,6 @@ __pycache__/
 # Caching directories
 .mypy_cache/
 
-# Data directories
-/data
-
 # Jupyter Notebook checkpoints
 .ipynb_checkpoints/
@@ -34,9 +31,4 @@ Thumbs.db
 
 # Other files and folders that should be ignored
 *.bak
-*.tmp
-/models
-/models-MIMIC
-/notebooks/MIMIC_
-/notebooks/outdated
-/documentation/figures
+*.tmp
6 changes: 3 additions & 3 deletions README.md
@@ -23,16 +23,16 @@ pip install synthius
 ### Step 2: Usage Example
 To understand how to use this package, explore the three example Jupyter notebooks included in the repository:
 
-1. **[Generator](example/1_generator.ipynb)**
+1. **[Generator](examples/1_generator.ipynb)**
    - Demonstrates how to generate synthetic data using seven different models.
    - Update paths and configurations (e.g., file paths, target column) to fit your dataset.
    - Run the cells to generate synthetic datasets.
 
-2. **[AutoGloun](example/2_autogloun.ipynb)**
+2. **[AutoGloun](examples/2_autogloun.ipynb)**
    - Evaluates the utility.
    - Update the paths as needed to analyze your data.
 
-3. **[Evaluation](example/3_evaluation.ipynb)**
+3. **[Evaluation](examples/3_evaluation.ipynb)**
    - Provides examples of computing metrics for evaluating synthetic data, including:
      - Utility
      - Fidelity/Similarity
13 changes: 7 additions & 6 deletions example/1_generator.ipynb → examples/1_generator.ipynb
@@ -7,6 +7,7 @@
    "outputs": [],
    "source": [
     "import warnings\n",
+    "from pathlib import Path\n",
     "\n",
     "import pandas as pd\n",
     "from sdv.metadata import SingleTableMetadata\n",
@@ -17,8 +18,8 @@
     "warnings.filterwarnings(\"ignore\")\n",
     "\n",
     "\n",
-    "data_path = \"PATH_TO_ORIGINAL_DATA\" # TODO: Change this to the path of the original data\n",
-    "synt_path = \"PATH_TO_SYNTHETIC_DATA_DIRECTORY\" # TODO: Change this to the path of the synthetic data\n",
+    "data_path = Path(\"PATH_TO_ORIGINAL_DATA\") # TODO: Change this to the path of the original data\n",
+    "synt_path = Path(\"PATH_TO_SYNTHETIC_DATA_DIRECTORY\") # TODO: Change this to the path of the synthetic data directory\n",
     "\n",
     "\n",
     "data = pd.read_csv(data_path, low_memory=False)\n",
@@ -170,7 +171,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from synthetic_data.model import GaussianMultivariateSynthesizer\n",
+    "from synthius.model import GaussianMultivariateSynthesizer\n",
     "\n",
     "gaussian_multivariate_synthesizer = GaussianMultivariateSynthesizer(train_data, synt_path)\n",
     "gaussian_multivariate_synthesizer.synthesize(num_sample=total_samples)"
@@ -191,8 +192,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from synthetic_data.data import DataImputationPreprocessor\n",
-    "from synthetic_data.model import WGAN, data_batcher\n",
+    "from synthius.data import DataImputationPreprocessor\n",
+    "from synthius.model import WGAN, data_batcher\n",
     "\n",
     "data_preprocessor = DataImputationPreprocessor(train_data)\n",
     "processed_train_data = data_preprocessor.fit_transform()\n",
@@ -225,7 +226,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from synthetic_data.model import ARF\n",
+    "from synthius.model import ARF\n",
     "\n",
     "model = ARF(x=train_data, id_column=ID, min_node_size=5, num_trees=50, max_features=0.3)\n",
     "forde = model.forde()\n",
12 changes: 7 additions & 5 deletions example/2_autogloun.ipynb → examples/2_autogloun.ipynb
@@ -6,7 +6,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from synthetic_data.model import ModelFitter, ModelLoader"
+    "from pathlib import Path\n",
+    "\n",
+    "from synthius.model import ModelFitter, ModelLoader"
    ]
   },
   {
@@ -15,10 +17,10 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "train_data = \"PATH_TO_TRAIN_DATASET_AS_CSV\" # TODO: Change this to the path of the training dataset\n",
-    "test_data = \"PATH_TO_TEST_DATASET_AS_CSV\" # TODO: Change this to the path of the test dataset\n",
-    "synt_path = \"PATH_TO_SYNTHETIC_DATA_DIRECTORY\" # TODO: Change this to the path of the synthetic data directory\n",
-    "models_path = \"PATH_TO_MODELS_DIRECTORY\" # TODO: Change this to the path of the models directory\n",
+    "train_data = Path(\"PATH_TO_TRAIN_DATASET_AS_CSV\") # TODO: Change this to the path of the training dataset\n",
+    "test_data = Path(\"PATH_TO_TEST_DATASET_AS_CSV\") # TODO: Change this to the path of the test dataset\n",
+    "synt_path = Path(\"PATH_TO_SYNTHETIC_DATA_DIRECTORY\") # TODO: Change this to the path of the synthetic data directory\n",
+    "models_path = Path(\"PATH_TO_MODELS_DIRECTORY\") # TODO: Change this to the path of the models directory\n",
     "\n",
     "synthetic_data_paths = [\n",
     "    synt_path / \"CopulaGAN.csv\",\n",
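The str-to-`Path` change in this notebook matters because the code that follows joins paths with the `/` operator (`synt_path / "CopulaGAN.csv"`), which `pathlib.Path` supports and a plain string does not. A minimal sketch with the notebook's placeholder path (not a real file):

```python
from pathlib import Path

# Placeholder path, as in the notebook; no file access happens here.
synt_path = Path("PATH_TO_SYNTHETIC_DATA_DIRECTORY")

# Path objects support "/" joining, which the notebook relies on.
copula_csv = synt_path / "CopulaGAN.csv"
print(copula_csv.name)

# The old plain-string version would fail at this point.
try:
    "PATH_TO_SYNTHETIC_DATA_DIRECTORY" / "CopulaGAN.csv"
except TypeError:
    print("plain str has no / operator")
```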