Cookiecutter template for starting a Data Science project with modern, fast Python tools.
- Pipenv for managing packages and virtualenvs in a modern way.
- Prefect for modern pipelines and data workflow.
- Weights & Biases for experiment tracking.
- FastAPI for fast, self-documenting HTTP APIs, with performance on par with NodeJS and Go, built on asyncio, ASGI, and uvicorn.
- Modern CLI with Typer.
- Batteries included: pandas, NumPy, SciPy, seaborn, and JupyterLab already installed.
- Consistent code quality: black, isort, autoflake, and pylint already installed.
- Pytest for testing.
- GitHub Pages for the public website.
Install the latest Cookiecutter and Pipenv:
pip install -U pipenv cookiecutter
Generate the project:
cookiecutter gh:Koffair/cookiecutter-modern-datascience
Get inside the project:
cd <repo_name>
pipenv shell # activates virtualenv
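To confirm that the shell really is running inside the project's virtualenv, a small standard-library check can help. This is a minimal sketch, not part of the template: the helper name `in_virtualenv` is hypothetical, and it relies only on the documented difference between `sys.prefix` and `sys.base_prefix` inside a virtual environment.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a virtualenv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Run it once before and once after `pipenv shell`; the reported interpreter path should change to one inside the environment.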
(Optional) Start Weights & Biases locally, if you don't want to use the cloud/on-premise version:
wandb local
Start working:
This is what your new project will look like:
├── .gitignore                <- GitHub's excellent Python .gitignore customized for this project
├── LICENSE                   <- Your project's license.
├── Pipfile                   <- The Pipfile for reproducing the analysis environment
├──                           <- The top-level README for developers using this project.
├── data
│   ├── 0_raw                 <- The original, immutable data dump.
│   ├── 0_external            <- Data from third party sources.
│   ├── 1_interim             <- Intermediate data that has been transformed.
│   └── 2_final               <- The final, canonical data sets for modeling.
├── docs                      <- GitHub Pages website
│   ├── data_dictionaries     <- Data dictionaries
│   └── references            <- Papers, manuals, and all other explanatory materials.
├── notebooks                 <- Jupyter notebooks. Naming convention is a number (for ordering),
│                                the creator's initials, and a short `_` delimited description, e.g.
│                                `01_cp_exploratory_data_analysis.ipynb`.
├── output
│   ├── features              <- Fitted and serialized features
│   ├── models                <- Trained and serialized models, model predictions, or model summaries
│   └── reports               <- Generated analyses as HTML, PDF, LaTeX, etc.
│       └── figures           <- Generated graphics and figures to be used in reporting
├── pipelines                 <- Pipelines and data workflows.
│   ├── Pipfile               <- The Pipfile for reproducing the pipelines environment
│   ├──                       <- The CLI entry point for all the pipelines
│   ├── <repo_name>           <- Code for the various steps of the pipelines
│   │   ├──
│   │   ├──                   <- Download, generate, and process data
│   │   ├──                   <- Create exploratory and results oriented visualizations
│   │   ├──                   <- Turn raw data into features for modeling
│   │   └──                   <- Train and evaluate models
│   └── tests
│       ├── fixtures          <- Where to put example inputs and outputs
│       │   ├── input.json    <- Test input data
│       │   └── output.json   <- Test output data
│       └──                   <- Integration tests for the HTTP API
└── serve                     <- HTTP API for serving predictions
    ├── Dockerfile            <- Dockerfile for HTTP API
    ├── Pipfile               <- The Pipfile for reproducing the serving environment
    ├──                       <- The entry point of the HTTP API
    └── tests
        ├── fixtures          <- Where to put example inputs and outputs
        │   ├── input.json    <- Test input data
        │   └── output.json   <- Test output data
        └──                   <- Integration tests for the HTTP API
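The notebook naming convention in the tree above (an ordering number, the creator's initials, and a short `_` delimited description) can be checked or generated with a few lines of Python. This is a rough sketch rather than anything shipped with the template: the helper names `notebook_name` and `is_valid_notebook_name` are hypothetical, and the pattern assumes a zero-padded number of at least two digits and lowercase names, as in the example `01_cp_exploratory_data_analysis.ipynb`.

```python
import re

# Assumed pattern: <number>_<initials>_<description>.ipynb, where the
# description is one or more lowercase words joined by underscores.
NOTEBOOK_NAME = re.compile(
    r"(?P<order>\d{2,})_(?P<initials>[a-z]+)_"
    r"(?P<description>[a-z0-9]+(?:_[a-z0-9]+)*)\.ipynb"
)


def is_valid_notebook_name(name: str) -> bool:
    """Return True if the file name follows the assumed convention."""
    return NOTEBOOK_NAME.fullmatch(name) is not None


def notebook_name(order: int, initials: str, description: str) -> str:
    """Build a notebook file name from its three parts."""
    # Collapse whitespace in the description into single underscores.
    slug = "_".join(description.lower().split())
    name = f"{order:02d}_{initials.lower()}_{slug}.ipynb"
    if not is_valid_notebook_name(name):
        raise ValueError(f"cannot build a conforming name: {name!r}")
    return name


if __name__ == "__main__":
    print(notebook_name(1, "cp", "exploratory data analysis"))
```

A check like this could run in a pre-commit hook or a test over the `notebooks` directory, if the team wants the convention enforced rather than just documented.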