Houses Data Science Pipeline

This is a data science pipeline for house price data, built from the analysis laid out step by step in notebooks/.

Requirements

  • conda

Installation

Run the following to install the project as a Python package:

pip install houses_pipeline

Starting and exploring the project

conda env create -f environment.yml
conda activate houses

Building the package

python setup.py bdist_wheel

Pipeline steps

Fetch the dataset

./houses_pipeline/fetch/fetch_dataset.sh data/raw

Or run the equivalent commands directly:

kaggle competitions download -c house-prices-advanced-regression-techniques -p data/raw ;
unzip -o data/raw/*.zip -d data/raw/
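Where the `unzip` binary is not available, the extraction half of the commands above could be approximated in Python. This is a minimal sketch using only the standard library; the function name `extract_archives` is hypothetical and not part of the project, and the download itself is still assumed to go through the `kaggle` CLI.

```python
import zipfile
from pathlib import Path


def extract_archives(raw_dir):
    """Extract every .zip archive found in raw_dir into raw_dir itself.

    Hypothetical helper, roughly mirroring `unzip -o raw_dir/*.zip -d raw_dir/`.
    Returns the names of all extracted archive members.
    """
    raw_path = Path(raw_dir)
    extracted = []
    for archive in sorted(raw_path.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # extractall overwrites files that already exist, like `unzip -o`.
            zf.extractall(raw_path)
            extracted.extend(zf.namelist())
    return extracted
```

Calling `extract_archives("data/raw")` after the download would then leave the CSV files next to the archive.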

Preprocess

python houses_pipeline/preprocess data/raw/train.csv data/interim/train.csv
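The command above passes an input path and an output path to the preprocessing step. The actual transformations live in houses_pipeline/preprocess and are not reproduced here; the sketch below only illustrates that calling convention. The names `clean_row`, `preprocess` and `main`, and the whitespace-trimming transformation, are placeholders assumed for the example.

```python
import argparse
import csv
from pathlib import Path


def clean_row(row):
    # Placeholder transformation (an assumption): trim whitespace in each cell.
    return [cell.strip() for cell in row]


def preprocess(input_path, output_path):
    """Read a CSV, apply clean_row to every row, and write the result.

    Returns the number of rows written, header included.
    """
    output = Path(output_path)
    # Create the target directory (for example data/interim) if it is missing.
    output.parent.mkdir(parents=True, exist_ok=True)
    rows_written = 0
    with open(input_path, newline="") as src, open(output, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(clean_row(row))
            rows_written += 1
    return rows_written


def main(argv=None):
    # Two positional arguments, matching the command line shown above.
    parser = argparse.ArgumentParser(description="Preprocess a raw CSV file.")
    parser.add_argument("input_path")
    parser.add_argument("output_path")
    args = parser.parse_args(argv)
    return preprocess(args.input_path, args.output_path)
```

In a package laid out this way, a `__main__.py` that calls `main()` is what would let `python houses_pipeline/preprocess <input> <output>` run the step.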

Data Splitting

  • Not Yet Implemented

Model Training

  • Not Yet Implemented

Running tests

conda develop .
pytest

Usage

Preprocessing

python houses_pipeline/preprocess

Training the Lasso Regression

python houses_pipeline/modelling/train_lasso.py
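For readers who want a feel for what Lasso regression does, the following is an illustrative pure-Python sketch of coordinate descent with soft-thresholding. It is not the contents of train_lasso.py, which is not shown here. The function names `soft_threshold` and `fit_lasso`, the `alpha` parameter, the no-intercept simplification and the toy data are all assumptions made for the example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha, n_iter=100):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 without an intercept.

    X is a list of rows, y a list of targets. Illustrative only: it runs a
    fixed number of sweeps and makes no attempt at efficiency.
    """
    n_samples = len(X)
    n_features = len(X[0])
    weights = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for row, target in zip(X, y):
                prediction = sum(w * x for w, x in zip(weights, row))
                # Residual with feature j's own contribution added back in.
                partial_residual = target - prediction + weights[j] * row[j]
                rho += row[j] * partial_residual
                z += row[j] ** 2
            rho /= n_samples
            z /= n_samples
            weights[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return weights


# Toy data (made up): y = 2 * x1 + 0.05 * x2 on an orthogonal design.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_toy = [2.0, 0.05, 2.0, 0.05]
toy_weights = fit_lasso(X_toy, y_toy, alpha=0.1)
```

On this toy data the L1 penalty shrinks the large coefficient and sets the small one to exactly zero, which is the feature-selection behaviour that motivates using Lasso on a dataset with many columns.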

Contributing/Developing

pip install -e ".[dev]"