Generative models for creating synthetic data from the Boston housing dataset.
The Boston dataset is preprocessed in the data_preparation.ipynb file.
The preprocessed data is stored in the boston_dataset_data.mat file.
The Boston Housing Dataset (https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html) is a small dataset for benchmarking machine learning algorithms.
The dataset contains 506 cases, each with 14 attributes (13 numerical/categorical predictive variables and 1 target variable: the median value of owner-occupied homes in $1000's).
The second and fourth predictor columns are deleted, and the target variable is joined to the remaining 11 predictors to form the final dataset for generative modelling.
The shape of the final dataset boston_dataset_data.mat is (506, 12).
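The column selection described above can be sketched as follows. This is a minimal illustration, not the code from data_preparation.ipynb: the names predictors, target, reduced and final are hypothetical, the arrays hold random stand-in values rather than the real data, and placing the target in the last column is an assumption.

```python
import numpy as np

# Stand-in arrays with the raw dataset's shape (random values, not the real data).
rng = np.random.default_rng(0)
predictors = rng.normal(size=(506, 13))  # 13 predictive variables
target = rng.normal(size=(506,))         # 1 target variable

# Drop the second and fourth predictor columns (zero-based indices 1 and 3).
reduced = np.delete(predictors, [1, 3], axis=1)

# Join the target to the remaining predictors (assumed here: as the last column).
final = np.column_stack([reduced, target])

print(final.shape)  # (506, 12)
```

The result has 11 remaining predictors plus the target, which matches the (506, 12) shape stated above.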
Load the preprocessed data with:
from scipy.io import loadmat
boston_data = loadmat('boston_dataset_data')['boston_dataset_data']
Generative models included:
- Gaussian mixture models
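As a rough sketch of how a Gaussian mixture model can generate synthetic rows, the example below fits scikit-learn's GaussianMixture and samples from it. The use of scikit-learn, the choice of 3 components with full covariances, and the random stand-in array data are all assumptions made for illustration; they are not taken from this repository. In practice, data would be replaced by the loaded boston_data array.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for the preprocessed (506, 12) array; random values, not the real data.
rng = np.random.default_rng(0)
data = rng.normal(size=(506, 12))

# Fit a mixture of Gaussians; n_components=3 is an arbitrary illustrative choice.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(data)

# Draw synthetic rows; sample() also returns the component each row came from.
synthetic, component_labels = gmm.sample(n_samples=506)

print(synthetic.shape)  # (506, 12)
```

The number of components is usually chosen by comparing candidate models, for example with gmm.bic(data), rather than fixed in advance.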
Distributions of the 12 variables used for generative modelling: