ML Development Guide

For time series, the common advice is not to use a random train/test split: train on earlier data and test on later data, so the model is never fit on observations from after the period it is evaluated on. I'm still not sure I completely agree with this.

Add a validation set as well, so the data is split into train/validation/test, in chronological order.
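A chronological three-way split can be sketched as follows. This is a minimal sketch, assuming the rows are already sorted by time (oldest first) and live in a sliceable sequence such as a list. The function name `chronological_split` and the default 70/15/15 proportions are illustrative assumptions, not something this guide prescribes.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    rows: Sequence[T], val_frac: float = 0.15, test_frac: float = 0.15
) -> Tuple[Sequence[T], Sequence[T], Sequence[T]]:
    """Split time-ordered rows into train/validation/test without shuffling.

    Assumes `rows` is sorted oldest-first, so train holds the earliest rows,
    validation the next block, and test the most recent rows.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    n_train = n - n_val - n_test

    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test
```

Because nothing is shuffled, every validation row is later than every training row, and every test row is later than every validation row.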

Always save the model to a file
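One way to do this with only the standard library is `pickle`, which scikit-learn's documentation lists as one option for persisting estimators (it also describes `joblib`). The helper names `save_model` and `load_model` and the example path `models/model.pkl` are hypothetical, chosen for illustration.

```python
import pickle
from pathlib import Path
from typing import Any


def save_model(model: Any, path: Path) -> None:
    """Pickle `model` to `path`, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)


def load_model(path: Path) -> Any:
    """Load a model written by save_model.

    Unpickling can run arbitrary code, so only load files you trust.
    """
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

For example, after fitting an estimator `clf`, `save_model(clf, Path("models/model.pkl"))` writes it to disk and `load_model(Path("models/model.pkl"))` restores it. A pickled model should generally be loaded with the same library versions it was saved with.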

Always save the model's accuracy to a JSON file

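from pathlib import Path
import json
from sklearn.metrics import accuracy_score

# labels: true test-set labels; predictions: the model's output for the same rows
accuracy = accuracy_score(labels, predictions)
metrics = {"accuracy": accuracy}

repo_path = Path(".")  # assumption: the repository root is the working directory
accuracy_path = repo_path / "metrics" / "accuracy.json"
accuracy_path.parent.mkdir(parents=True, exist_ok=True)  # create metrics/ if missing
accuracy_path.write_text(json.dumps(metrics))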

Use git tags to mark the commits whose models are ready to use

Create a new git branch for each new feature
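The two git practices above can be scripted from Python with `subprocess`. This is a minimal sketch: the `feature/` branch prefix, the `model-v` tag prefix, the remote name `origin`, and the helper names are all hypothetical conventions, and the commands are built as lists so they can be checked before anything is run.

```python
import subprocess
from pathlib import Path
from typing import List


def feature_branch_cmd(feature: str) -> List[str]:
    # One new branch per feature; the "feature/" prefix is a naming assumption
    return ["git", "checkout", "-b", f"feature/{feature}"]


def model_tag_cmd(version: str, accuracy: float) -> List[str]:
    # Annotated tag on the current commit, recording the model's accuracy
    # in the tag message; the "model-v" prefix is a naming assumption
    return ["git", "tag", "-a", f"model-v{version}",
            "-m", f"Model {version}, accuracy {accuracy:.4f}"]


def push_tag_cmd(version: str, remote: str = "origin") -> List[str]:
    # A plain `git push` does not send tags, so push the tag explicitly
    return ["git", "push", remote, f"model-v{version}"]


def run(cmd: List[str], repo: Path = Path(".")) -> None:
    # Raises subprocess.CalledProcessError if git fails (e.g. the tag exists)
    subprocess.run(cmd, cwd=repo, check=True)
```

A typical sequence would be `run(feature_branch_cmd("lag-features"))` before starting work, then, once the model file and `metrics/accuracy.json` are committed, `run(model_tag_cmd("1.0", accuracy))` followed by `run(push_tag_cmd("1.0"))`.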
