Final changes for v0.1.0 (automl#341)
* [enhance] Increase the coverage (automl#336)

* [feat] Support statistics print by adding results manager object (automl#334)

* [feat] Support statistics print by adding results manager object

* [refactor] Make SearchResults extract run_history at __init__

Since the search results should not be kept in eternally,
I made this class to take run_history in __init__ so that
we can implicitly call extraction inside.
From this change, the call of extraction from outside is not recommended.
However, you can still call it from outside and to prevent mixup of
the environment, self.clear() will be called.

* [fix] Separate those changes into PR#336

* [fix] Fix so that test_loss includes all the metrics

* [enhance] Strengthen the test for sprint and SearchResults

* [fix] Fix an issue in documentation

* [enhance] Increase the coverage

* [refactor] Separate the test for results_manager to organize the structure

* [test] Add the test for get_incumbent_Result

* [test] Remove the previous test_get_incumbent and see the coverage

* [fix] [test] Fix reversion of metric and strengthen the test cases

* [fix] Fix flake8 issues and increase coverage

* [fix] Address Ravin's comments

* [enhance] Increase the coverage

* [fix] Fix a flake8 issue

* Update for release (automl#335)

* Create release workflow and CITATION.cff  and update README, setup.py

* fix bug in pypi token

* fix documentation formatting

* TODO for docker image

* accept suggestions from shuhei

* add further options for disable_file_output documentation

* remove  from release.yml

* [feat] Add templates for issue and PR with the Ravin's suggestions (automl#136)

* [doc] Add the workflow of the Auto-Pytorch (automl#285)

* [doc] Add workflow of the AutoPytorch

* [doc] Address Ravin's comment

* [FIX] Silence catboost (automl#338)

* set verbose=False in catboost

* fix flake

* change worst possible result of r2 (automl#340)

* Update README.md with link for master branch

* [FIX] formatting in docs (automl#342)

* fix formatting in docs

* Update examples/40_advanced/example_resampling_strategy.py

* Update README.md, remove cat requirements.txt

Co-authored-by: nabenabe0928 <[email protected]>
ravinkohli and nabenabe0928 authored Nov 23, 2021
1 parent a1512d5 commit e4863fe
Showing 28 changed files with 3,018 additions and 259 deletions.
48 changes: 48 additions & 0 deletions .github/ISSUE_TEMPLATE.md
@@ -0,0 +1,48 @@
NOTE: ISSUES ARE NOT FOR CODE HELP - Ask for Help at https://stackoverflow.com

Your issue may already be reported!
Also, please search on the [issue tracker](../) before creating one.

* **I'm submitting a ...**
- [ ] bug report
- [ ] feature request
- [ ] support request => Please do not submit support request here, see note at the top of this template.

# Issue Description
* When Issue Happens
* Steps To Reproduce
1.
1.
1.

## Expected Behavior
<!--- If you're describing a bug, tell us what should happen -->
<!--- If you're suggesting a change/improvement, tell us how it should work -->

## Current Behavior
<!--- If describing a bug, tell us what happens instead of the expected behavior -->
<!--- If suggesting a change/improvement, explain the difference from current behavior -->

## Possible Solution
<!--- Not obligatory, but suggest a fix/reason for the bug, -->
<!--- or ideas how to implement the addition or change -->

## Your Code

```
If relevant, paste all of your challenge code here
```

## Error message

```
If relevant, paste all of your error messages here
```

## Your Local environment
* Operating System, version
* Python, version
* Outputs of `pip freeze` or `conda list`

Make sure to add **all the information needed to understand the bug** so that someone can help.
If the info is missing, we'll add the 'Needs more information' label and close the issue until there is enough information.
38 changes: 38 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,38 @@
<!--- Provide a general summary of your changes in the Title above -->

## Types of changes
<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)

Note that a Pull Request should only contain one of refactoring, new features or documentation changes.
Please separate these changes and send us individual PRs for each.
For more information on how to create a good pull request, please refer to [The anatomy of a perfect pull request](https://medium.com/@hugooodias/the-anatomy-of-a-perfect-pull-request-567382bb6067).

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- [ ] My code follows the code style of this project.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
* [ ] Have you checked to ensure there aren't other open [Pull Requests](../../../pulls) for the same update/change?
* [ ] Have you added an explanation of what your changes do and why you'd like us to include them?
* [ ] Have you written new tests for your core changes, as applicable?
* [ ] Have you successfully run tests with your changes locally?
<!--
* [ ] Have you followed the guidelines in our Contributing document?
-->


## Description
<!--- Describe your changes in detail -->

## Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->

## How has this been tested?
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, tests ran to see how -->
<!--- your change affects other areas of the code, etc. -->
33 changes: 33 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,33 @@
name: Push to PyPi

on:
  push:
    branches:
      - master

jobs:
  test:
    runs-on: "ubuntu-latest"

    steps:
      - name: Checkout source
        uses: actions/checkout@v2

      - name: Set up Python 3.8
        uses: actions/setup-python@v1
        with:
          python-version: 3.8

      - name: Install build dependencies
        run: python -m pip install build wheel

      - name: Build distributions
        shell: bash -l {0}
        run: python setup.py sdist bdist_wheel

      - name: Publish package to PyPI
        if: github.repository == 'automl/Auto-PyTorch' && github.event_name == 'push' && startsWith(github.ref, 'refs/tags')
        uses: pypa/gh-action-pypi-publish@master
        with:
          user: __token__
          password: ${{ secrets.pypi_token }}
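
The `if:` guard on the publish step combines three checks. As a reading aid only, the same logic can be sketched in Python. The function name `should_publish` and its parameters are hypothetical and are not part of the workflow or of any GitHub API. The sketch lower-cases its inputs on the assumption that GitHub's expression comparisons ignore case.

```python
def should_publish(repository: str, event_name: str, ref: str) -> bool:
    """Mirror the three conditions of the publish step's `if:` expression."""
    return (
        # Only the upstream repository publishes, never a fork.
        repository.lower() == "automl/auto-pytorch"
        # Only push events qualify.
        and event_name.lower() == "push"
        # Only tag refs qualify, e.g. refs/tags/v0.1.0.
        and ref.lower().startswith("refs/tags")
    )


if __name__ == "__main__":
    print(should_publish("automl/Auto-PyTorch", "push", "refs/tags/v0.1.0"))
    print(should_publish("automl/Auto-PyTorch", "push", "refs/heads/master"))
```

Under this reading, a push whose ref is a branch such as `refs/heads/master` does not satisfy the guard, while a tag ref on the upstream repository does.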
19 changes: 19 additions & 0 deletions CITATION.cff
@@ -0,0 +1,19 @@
preferred-citation:
  type: article
  authors:
  - family-names: "Zimmer"
    given-names: "Lucas"
    affiliation: "University of Freiburg, Germany"
  - family-names: "Lindauer"
    given-names: "Marius"
    affiliation: "University of Freiburg, Germany"
  - family-names: "Hutter"
    given-names: "Frank"
    affiliation: "University of Freiburg, Germany"
  doi: "10.1109/TPAMI.2021.3067763"
  journal-title: "IEEE Transactions on Pattern Analysis and Machine Intelligence"
  title: "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"
  year: 2021
  note: "also available under https://arxiv.org/abs/2006.13799"
  start: 3079
  end: 3090
91 changes: 81 additions & 10 deletions README.md
@@ -1,14 +1,42 @@
# Auto-PyTorch

Copyright (C) 2019 [AutoML Group Freiburg](http://www.automl.org/)
Copyright (C) 2021 [AutoML Groups Freiburg and Hannover](http://www.automl.org/)

This an alpha version of Auto-PyTorch with improved API.
So far, Auto-PyTorch supports tabular data (classification, regression).
We plan to enable image data and time-series data.
While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed **Auto-PyTorch**, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).

Auto-PyTorch is mainly developed to support tabular data (classification, regression).
The newest features in Auto-PyTorch for tabular data are described in the paper ["Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"](https://arxiv.org/abs/2006.13799) (see below for bibtex ref).
Also, find the documentation [here](https://automl.github.io/Auto-PyTorch/master).

Find the documentation [here](https://automl.github.io/Auto-PyTorch/development)
***From v0.1.0, AutoPyTorch has been updated to further improve usability, robustness and efficiency by using SMAC as the underlying optimization package as well as changing the code structure. Therefore, moving from v0.0.2 to v0.1.0 will break compatibility.
In case you would like to use the old API, you can find it at [`master-old`](https://github.com/automl/Auto-PyTorch/tree/master-old).***

## Workflow

The following figure gives a rough overview of the Auto-PyTorch workflow.

<img src="figs/apt_workflow.png" width="500">

In the figure, **Data** is provided by the user and
**Portfolio** is a set of configurations of neural networks that work well on diverse datasets.
The current version only supports the *greedy portfolio* as described in the paper *Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL*.
This portfolio is used to warm-start the optimization of SMAC.
In other words, we evaluate the portfolio on the provided data as initial configurations.
The API then starts the following procedures:
1. **Validate input data**: Process each data type, e.g. encoding categorical data, so that Auto-PyTorch can handle it.
2. **Create dataset**: Create a dataset that can be handled in this API with a choice of cross validation or holdout splits.
3. **Evaluate baselines** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, along with a dummy model from `sklearn.dummy` that represents the worst possible performance.
4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
b. Sample a pipeline hyperparameter configuration *2 by SMAC\
c. Update the observations by obtained results\
d. Repeat a. -- c. until the budget runs out
5. Build the best ensemble for the provided dataset from the observations and [model selection of the ensemble](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf).

*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machine, to solve either a regression or a classification task on the provided dataset.

*2: A pipeline hyperparameter configuration specifies the choice of components, e.g. target algorithm and the shape of neural networks, in each step and their corresponding hyperparameters.
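
The final step of the workflow, building an ensemble from the evaluated models, can be illustrated with a minimal sketch of greedy ensemble selection with replacement, in the spirit of the linked Caruana et al. paper. This is not Auto-PyTorch's implementation. The function names, the binary-classification setup, and the accuracy metric are all assumptions chosen to keep the example small.

```python
from statistics import mean
from typing import Dict, List, Sequence


def accuracy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of binary labels matched when thresholding probabilities at 0.5."""
    return mean(1.0 if (p >= 0.5) == bool(y) else 0.0 for p, y in zip(probs, labels))


def greedy_ensemble_selection(
    predictions: Dict[str, List[float]],  # hypothetical: model name -> validation probabilities
    labels: Sequence[int],
    ensemble_size: int,
) -> List[str]:
    """Add one model per round (repeats allowed), keeping the best averaged accuracy."""
    chosen: List[str] = []
    running_sum = [0.0] * len(labels)
    for _ in range(ensemble_size):
        best_name, best_score = None, -1.0
        for name, preds in predictions.items():
            # Averaged prediction if this model joined the current ensemble.
            candidate = [(s + p) / (len(chosen) + 1) for s, p in zip(running_sum, preds)]
            score = accuracy(candidate, labels)
            if score > best_score:  # ties keep the earlier model
                best_name, best_score = name, score
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, predictions[best_name])]
    return chosen


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    predictions = {
        "a": [0.9, 0.1, 0.4, 0.6],
        "b": [0.6, 0.4, 0.9, 0.2],
        "c": [0.2, 0.8, 0.3, 0.7],
    }
    print(greedy_ensemble_selection(predictions, labels, ensemble_size=2))
```

Because models are selected with replacement, a strong model can appear several times, which acts as a weight in the averaged prediction.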

## Installation

@@ -25,14 +53,57 @@ We recommend using Anaconda for developing as follows:
git submodule update --init --recursive

# Create the environment
conda create -n autopytorch python=3.8
conda activate autopytorch
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
cat requirements.txt | xargs -n 1 -L 1 pip install
python setup.py install

```

## Examples

In a nutshell:

```py
from autoPyTorch.api.tabular_classification import TabularClassificationTask

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

# initialise Auto-PyTorch api
api = TabularClassificationTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)

# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)
```

For more examples, including customising the search space, parallelising the code, etc., check out the `examples` folder:

```sh
$ cd examples/
```


Code for the [paper](https://arxiv.org/abs/2006.13799) is available under `examples/ensemble` in the [TPAMI.2021.3067763](https://github.com/automl/Auto-PyTorch/tree/TPAMI.2021.3067763) branch.

## Contributing

If you want to contribute to Auto-PyTorch, clone the repository and checkout our current development branch
@@ -63,8 +134,8 @@ Please refer to the branch `TPAMI.2021.3067763` to reproduce the paper *Auto-PyT
title = {Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
year = {2021},
note = {IEEE early access; also available under https://arxiv.org/abs/2006.13799},
pages = {1-12}
note = {also available under https://arxiv.org/abs/2006.13799},
pages = {3079 - 3090}
}
```
