Deephyper (#23)
* update

* update scripts

* update

* update

* update

* moved table of datasets metadata

* updated learning curve plotter

* added *.jpg to gitignore

* recording last epoch of training

* updated running scripts for polaris

* adding liblinear results on polaris

* fixed bug brier_score

* added json query to retrieve other key-values from anchors

* added balanced accuracy from confusion matrix
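
A minimal sketch of computing balanced accuracy from a confusion matrix (illustrative only; the function name and exact handling are assumptions, not necessarily the code added in this commit):

    import numpy as np

    def balanced_accuracy_from_confusion_matrix(cm) -> float:
        """Balanced accuracy = mean per-class recall: diagonal over row sums."""
        cm = np.asarray(cm, dtype=float)
        per_class_recall = np.diag(cm) / cm.sum(axis=1)
        return float(np.nanmean(per_class_recall))

    # Example with a 2-class confusion matrix [[TN, FP], [FN, TP]]:
    print(balanced_accuracy_from_confusion_matrix([[50, 10], [5, 35]]))  # ~0.854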

* adapted plot_learning_curves so that all plots have the same number of ranks when lists have the same length

* added function to retrieve hyperparameter values from row

* removed generic from preprocessing

* updated knn workflow

* fix in knn

* removed generic univariate feature selector

* updated knn

* adding xgboost scripts

* import mpi4py is now optional for cli/_run.py
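
The usual pattern for making such an import optional looks roughly like this (the flag name is illustrative, not the exact code in cli/_run.py):

    # Optional MPI support: the CLI should still work when mpi4py is not installed.
    try:
        import mpi4py  # noqa: F401
        MPI4PY_INSTALLED = True
    except ImportError:
        MPI4PY_INSTALLED = False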

* removed harver

* updated run function

* updated notebook

* added logging to run

* capping poly_degree at 3 for preprocessing

* do not generate bias feature for poly features
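
With scikit-learn, the two preprocessing changes above correspond roughly to the following configuration (a sketch, not necessarily the exact call in the workflow):

    from sklearn.preprocessing import PolynomialFeatures

    # Cap the degree at 3 and skip the constant bias column, since downstream
    # estimators typically fit their own intercept.
    poly = PolynomialFeatures(degree=3, include_bias=False)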

* adding comment

* adding scripts for knn

* adding logging and config to lcdb test

* updated notebooks

* default transform in baseworkflow is identity

* adding constant predictor

* updated color map for learning curve plots

* adding *.pyc to .gitignore

* adding code for lcdb 1 curves

* adding scripts for constant predictor

* running constant predictor on all datasets with different seeds

* updated plotting of learning curves

* added RandomClassifier

* added computation of OOB metrics for RFs
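
In scikit-learn, out-of-bag estimates are exposed as shown below; presumably the OOB metrics are derived from these attributes (a sketch under that assumption):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
    rf.fit(X, y)
    print(rf.oob_score_)                    # accuracy on out-of-bag samples
    print(rf.oob_decision_function_.shape)  # per-sample OOB class probabilities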

* some changes on dummies

ConstantWorkflow is now MajorityWorkflow

Added MeanWorkflow and MedianWorkflow and also added the option to
specify a regression task in the CLI via -tt=regression

* organizing and sharing code for neurocom

* updating .gitignore

* plots of HPOBench with constant predictor

* pushed code for multi-fidelity

* added scripts for mf-hpo

* adding notebooks

* updating scripts

* updated notebooks

* added scripts for other problems of hpobench

* updated doc of lcdb run --help

* updated fetch command to show split

* added dhb/lcdb/hpo benchmark

* updating notebooks

* turned random forest scripts into array jobs

* updated create script

* updated scripts

* removing notebooks, updating scripts

* updating scripts for knn

* solving issue #18

* updated densenn scripts

* solving issue on oob scores

* fixing bug in densenn scorer

* task-type was not a defined parameter of main

* lcdb run can now decide the type of schedule for anchor and epoch
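
The exact schedules are not shown here; a generic sketch of a linear vs. geometric ("power") anchor schedule, with assumed names and progression, might look like:

    def get_schedule(kind: str, start: int, stop: int, base: float = 2.0) -> list:
        """Return increasing anchors between start and stop (inclusive)."""
        if kind == "linear":
            return list(range(start, stop + 1, start))
        if kind == "power":  # geometric: start, start*base, start*base^2, ...
            anchors, a = [], float(start)
            while a < stop:
                anchors.append(int(round(a)))
                a *= base
            anchors.append(stop)
            return anchors
        raise ValueError(f"Unknown schedule kind: {kind!r}")

    print(get_schedule("power", 16, 512))  # [16, 32, 64, 128, 256, 512]
    print(get_schedule("linear", 16, 64))  # [16, 32, 48, 64]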

* adding anchor and epoch schedule parameters to lcdb test

* densenn workflow now computes and returns the number of parameters of the network
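
With Keras, the parameter count of a built model is available via count_params(); a small illustrative example (not the workflow's actual model):

    import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(10,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(2, activation="softmax"),
    ])
    print(model.count_params())  # 10*32+32 + 32*2+2 = 418 parameters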

* updated gitignore

* added scripts for 2024-neurocom

* added pfn experimental scripts

* updated experimental scripts for MF-HPO

* updated get_iteration_schedule with generic get_schedule for random forest

* adding scripts

* adding scripts

* surf snellius scripts - setup, liblinear

* adding tensorflow dependency for nn workflows

* cleaning script for polaris installation

* updating .gitignore

* Merge branch 'deephyper' of https://github.com/fmohr/lcdb.git into
deephyper

* updated install script

* added scripts to run experiments in delft

* changed paths in the run

* updated run script

* update slurm config

* updated jobscript

* adding a comment

* update notebooks

* update

* Create Readme.md

* Update Readme.md

* update

* update

* working with keras 3.1.1

* updated dependencies

* Restrict PolynomialFeatures so that the DB does not grow beyond 4 GB

* updated scripts for surf-snellius cluster

* Update Readme.md

* Fix imputation of NaN values
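
A typical scikit-learn approach to NaN imputation (sketch only; the strategy used in the workflow may differ):

    import numpy as np
    from sklearn.impute import SimpleImputer

    X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])
    imputer = SimpleImputer(strategy="median")
    print(imputer.fit_transform(X))  # NaNs replaced by the column medians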

* added lookahead regularization

* Update Readme.md

* Update Readme.md

* printing preprocess details

* updating memory consumption check in preprocessing workflow

* check iteration-wise curve by plotting

* adding jmespath to setup

* removing unnecessary tmp variable

* starting to integrate the memory_limit

* serial evaluator will now use 1 worker by default

* refactor

* added memory limit to run

* changing import for deephyper analysis and removing memory check in preprocessing

* replacing the profile decorator with a partial bound to terminate_on_memory_exceeded
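
The exact signature of terminate_on_memory_exceeded is not visible here; the sketch below only illustrates the general pattern of binding a memory guard to a run function with functools.partial instead of decorating it (names and argument order are assumptions):

    from functools import partial

    def terminate_on_memory_exceeded(memory_limit, run_function, *args, **kwargs):
        # Hypothetical guard: the real helper watches memory while run_function
        # executes and aborts the evaluation once memory_limit (bytes) is exceeded.
        return run_function(*args, **kwargs)

    def run(config):
        return {"objective": sum(config.values())}

    memory_limit = 8 * 1024**3  # 8 GiB, illustrative
    guarded_run = partial(terminate_on_memory_exceeded, memory_limit, run)
    print(guarded_run({"x": 1, "y": 2}))  # {'objective': 3}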

* adding memory limit in test

* using terminate_on_memory_exceeded for lcdb test

* handling BrokenProcessPool exception when memory_limit is exhausted
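
Catching this exception looks roughly as follows (a sketch; how the failure is recorded is an assumption):

    from concurrent.futures import ProcessPoolExecutor
    from concurrent.futures.process import BrokenProcessPool

    def evaluate(config):
        return {"objective": 0.0}

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=1) as executor:
            future = executor.submit(evaluate, {"x": 1})
            try:
                result = future.result()
            except BrokenProcessPool:
                # The worker was killed (e.g., after exhausting the memory limit);
                # record the evaluation as failed instead of crashing the whole run.
                result = {"objective": "F", "error": "worker killed, likely out of memory"}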

* script cleanup and plot additions

* Update Readme.md

* fixed JSON serialization issue with numpy types
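
One common fix is a custom JSONEncoder that converts numpy scalars and arrays to plain Python types (a sketch; the actual change may differ):

    import json
    import numpy as np

    class NumpyEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, np.integer):
                return int(obj)
            if isinstance(obj, np.floating):
                return float(obj)
            if isinstance(obj, np.ndarray):
                return obj.tolist()
            return super().default(obj)

    print(json.dumps({"score": np.float32(0.91), "anchors": np.arange(3)}, cls=NumpyEncoder))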

* updated highlighting of the default hyperparameter in the plotting

* added support for regularizers: (#21)

* added support for regularizers:

- SWA
- Lookahead
- Snapshot Ensembles

* made periodicity of snapshot ensembles a hyperparameter

* Huge changes for proper scoring and pre-processing

* changed import order in utils

* added code for SWA

* added several sklearn models (some without hyperparameters yet)

* updated workflows

* solved problem with passed pp hyperparameters

* fixing issue in lcdb run when initial configs are not passed and inactive hyperparameters exist in the default hp configuration

* cleanup snellius scripts

* Update Readme.md

* updated and added several regularization techniques for DenseNN

* changed project structure, repaired bug in SVM, added campaign logic

* added logic for plotting and moved around some CLI parameters

* update data augmentation workflows in DenseNN

* update debug for extracting the traceback

* adjusted the _run.py w.r.t. the bug reported by Andreas

* WIP submitted fix for RandomForest that may affect ExtraTrees (check max_samples and max_features arguments)

* refactor

* update README for Snellius

* Deephyper update (#22)

* WIP creating db, builder subpackages, started to revert experiments/_experiments.py

* created logic for lcdb add, and created repository logic

* resolved some bugs. Should work properly now.

* added LCDB class to ease things

* fixed some bugs in the fetching of all results.

---------

Co-authored-by: Deathn0t <[email protected]>
Co-authored-by: felix <felix@frank>

* repaired logic to initialize a non-existing LCDB in a system.

* added method to count the number of results before they are fetched

this is to avoid an overflow.

* plot was creating an undesired out.json file, which has been fixed

* Update README.md

* Update README.md

* updating lcdb run command with lcdb.builder subpackage

* adding TreesEnsembleWorkflow to merge RandomForest and ExtraTrees

* setting ConfigSpace to 1.1.1

* updated functionality of repositories and lcdb init

* updated the folder logic of LCDB to always use a .lcdb folder

* Update README.md

* added campaign script, updated snellius scripts

* minor script fix

* addressed relative path issue

* temporary commit

* added logic to save output file

* refactor of script

* adds option for filetype

* developed numpy interface for results and include example notebook

* moving files to snellius/analysis

* adding progress bar when query results from LCDB

* passing json query to get results, adding dummy queries

* updated analysis notebook

* added generative result generation with progress bar and some utilities

* removed duplicate function

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* added processors, enabled all sorts of combinations of queries

also removed config parameters from LCDB object

* Update README.md

* updates results

* incorporated config space object

* adds hyperparameter importance

* update runtime plot and regression

* added LearningCurveExtractor

* added unit tests and fixed bug in LearningCurveExtractor

* adding "parameterized" as dev extra in setup

* applying formatting to lcdb.analysis.util

* WIP porting lcdb to deephyper 0.8.0

* update to be compatible with deephyper==0.8.0

* fixing deephyper==0.8.0 in setup

* updated Installation in readme

* improved the learning curve class and enabled grouping

* Update README.md

* added tracking of runtime and dataset size after pre-processing steps
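
A minimal sketch of how such tracking can be done around a fit_transform step (assumes dense numpy output; names are illustrative):

    import time
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    def tracked_fit_transform(transform, X):
        """Time a fit_transform call and record the size of the transformed data."""
        t0 = time.perf_counter()
        Xt = transform.fit_transform(X)
        return Xt, {"runtime_s": time.perf_counter() - t0,
                    "size_bytes": np.asarray(Xt).nbytes}

    Xt, info = tracked_fit_transform(StandardScaler(), np.random.rand(100, 5))
    print(info)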

* added folder and notebook for use cases

* added logic for padding of sample-wise curves and merging for
iteration-curves

* added notebook for OOB comparison (use case 6)

* resolved bug in merge function

* fixed bug in padding

* added notebook for analysis of variance

* made adjustments in variance use case notebook

* adjustments in use case notebooks

* resolved bugs in iteration-wise learning curves of trees ensembles

also added support for timer injection.

* fixed typo

* added documentation for schedules and increased forest size to 2048

* added logging to the run

* added an error message that should be thrown if the training fails.

* adding logged warning when an evaluation fails because of the memory limit

* fixed campaign float naming issue, restructured scripts

* update use case curve fitting

* update use case runtime plot

* refactor

* Update README.md

* Update README.md

* added learning curve groups

* Merge branch 'deephyper' of https://github.com/fmohr/lcdb.git into
deephyper

* fixed problems with XGBoost.

* changed standard parameter value of n_estimators in ExtraTrees

* first draft of a CI pipeline with GitHub Actions

* changing working dir of CI

* adding parameterized to ci install dependencies

* moving install of parameterized to tox.ini

* adding pytest mark for db requirement

* update plot for curve fitting use case

* update LCDB.debug to extract all tracebacks, error messages, and configs

* Update hyperparameter_importance.py

updates changes made locally

* Update hyperparameter_importance.py

* Update hyperparameter_importance.py

final changes to experimental setup

* update use case curves fitting

* added PCloud Repository

* using pcloud repository as the standard when initializing LCDB

* enabled uploads to the pCloud repository.

* added option to limit the token lifetime.

* Update runtime estimation use case

* moving standardize_run_function_output to lcdb.builder.utils because removed from deephyper.evaluator

* update curve fitting plot

* adjusted some unit tests.

* disabled coverage and running pytest directly

---------

Co-authored-by: Deathn0t <[email protected]>
Co-authored-by: Jan van Rijn <[email protected]>
Co-authored-by: andreasparaskeva <[email protected]>
Co-authored-by: Cheng Yan <[email protected]>
Co-authored-by: janvanrijn <[email protected]>
Co-authored-by: Tom Viering <[email protected]>
7 people authored Nov 18, 2024
1 parent e105db0 commit 0656f89
Showing 293 changed files with 41,089 additions and 3,333 deletions.
37 changes: 37 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,37 @@
name: Continuous integration

on:
  - pull_request
  - push

jobs:

  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version:
          - "3.12"
    defaults:
      run:
        working-directory: publications/2023-neurips/
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v3
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pip install --upgrade pip
          pip install tox pylint black
      # - name: Run Formatter
      #   run: black --diff --check $(git ls-files '*.py')
      - name: Run Linter
        run: pylint --exit-zero $(git ls-files '*.py')
      - name: Run tests with tox
        run: tox -e py3
      - name: Upload coverage report
        if: ${{ matrix.python-version == 3.12 }} # Only upload coverage once
        uses: codecov/codecov-action@v1
6 changes: 6 additions & 0 deletions .gitignore
@@ -12,8 +12,14 @@ publications/2023-neurips/build/
.DS_Store
*.egg-info/
*.log
*.err
*.db
publications/2023-neurips/lightning_logs/*
publications/2023-neurips/MNIST/*

.codecarbon.config
*.jpg
*.pyc
pf.txt
*.xz
*.gz
@@ -1,13 +1,21 @@
cpu.max = 4
mem.max = 8000
mem.max = 55000

keyfields = openmlid:int(5), learner:varchar(100), outer_seed, inner_seed_index:int(3)
resultfields = result:text

ignore.time = .*
ignore.memory = .*

openmlid = 1485, 1590, 1515, 1457, 1475, 1468, 1486, 1489, 23512, 23517, 4541, 4534, 4538, 4134, 4135, 40978, 40996, 41027, 40981, 40982, 40983, 40984, 40701, 40670, 40685, 40900, 1111, 42732, 42733, 42734, 40498, 41161, 41162, 41163, 41164, 41165, 41166, 41167, 41168, 41169, 41142, 41143, 41144, 41145, 41146, 41147, 41150, 41156, 41157, 41158, 41159, 41138, 54, 181, 188, 1461, 1494, 1464, 12, 23, 3, 1487, 40668, 1067, 1049, 40975, 31
outer_seed = 0
#openmlid = 3, 6, 11, 12, 13, 23, 30, 31, 54, 55, 60, 61, 181, 188, 201, 273, 293, 299, 336, 346, 380, 446, 1042, 1049, 1067, 1083, 1084, 1085, 1086, 1087, 1088, 1128, 1130, 1134, 1138, 1139, 1142, 1146, 1161, 1216, 1233, 1235, 1236, 1441, 1448, 1450, 1457, 1461, 1464, 1465, 1468, 1475, 1477, 1479, 1483, 1485, 1486, 1487, 1488, 1489, 1494, 1499, 1503, 1509, 1515, 1566, 1567, 1575, 1590, 1591, 1592, 1597, 4134, 4135, 4137, 4534, 4538, 4541, 23512, 23517, 40498, 40664, 40668, 40670, 40672, 40677, 40685, 40687, 40701, 40713, 40900, 40910, 40971, 40975, 40978, 40981, 40982, 40983, 40984, 40994, 40996, 41027, 41142, 41143, 41144, 41145, 41146, 41150, 41156, 41157, 41158, 41159, 41161, 41163, 41164, 41165, 41166, 41167, 41168, 41169, 41228, 41540, 41972, 42720, 42732, 42733, 42734, 42742, 42769, 42809, 42810, 42844

# rest of openmlids , , , ,

# too big: 1503, 1509, 1567

#openmlid = 40677, 40685, 40687, 40701, 40713, 40900, 40910, 40971, 40975, 40978, 40981, 40982, 40983, 40984, 40994, 40996, 41027, 41142, 41143, 41144, 41145, 41146, 41150, 41156, 41157, 41158, 41159, 41161, 41163, 41164, 41165, 41166, 41167, 41168, 41169, 41228, 41540, 41972, 42720, 42732, 42733, 42734, 42742, 42769, 42809, 42810, 42844
#,
openmlid = 1509, 1567
outer_seed = 0, 1, 2, 3, 4
inner_seed_index = 0
learner = SVC_linear, SVC_poly, SVC_rbf, SVC_sigmoid, sklearn.tree.DecisionTreeClassifier, sklearn.tree.ExtraTreeClassifier, sklearn.linear_model.LogisticRegression, sklearn.linear_model.PassiveAggressiveClassifier, sklearn.linear_model.Perceptron, sklearn.linear_model.RidgeClassifier, sklearn.linear_model.SGDClassifier, sklearn.neural_network.MLPClassifier, sklearn.discriminant_analysis.LinearDiscriminantAnalysis, sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis, sklearn.naive_bayes.BernoulliNB, sklearn.naive_bayes.MultinomialNB, sklearn.neighbors.KNeighborsClassifier, sklearn.ensemble.ExtraTreesClassifier, sklearn.ensemble.RandomForestClassifier, sklearn.ensemble.GradientBoostingClassifier
13 changes: 12 additions & 1 deletion publications/2023-neurips/.gitignore
@@ -1,3 +1,14 @@
.ipynb_checkpoints
__pycache__
.idea
.idea
*.json
*.tar
*.gz
*.csv
*.png
*.yaml
*.zip

# Output I/O files from PBS scheduler
*.sh.e*
*.sh.o*
