Deephyper (#23)
* update

* update scripts

* update

* update

* update

* moved table of datasets metadata

* updated learning curve plotter

* added *.jpg to gitignore

* recording last epoch of training

* updated running scripts for polaris

* adding liblinear results on polaris

* fixed bug brier_score

* added json query to retrieve other key-values from anchors

* added balanced accuracy from confusion matrix
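
A minimal sketch of computing balanced accuracy from a confusion matrix (illustrative only; the function name and exact handling are assumptions, not necessarily the code added in this commit):

    import numpy as np

    def balanced_accuracy_from_confusion_matrix(cm) -> float:
        """Balanced accuracy = mean per-class recall: diagonal over row sums."""
        cm = np.asarray(cm, dtype=float)
        per_class_recall = np.diag(cm) / cm.sum(axis=1)
        return float(np.nanmean(per_class_recall))

    # Example with a 2-class confusion matrix [[TN, FP], [FN, TP]]:
    print(balanced_accuracy_from_confusion_matrix([[50, 10], [5, 35]]))  # ~0.854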

* adapted plot_learning_curves so that all plots have the same number of ranks when lists have the same length

* added function to retrieve hyperparameter values from row

* removed generic from preprocessing

* updated knn workflow

* fix in knn

* removed generic univariate feature selector

* updated knn

* adding xgboost scripts

* import mpi4py is now optional for cli/_run.py
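
The usual pattern for making such an import optional looks roughly like this (the flag name is illustrative, not the exact code in cli/_run.py):

    # Optional MPI support: the CLI should still work when mpi4py is not installed.
    try:
        import mpi4py  # noqa: F401
        MPI4PY_INSTALLED = True
    except ImportError:
        MPI4PY_INSTALLED = False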

* removed harver

* updated run function

* updated notebook

* added logging to run

* capping poly_degree at 3 for preprocessing

* do not generate bias feature for poly features
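
With scikit-learn, the two preprocessing changes above correspond roughly to the following configuration (a sketch, not necessarily the exact call in the workflow):

    from sklearn.preprocessing import PolynomialFeatures

    # Cap the degree at 3 and skip the constant bias column, since downstream
    # estimators typically fit their own intercept.
    poly = PolynomialFeatures(degree=3, include_bias=False)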

* adding comment

* adding scripts for knn

* adding logging and config to lcdb test

* updated notebooks

* default transform in baseworkflow is identity

* adding constant predictor

* updated color map for learning curve plots

* adding *.pyc to .gitignore

* adding code for lcdb 1 curves

* adding scripts for constant predictor

* running constant predictor on all datasets with different seeds

* updated plotting of learning curves

* added RandomClassifier

* added computation of OOB metrics for RFs
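
In scikit-learn, out-of-bag estimates are exposed as shown below; presumably the OOB metrics are derived from these attributes (a sketch under that assumption):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
    rf.fit(X, y)
    print(rf.oob_score_)                    # accuracy on out-of-bag samples
    print(rf.oob_decision_function_.shape)  # per-sample OOB class probabilities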

* some changes on dummies

ConstantWorkflow is now MajorityWorkflow

Added MeanWorkflow and MedianWorkflow and also added the option to
specify a regression task in the CLI via -tt=regression

* organizing and sharing code for neurocom

* updating .gitignore

* plots of HPOBench with constant predictor

* pushed code for multi-fidelity

* added scripts for mf-hpo

* adding notebooks

* updating scripts

* updated notebooks

* added scripts for other problems of hpobench

* updated doc of lcdb run --help

* updated fetch command to show split

* added dhb/lcdb/hpo benchmark

* updating notebooks

* turned random forest scripts into array jobs

* updated create script

* updated scripts

* removing notebooks, updating scripts

* updating scripts for knn

* solving issue #18

* updated densenn scripts

* solving issue on oob scores

* fixing bug in densenn scorer

* task-type was not a defined parameter of main

* lcdb run can now decide the type of schedule for anchor and epoch
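
The exact schedules are not shown here; a generic sketch of a linear vs. geometric ("power") anchor schedule, with assumed names and progression, might look like:

    def get_schedule(kind: str, start: int, stop: int, base: float = 2.0) -> list:
        """Return increasing anchors between start and stop (inclusive)."""
        if kind == "linear":
            return list(range(start, stop + 1, start))
        if kind == "power":  # geometric: start, start*base, start*base^2, ...
            anchors, a = [], float(start)
            while a < stop:
                anchors.append(int(round(a)))
                a *= base
            anchors.append(stop)
            return anchors
        raise ValueError(f"Unknown schedule kind: {kind!r}")

    print(get_schedule("power", 16, 512))  # [16, 32, 64, 128, 256, 512]
    print(get_schedule("linear", 16, 64))  # [16, 32, 48, 64]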

* adding anchor and epoch schedule parameters to lcdb test

* densenn workflow now computes and returns the number of parameters of the network
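
With Keras, the parameter count of a built model is available via count_params(); a small illustrative example (not the workflow's actual model):

    import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(10,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(2, activation="softmax"),
    ])
    print(model.count_params())  # 10*32+32 + 32*2+2 = 418 parameters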

* updated gitignore

* added scripts for 2024-neurocom

* added pfn experimental scripts

* updated experimental scripts for MF-HPO

* updated get_iteration_schedule with generic get_schedule for random forest

* adding scripts

* adding scripts

* surf snellius scripts - setup, liblinear

* adding tensorflow dependency for nn workflows

* cleaning script for polaris installation

* updating .gitignore

* Merge branch 'deephyper' of https://github.com/fmohr/lcdb.git into
deephyper

* updated install script

* added scripts to run experiments in delft

* changed paths in the run

* updated run script

* update slurm config

* updated jobscript

* adding a comment

* update notebooks

* update

* Create Readme.md

* Update Readme.md

* update

* update

* working with keras 3.1.1

* updated dependencies

* Restrict PolynomialFeatures so that the DB does not grow beyond 4 GB

* updated scripts for surf-snellius cluster

* Update Readme.md

* Fix imputation of NaN values
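
A typical scikit-learn approach to NaN imputation (sketch only; the strategy used in the workflow may differ):

    import numpy as np
    from sklearn.impute import SimpleImputer

    X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])
    imputer = SimpleImputer(strategy="median")
    print(imputer.fit_transform(X))  # NaNs replaced by the column medians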

* added lookahead regularization

* Update Readme.md

* Update Readme.md

* printing preprocess details

* updating memory consumption check in preprocessing workflow

* check iteration-wise curve by plotting

* adding jmespath to setup

* removing unnecessary tmp variable

* starting to integrate the memory_limit

* serial evaluator will now use 1 worker by default

* refactor

* added memory limit to run

* changing import for deephyper analysis and removing memory check in preprocessing

* replacing the profile decorator with a partial bound to terminate_on_memory_exceeded
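
The exact signature of terminate_on_memory_exceeded is not visible here; the sketch below only illustrates the general pattern of binding a memory guard to a run function with functools.partial instead of decorating it (names and argument order are assumptions):

    from functools import partial

    def terminate_on_memory_exceeded(memory_limit, run_function, *args, **kwargs):
        # Hypothetical guard: the real helper watches memory while run_function
        # executes and aborts the evaluation once memory_limit (bytes) is exceeded.
        return run_function(*args, **kwargs)

    def run(config):
        return {"objective": sum(config.values())}

    memory_limit = 8 * 1024**3  # 8 GiB, illustrative
    guarded_run = partial(terminate_on_memory_exceeded, memory_limit, run)
    print(guarded_run({"x": 1, "y": 2}))  # {'objective': 3}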

* adding memory limit in test

* using terminate_on_memory_exceeded for lcdb test

* handling BrokenProcessPool exception when memory_limit is exhausted
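
Catching this exception looks roughly as follows (a sketch; how the failure is recorded is an assumption):

    from concurrent.futures import ProcessPoolExecutor
    from concurrent.futures.process import BrokenProcessPool

    def evaluate(config):
        return {"objective": 0.0}

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=1) as executor:
            future = executor.submit(evaluate, {"x": 1})
            try:
                result = future.result()
            except BrokenProcessPool:
                # The worker was killed (e.g., after exhausting the memory limit);
                # record the evaluation as failed instead of crashing the whole run.
                result = {"objective": "F", "error": "worker killed, likely out of memory"}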

* script cleanup and plot additions

* Update Readme.md

* fixed JSON serialization issue with numpy types
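
One common fix is a custom JSONEncoder that converts numpy scalars and arrays to plain Python types (a sketch; the actual change may differ):

    import json
    import numpy as np

    class NumpyEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, np.integer):
                return int(obj)
            if isinstance(obj, np.floating):
                return float(obj)
            if isinstance(obj, np.ndarray):
                return obj.tolist()
            return super().default(obj)

    print(json.dumps({"score": np.float32(0.91), "anchors": np.arange(3)}, cls=NumpyEncoder))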

* updated highlighting of the default hyperparameter in the plotting

* added support for regularizers: (#21)

* added support for regularizers:

- SWA
- Lookahead
- Snapshot Ensembles

* made periodicity of snapshot ensembles a hyperparameter

* Huge changes for proper scoring and pre-processing

* changed import order in utils

* added code for SWA

* added several sklearn models (some without hyperparameters yet)

* updated workflows

* solved problem with passed pp hyperparameters

* fixing issue in lcdb run when initial configs are not passed and inactive hyperparameters exist in the default hp configuration

* cleanup snellius scripts

* Update Readme.md

* updated and added several regularization techniques for DenseNN

* changed project structure, repaired bug in SVM, added campaign logic

* added logic for plotting and moved around some CLI parameters

* update data augmentation workflows in DenseNN

* update debug for extracting the traceback

* adjusted the _run.py w.r.t. the bug reported by Andreas

* WIP submitted fix for RandomForest that may affect ExtraTrees (check max_samples and max_features arguments)

* refactor

* update README for Snellius

* Deephyper update (#22)

* WIP creating db, builder subpackages, started to revert experiments/_experiments.py

* created logic for lcdb add, and created repository logic

* resolved some bugs. Should work properly now.

* added LCDB class to ease things

* fixed some bugs in the fetching of all results.

---------

Co-authored-by: Deathn0t <[email protected]>
Co-authored-by: felix <felix@frank>

* repaired logic to initialize a non-existing LCDB in a system.

* added method to count the number of results before they are fetched

this is to avoid an overflow.

* plot was creating an undesired out.json file, which has been fixed

* Update README.md

* Update README.md

* updating lcdb run command with lcdb.builder subpackage

* adding TreesEnsembleWorkflow to merge RandomForest and ExtraTrees

* setting ConfigSpace to 1.1.1

* updated functionality of repositories and lcdb init

* updated the folder logic of LCDB to always use a .lcdb folder

* Update README.md

* added campaign script, updated snellius scripts

* minor script fix

* addressed relative path issue

* temporary commit

* added logic to save output file

* refactor of script

* adds option for filetype

* developed numpy interface for results and include example notebook

* moving files to snellius/analysis

* adding progress bar when query results from LCDB

* passing json query to get results, adding dummy queries

* updated analysis notebook

* added generative result generation with progress bar and some utilities

* removed duplicate function

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* added processors, enabled all sorts of combinations of queries

also removed config parameters from LCDB object

* Update README.md

* updates results

* incorporated config space object

* adds hyperparameter importance

* update runtime plot and regression

* added LearningCurveExtractor

* added unit tests and fixed bug in LearningCurveExtractor

* adding "parameterized" as dev extra in setup

* applying formatting to lcdb.analysis.util

* WIP porting lcdb to deephyper 0.8.0

* update to be compatible with deephyper==0.8.0

* fixing deephyper==0.8.0 in setup

* updated Installation in readme

* improved the learning curve class and enabled grouping

* Update README.md

* added tracking of runtime and dataset size after pre-processing steps
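
A minimal sketch of how such tracking can be done around a fit_transform step (assumes dense numpy output; names are illustrative):

    import time
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    def tracked_fit_transform(transform, X):
        """Time a fit_transform call and record the size of the transformed data."""
        t0 = time.perf_counter()
        Xt = transform.fit_transform(X)
        return Xt, {"runtime_s": time.perf_counter() - t0,
                    "size_bytes": np.asarray(Xt).nbytes}

    Xt, info = tracked_fit_transform(StandardScaler(), np.random.rand(100, 5))
    print(info)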

* added folder and notebook for use cases

* added logic for padding of sample-wise curves and merging for
iteration-curves

* added notebook for OOB comparison (use case 6)

* resolved bug in merge function

* fixed bug in padding

* added notebook for analysis of variance

* made adjustments in variance use case notebook

* adjustments in use case notebooks

* resolved bugs in iteration-wise learning curves of trees ensembles

also added support for timer injection.

* fixed typo

* added documentation for schedules and increased forest size to 2048

* added logging to the run

* added an error message that should be thrown if the training fails.

* adding logged warning when an evaluation fails because of the memory limit

* fixed campaign float naming issue, restructured scripts

* update use case curve fitting

* update use case runtime plot

* refactor

* Update README.md

* Update README.md

* added learning curve groups

* Merge branch 'deephyper' of https://github.com/fmohr/lcdb.git into
deephyper

* fixed problems with XGBoost.

* changed standard parameter value of n_estimators in ExtraTrees

* first draft of a CI pipeline with GitHub Actions

* changing working dir of CI

* adding parameterized to ci install dependencies

* moving install of parameterized to tox.ini

* adding pytest mark for db requirement

* update plot for curve fitting use case

* update LCDB.debug to extract all tracebacks, error messages, and configs

* Update hyperparameter_importance.py

updates changes made locally

* Update hyperparameter_importance.py

* Update hyperparameter_importance.py

final changes to experimental setup

* update use case curves fitting

* added PCloud Repository

* using pcloud repository as the standard when initializing LCDB

* enabled uploads to the pCloud repository.

* added option to limit the token lifetime.

* Update runtime estimation use case

* moving standardize_run_function_output to lcdb.builder.utils because removed from deephyper.evaluator

* update curve fitting plot

* adjusted some unit tests.

* disabled coverage and running pytest directly

---------

Co-authored-by: Deathn0t <[email protected]>
Co-authored-by: Jan van Rijn <[email protected]>
Co-authored-by: andreasparaskeva <[email protected]>
Co-authored-by: Cheng Yan <[email protected]>
Co-authored-by: janvanrijn <[email protected]>
Co-authored-by: Tom Viering <[email protected]>
7 people authored Nov 18, 2024
1 parent e105db0 commit 0656f89
Showing 293 changed files with 41,089 additions and 3,333 deletions.
37 changes: 37 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,37 @@
name: Continuous integration

on:
  - pull_request
  - push

jobs:

  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version:
          - "3.12"
    defaults:
      run:
        working-directory: publications/2023-neurips/
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v3
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pip install --upgrade pip
          pip install tox pylint black
      # - name: Run Formatter
      #   run: black --diff --check $(git ls-files '*.py')
      - name: Run Linter
        run: pylint --exit-zero $(git ls-files '*.py')
      - name: Run tests with tox
        run: tox -e py3
      - name: Upload coverage report
        if: ${{ matrix.python-version == 3.12 }} # Only upload coverage once
        uses: codecov/codecov-action@v1
6 changes: 6 additions & 0 deletions .gitignore
@@ -12,8 +12,14 @@ publications/2023-neurips/build/
.DS_Store
*.egg-info/
*.log
*.err
*.db
publications/2023-neurips/lightning_logs/*
publications/2023-neurips/MNIST/*

.codecarbon.config
*.jpg
*.pyc
pf.txt
*.xz
*.gz
@@ -1,13 +1,21 @@
cpu.max = 4
mem.max = 8000
mem.max = 55000

keyfields = openmlid:int(5), learner:varchar(100), outer_seed, inner_seed_index:int(3)
resultfields = result:text

ignore.time = .*
ignore.memory = .*

openmlid = 1485, 1590, 1515, 1457, 1475, 1468, 1486, 1489, 23512, 23517, 4541, 4534, 4538, 4134, 4135, 40978, 40996, 41027, 40981, 40982, 40983, 40984, 40701, 40670, 40685, 40900, 1111, 42732, 42733, 42734, 40498, 41161, 41162, 41163, 41164, 41165, 41166, 41167, 41168, 41169, 41142, 41143, 41144, 41145, 41146, 41147, 41150, 41156, 41157, 41158, 41159, 41138, 54, 181, 188, 1461, 1494, 1464, 12, 23, 3, 1487, 40668, 1067, 1049, 40975, 31
outer_seed = 0
#openmlid = 3, 6, 11, 12, 13, 23, 30, 31, 54, 55, 60, 61, 181, 188, 201, 273, 293, 299, 336, 346, 380, 446, 1042, 1049, 1067, 1083, 1084, 1085, 1086, 1087, 1088, 1128, 1130, 1134, 1138, 1139, 1142, 1146, 1161, 1216, 1233, 1235, 1236, 1441, 1448, 1450, 1457, 1461, 1464, 1465, 1468, 1475, 1477, 1479, 1483, 1485, 1486, 1487, 1488, 1489, 1494, 1499, 1503, 1509, 1515, 1566, 1567, 1575, 1590, 1591, 1592, 1597, 4134, 4135, 4137, 4534, 4538, 4541, 23512, 23517, 40498, 40664, 40668, 40670, 40672, 40677, 40685, 40687, 40701, 40713, 40900, 40910, 40971, 40975, 40978, 40981, 40982, 40983, 40984, 40994, 40996, 41027, 41142, 41143, 41144, 41145, 41146, 41150, 41156, 41157, 41158, 41159, 41161, 41163, 41164, 41165, 41166, 41167, 41168, 41169, 41228, 41540, 41972, 42720, 42732, 42733, 42734, 42742, 42769, 42809, 42810, 42844

# rest of openmlids , , , ,

# too big: 1503, 1509, 1567

#openmlid = 40677, 40685, 40687, 40701, 40713, 40900, 40910, 40971, 40975, 40978, 40981, 40982, 40983, 40984, 40994, 40996, 41027, 41142, 41143, 41144, 41145, 41146, 41150, 41156, 41157, 41158, 41159, 41161, 41163, 41164, 41165, 41166, 41167, 41168, 41169, 41228, 41540, 41972, 42720, 42732, 42733, 42734, 42742, 42769, 42809, 42810, 42844
#,
openmlid = 1509, 1567
outer_seed = 0, 1, 2, 3, 4
inner_seed_index = 0
learner = SVC_linear, SVC_poly, SVC_rbf, SVC_sigmoid, sklearn.tree.DecisionTreeClassifier, sklearn.tree.ExtraTreeClassifier, sklearn.linear_model.LogisticRegression, sklearn.linear_model.PassiveAggressiveClassifier, sklearn.linear_model.Perceptron, sklearn.linear_model.RidgeClassifier, sklearn.linear_model.SGDClassifier, sklearn.neural_network.MLPClassifier, sklearn.discriminant_analysis.LinearDiscriminantAnalysis, sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis, sklearn.naive_bayes.BernoulliNB, sklearn.naive_bayes.MultinomialNB, sklearn.neighbors.KNeighborsClassifier, sklearn.ensemble.ExtraTreesClassifier, sklearn.ensemble.RandomForestClassifier, sklearn.ensemble.GradientBoostingClassifier
13 changes: 12 additions & 1 deletion publications/2023-neurips/.gitignore
@@ -1,3 +1,14 @@
.ipynb_checkpoints
__pycache__
.idea
.idea
*.json
*.tar
*.gz
*.csv
*.png
*.yaml
*.zip

# Output I/O files from PBS scheduler
*.sh.e*
*.sh.o*
