chore: debug decision tree classifier notebook #648
Conversation
# Create a list of notebooks to skip, usually because of their long execution time.
# Deployment notebook is currently failing
# FIXME: https://github.com/zama-ai/concrete-ml-internal/issues/4064
NOTEBOOKS_TO_SKIP=("docs/advanced_examples/Deployment.ipynb")
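A minimal sketch of how such a skip list might be consumed by the refresh script; the loop and function name below are hypothetical illustrations, not code from this PR:

```shell
# Hypothetical sketch: skip any notebook listed in NOTEBOOKS_TO_SKIP
# before refreshing it.
NOTEBOOKS_TO_SKIP=("docs/advanced_examples/Deployment.ipynb")

should_skip() {
    local notebook="$1"
    for skipped in "${NOTEBOOKS_TO_SKIP[@]}"; do
        if [ "$notebook" = "$skipped" ]; then
            return 0
        fi
    done
    return 1
}

for notebook in docs/advanced_examples/*.ipynb; do
    if should_skip "$notebook"; then
        echo "Skipping $notebook"
        continue
    fi
    echo "Refreshing $notebook"
    # jupyter nbconvert --to notebook --execute "$notebook"  # actual refresh step
done
```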
the notebook does not work (it's been quite some time now) so we're skipping it for now
it will be removed in Jordan's PR
Great, so it was CPU over-subscription issues again
This PR reduces the execution time of several notebooks that were taking too much time in the CI (QuantizationAwareTraining, ClassifierComparison, LogisticRegression).

Initially, the issue was with the decision tree classifier/regressor notebooks, which took up to an hour in the refresh-notebooks workflow while only taking a few minutes locally or when refreshed alone, which is very mysterious (hence: https://github.com/zama-ai/concrete-ml-internal/issues/4423). The issue always happened at the grid search step, so I've set `n_jobs=1` (in all notebooks, actually) and reduced the number of configurations considered in both.

The other main issue was that some notebooks were creating a meshgrid (used for contour plots) with far too many data points (I've seen ~500,000 inferences in the logistic regression notebook). I've therefore reduced the interval chosen for generating some of these data points.
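Since Concrete ML's built-in models follow the scikit-learn API, the grid search change can be sketched with plain scikit-learn; the parameter grid below is illustrative, not the notebooks' actual grid:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A smaller grid than before (fewer configurations to evaluate), and
# n_jobs=1 so the search does not over-subscribe the CI runners' CPUs.
param_grid = {"max_depth": [3, 5], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=3, n_jobs=1
)
search.fit(X, y)
print(search.best_params_)
```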
The workflow now takes under 2 hours, but it should normally take around 45 minutes to 1 hour at most, so we'll see in the future.
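The meshgrid reduction mentioned above can be illustrated with NumPy; the grid bounds and sizes here are illustrative, not the logistic regression notebook's actual values:

```python
import numpy as np

x_min, x_max, y_min, y_max = -3.0, 3.0, -3.0, 3.0

# Before: a dense grid easily reaches hundreds of thousands of points,
# and every point costs one model inference for the contour plot.
xx_dense, yy_dense = np.meshgrid(
    np.linspace(x_min, x_max, 700), np.linspace(y_min, y_max, 700)
)

# After: a much coarser grid keeps the contour plot readable at a
# small fraction of the inference cost.
xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 50), np.linspace(y_min, y_max, 50)
)

print(xx_dense.size, xx.size)  # 490000 vs 2500 points
```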
Note: I've also fixed a small error in the DecisionTreeRegressor notebook (it was computing a prediction error using Concrete ML vs Concrete ML, instead of scikit-learn vs Concrete ML).

Closes https://github.com/zama-ai/concrete-ml-internal/issues/4417
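The DecisionTreeRegressor fix can be sketched as follows, with illustrative variable names (in the notebook the two arrays come from the scikit-learn model and the Concrete ML model respectively): the error must compare the two models' predictions, not the Concrete ML predictions against themselves.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative predictions standing in for the two models' outputs.
y_pred_sklearn = np.array([1.0, 2.0, 3.0])
y_pred_cml = np.array([1.1, 1.9, 3.2])

# Bug: comparing the Concrete ML predictions with themselves
# always yields zero error, so nothing is actually measured.
buggy_error = mean_squared_error(y_pred_cml, y_pred_cml)

# Fix: compare the scikit-learn predictions against Concrete ML's.
fixed_error = mean_squared_error(y_pred_sklearn, y_pred_cml)

print(buggy_error, fixed_error)
```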