chore: debug decision tree classifier notebook #648
Conversation
# Create a list of notebooks to skip, usually because of their long execution time.
# Deployment notebook is currently failing
# FIXME: https://github.com/zama-ai/concrete-ml-internal/issues/4064
NOTEBOOKS_TO_SKIP=("docs/advanced_examples/Deployment.ipynb")
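A minimal sketch of how such a skip list might be consumed by the refresh script; the loop and function name below are hypothetical illustrations, not code from this PR:

```shell
# Hypothetical sketch: skip any notebook listed in NOTEBOOKS_TO_SKIP
# before refreshing it.
NOTEBOOKS_TO_SKIP=("docs/advanced_examples/Deployment.ipynb")

should_skip() {
    local notebook="$1"
    for skipped in "${NOTEBOOKS_TO_SKIP[@]}"; do
        if [ "$notebook" = "$skipped" ]; then
            return 0
        fi
    done
    return 1
}

for notebook in docs/advanced_examples/*.ipynb; do
    if should_skip "$notebook"; then
        echo "Skipping $notebook"
        continue
    fi
    echo "Refreshing $notebook"
    # jupyter nbconvert --to notebook --execute "$notebook"  # actual refresh step
done
```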
the notebook does not work (it's been quite some time now) so we're skipping it for now
it will be removed in Jordan's PR
Great, so it was CPU over-subscription issues again
This PR reduces the execution time of several notebooks that were taking too much time in the CI (QuantizationAwareTraining, ClassifierComparison, LogisticRegression).

Initially, the issue was with the decision tree classifier/regressor notebooks, which took up to an hour in the refresh-notebooks workflow while only taking a few minutes locally or when refreshed alone, which is very mysterious (hence: https://github.com/zama-ai/concrete-ml-internal/issues/4423). The issue always happened at the grid search step, so I've set `n_jobs=1` (in all notebooks, actually) and reduced the number of configurations considered in both.

The other main issue was that some notebooks were creating a meshgrid (used for contour plots) with far too many data points (I've seen ~500,000 inferences in the logistic regression notebook). I've therefore reduced the interval chosen for generating some of these data points.
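Since Concrete ML's built-in models follow the scikit-learn API, the grid search change can be sketched with plain scikit-learn; the parameter grid below is illustrative, not the notebooks' actual grid:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A smaller grid than before (fewer configurations to evaluate), and
# n_jobs=1 so the search does not over-subscribe the CI runners' CPUs.
param_grid = {"max_depth": [3, 5], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=3, n_jobs=1
)
search.fit(X, y)
print(search.best_params_)
```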
The workflow now takes under 2 hours, but it should normally take around 45 minutes to 1 hour at most, so we'll see in the future.
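The meshgrid reduction mentioned above can be illustrated with NumPy; the grid bounds and sizes here are illustrative, not the logistic regression notebook's actual values:

```python
import numpy as np

x_min, x_max, y_min, y_max = -3.0, 3.0, -3.0, 3.0

# Before: a dense grid easily reaches hundreds of thousands of points,
# and every point costs one model inference for the contour plot.
xx_dense, yy_dense = np.meshgrid(
    np.linspace(x_min, x_max, 700), np.linspace(y_min, y_max, 700)
)

# After: a much coarser grid keeps the contour plot readable at a
# small fraction of the inference cost.
xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 50), np.linspace(y_min, y_max, 50)
)

print(xx_dense.size, xx.size)  # 490000 vs 2500 points
```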
Note: I've also fixed a small error in the DecisionTreeRegressor notebook (it was computing a prediction error using Concrete ML vs Concrete ML, instead of scikit-learn vs Concrete ML).

Closes https://github.com/zama-ai/concrete-ml-internal/issues/4417
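The DecisionTreeRegressor fix can be sketched as follows, with illustrative variable names (in the notebook the two arrays come from the scikit-learn model and the Concrete ML model respectively): the error must compare the two models' predictions, not the Concrete ML predictions against themselves.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative predictions standing in for the two models' outputs.
y_pred_sklearn = np.array([1.0, 2.0, 3.0])
y_pred_cml = np.array([1.1, 1.9, 3.2])

# Bug: comparing the Concrete ML predictions with themselves
# always yields zero error, so nothing is actually measured.
buggy_error = mean_squared_error(y_pred_cml, y_pred_cml)

# Fix: compare the scikit-learn predictions against Concrete ML's.
fixed_error = mean_squared_error(y_pred_sklearn, y_pred_cml)

print(buggy_error, fixed_error)
```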