Improve memory usage of `MetaLearnerGridSearch` #62

FrancescMartiEscofetQC · 2024-07-17T12:42:05Z

The idea of this PR is to improve the memory usage of MetaLearnerGridSearch. The main issue was that all metalearner instances were kept in memory. Each of them was referenced in two places: the jobs list and the raw_results_ list.

To remove the reference from the jobs list, _FitAndScoreJob now stores only the factory and the parameters; the MetaLearner object is then created inside the joblib job which runs _fit_and_score.
For the reference in raw_results_, one can now choose to return a generator from joblib.Parallel instead of materializing all the results in a list.
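
A minimal, stdlib-only sketch of the deferred-construction idea described above. `ToyLearner`, `FitAndScoreJob`, `fit_and_score` and `run_jobs` are hypothetical names for illustration, not the library's actual classes or signatures, and a plain Python generator stands in for the generator that joblib can return:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Iterator


class ToyLearner:
    """Hypothetical stand-in for a MetaLearner; not the library's class."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.fitted = False

    def fit(self, X: list[float], y: list[float]) -> ToyLearner:
        self.fitted = True
        return self

    def score(self, X: list[float], y: list[float]) -> float:
        # Arbitrary toy score that depends only on the hyperparameter.
        return 1.0 / (1 + abs(self.depth - 3))


@dataclass
class FitAndScoreJob:
    # Only a factory and its parameters are stored, never a model instance.
    factory: Callable[..., ToyLearner]
    params: dict[str, Any]


def fit_and_score(job: FitAndScoreJob, X, y) -> tuple[ToyLearner, float]:
    # The model comes into existence only here, inside the job.
    model = job.factory(**job.params).fit(X, y)
    return model, model.score(X, y)


def run_jobs(jobs, X, y) -> Iterator[tuple[ToyLearner, float]]:
    # A plain generator stands in for a lazily evaluated parallel backend.
    for job in jobs:
        yield fit_and_score(job, X, y)


X, y = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
jobs = [FitAndScoreJob(ToyLearner, {"depth": d}) for d in (1, 3, 5)]
results = run_jobs(jobs, X, y)  # nothing has been fitted yet
first_model, first_score = next(results)  # fits exactly one model
```

In this sketch the jobs list holds only small parameter dictionaries, so memory grows with the number of fitted models the caller chooses to keep rather than with the size of the grid.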

Note that if users want to iterate over the results themselves without materializing them (for example, to compute policy values and store only the metalearner with the highest policy value), then store_results and store_raw_results must both be set to False. If store_results is True, the generator is consumed when calling _format_results.
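
The constraint above follows from how Python generators behave in general. The sketch below is illustrative only: `lazy_results` and its (name, policy value) pairs are made up, and `list()` merely plays the role of a formatting step that drains the generator.

```python
from typing import Iterator


def lazy_results() -> Iterator[tuple[str, float]]:
    # Hypothetical (name, policy value) pairs, produced one at a time.
    yield from [("a", 0.2), ("b", 0.9), ("c", 0.5)]


# Keeping only the best result: max() holds one candidate at a time.
best_name, best_value = max(lazy_results(), key=lambda r: r[1])

# A generator can be consumed only once. If something else drains it first
# (here list() plays the role of result formatting), nothing is left over.
results = lazy_results()
formatted = list(results)
leftover = list(results)
```

After the first pass `leftover` is empty, which is why a caller who wants to iterate over the results cannot also ask for them to be formatted and stored.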

It also adds a grid_size_ attribute which can be useful if a generator is returned.
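
As a rough illustration of why a known grid size helps when results arrive lazily, the sketch below reports progress while consuming a generator. The parameter grid, `lazy_fit` and the way the size is computed are assumptions for this example, not the library's implementation of `grid_size_`.

```python
import math
from itertools import product

# Hypothetical hyperparameter grid.
param_grid = {"depth": [1, 3, 5], "learning_rate": [0.1, 0.3]}

# Number of combinations in a full Cartesian grid.
grid_size = math.prod(len(values) for values in param_grid.values())


def lazy_fit(grid):
    # Stand-in for lazily fitted models: yields one combination at a time.
    keys = list(grid)
    for values in product(*grid.values()):
        yield dict(zip(keys, values))


progress = []
for done, params in enumerate(lazy_fit(param_grid), start=1):
    # A generator has no len(), so the total must come from elsewhere.
    progress.append(f"{done}/{grid_size}")
```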

Checklist

Added a CHANGELOG.rst entry

codecov · 2024-07-17T12:48:37Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.88%. Comparing base (9406ef7) to head (1590e94).
Report is 24 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #62      +/-   ##
==========================================
+ Coverage   94.85%   94.88%   +0.03%     
==========================================
  Files          15       15              
  Lines        1534     1544      +10     
==========================================
+ Hits         1455     1465      +10     
  Misses         79       79


kklein

Thanks for working on this! :)

metalearners/grid_search.py

kklein · 2024-07-21T14:35:47Z

+      and you do not want to store all MetaLearners objects rather evaluate them after
+      fitting each one and just store one.
+
+    ``grid_size_`` will contain the number of hyperparameter combinations after fitting.


I haven't quite grasped yet what the motivation for this attribute is. If the user expects a generator, won't they just be used to querying it until it no longer yields?

FrancescMartiEscofetQC · 2024-07-22

Yes! But I think it may be useful in case the user wants to show a progress bar or similar. This way it's easier for them to access the number of metalearners to fit instead of having to calculate it manually.

kklein · 2024-07-22

I think I understand what you mean. At the same time, this use case is not super clear to me. Shall we simplify and remove it until we see that there is a need for it?

FrancescMartiEscofetQC · 2024-07-22

I think I may not have explained it correctly. If we set store_raw_results = False and store_results = False, then the fit method finishes "instantly" (it does not wait for the individual metalearners to be fitted). Then, afaict, these are only fitted when the user requests them by iterating over the generator, where it may be useful to use grid_size_ to display a progress bar or similar. If you still think it's not clear, lmk and we can discuss it further.

FrancescMartiEscofetQC · 2024-07-22

As discussed, I added some explanation about this in the docstring:
1590e94

Co-authored-by: Kevin Klein

kklein

LGTM :)

kklein

LGTM

FrancescMartiEscofetQC added 3 commits July 17, 2024 13:51

Add options for storing

b6c3dd0

Tests

e8e3b39

Finish TODO

44fa6ec

FrancescMartiEscofetQC added 3 commits July 17, 2024 15:40

Reduce memory usage by not creating metalearner object

629999d

Update CHANGELOG

6da4180

Use generator_unordered

1e945ba

FrancescMartiEscofetQC marked this pull request as ready for review July 17, 2024 14:06

FrancescMartiEscofetQC requested a review from kklein as a code owner July 17, 2024 14:06

FrancescMartiEscofetQC added 5 commits July 17, 2024 16:37

Add grid_size_ and move attributes initialization to fit

4682e53

Fix

b1fd5b8

Fix

4ef8e6e

grid_size_ docstring

950dda3

Add new options to tutorial

46a88cc

kklein reviewed Jul 21, 2024

FrancescMartiEscofetQC and others added 5 commits July 22, 2024 09:17

Remove check empty generator

e34b751

Merge branch 'main' into generator_grid_search

62eacd4

Merge branch 'main' into generator_grid_search

7c16b02

Apply suggestions from code review

4c1c12d

Co-authored-by: Kevin Klein

Merge branch 'main' into generator_grid_search

273b864

kklein previously approved these changes Jul 22, 2024

Add explanation grid_size_

1590e94

FrancescMartiEscofetQC dismissed kklein’s stale review via 1590e94 July 22, 2024 15:21

FrancescMartiEscofetQC requested a review from kklein July 22, 2024 15:21

kklein approved these changes Jul 22, 2024

FrancescMartiEscofetQC merged commit 80ce219 into main Jul 22, 2024
16 checks passed

FrancescMartiEscofetQC deleted the generator_grid_search branch July 22, 2024 15:39
