Make loading features from storage robust to order. #9

edpizzi · 2022-12-19T19:41:01Z

The current `load_features` implementation relies on features from each video (same `video_id`) being in a contiguous block. This matches how `store_features` organizes feature files.

Update `load_features` to accept descriptors in any order by sorting by `video_id` (then by start timestamp) before constructing `VideoFeature` structures. Also change `store_features` to sort by `video_id` before storing features.
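A minimal sketch of the reordering step, assuming the stored arrays are NumPy arrays named `video_ids` (shape `(N,)`), `timestamps` (shape `(N,)` or `(N, 2)` with the start time in column 0) and `features` (shape `(N, D)`). The helper name `sort_descriptors` is hypothetical and not taken from this PR. `np.lexsort` treats its last key as the primary key, so rows end up ordered by video ID, with the start timestamp breaking ties within each video.

```python
import numpy as np


def sort_descriptors(video_ids, timestamps, features):
    """Reorder descriptor rows so each video's rows form one contiguous block.

    Hypothetical helper (not from the PR). Assumes NumPy arrays:
      video_ids:  shape (N,)
      timestamps: shape (N,) or (N, 2), start time in column 0 if 2-D
      features:   shape (N, D)
    """
    starts = timestamps if timestamps.ndim == 1 else timestamps[:, 0]
    # np.lexsort sorts by the LAST key first: video_id is the primary key,
    # and the start timestamp orders rows within each video.
    order = np.lexsort((starts, video_ids))
    return video_ids[order], timestamps[order], features[order]


# Example: two videos whose rows are interleaved.
video_ids = np.array(["b", "a", "b", "a"])
timestamps = np.array([[1.0, 2.0], [0.0, 1.0], [0.0, 1.0], [1.0, 2.0]])
features = np.arange(8, dtype=np.float32).reshape(4, 2)
ids, ts, feats = sort_descriptors(video_ids, timestamps, features)
print(ids.tolist())       # ['a', 'a', 'b', 'b']
print(ts[:, 0].tolist())  # [0.0, 1.0, 0.0, 1.0]
```

Under the same assumptions, calling a helper like this in `store_features` before writing would keep files on disk grouped, while calling it in `load_features` protects against files written in any other order.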

chrisjkuch · 2022-12-20T04:49:29Z

tests/test_storage.py

+            restored = load_features(f.name)
+
+        features.sort(key=lambda x: x.video_id)
+        restored.sort(key=lambda x: x.video_id)


Should we instead be testing that `restored` is already properly sorted when it comes back from `load_features`? I'm not sure we should sort it here.
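If the test should verify the ordering rather than impose it, a check along these lines could take the place of the `restored.sort(...)` call. This is a sketch: `FakeVideoFeature` is a hypothetical stand-in for `VideoFeature` carrying only the fields the check reads, and it assumes `load_features` should return exactly one entry per video.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FakeVideoFeature:
    # Hypothetical stand-in for VideoFeature: only the fields this check reads.
    video_id: str
    start_timestamps: List[float]


def check_sorted_by_video_id(restored: Sequence[FakeVideoFeature]) -> None:
    """Fail if loaded features are not already sorted and grouped by video."""
    ids = [f.video_id for f in restored]
    # Ordering is checked, not imposed: no sort() is called on `restored`.
    assert ids == sorted(ids), f"not sorted by video_id: {ids}"
    # One entry per video; a repeated id means a video was split into pieces.
    assert len(ids) == len(set(ids)), f"duplicate video_ids: {ids}"
    for f in restored:
        assert list(f.start_timestamps) == sorted(f.start_timestamps), (
            f"timestamps out of order for {f.video_id}"
        )


check_sorted_by_video_id(
    [FakeVideoFeature("a", [0.0, 1.0]), FakeVideoFeature("b", [0.0])]
)  # passes silently
```

With a check like this, only the input `features` list would still need sorting before comparing it against `restored`.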

chrisjkuch · 2022-12-21T20:38:58Z

For the sake of completeness: we also tracked down what we believe caused the memory error.

vsc2022/vsc/storage.py

Line 60 in 5d8af86

for video_id, start, end in same_value_ranges(video_ids):

In `load_features`, we iterate through `same_value_ranges`, which yields one range per run of identical adjacent video IDs. For an unsorted array of video IDs, this produces a list of `VideoFeature`s whose length is close to, or exactly equal to, the length of the array, rather than the number of distinct videos.
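To make the failure mode concrete, here is a hypothetical re-implementation of a `same_value_ranges`-style helper that yields `(value, start, end)` for each run of equal adjacent values, matching the tuple shape in the loop quoted above. It is an illustration, not the code in `vsc2022/vsc/storage.py`. Sorted IDs give one range per video; interleaved IDs give one range per row.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def same_value_ranges(values: Sequence[T]) -> Iterator[Tuple[T, int, int]]:
    """Yield (value, start, end) for each maximal run of equal adjacent values.

    Hypothetical re-implementation for illustration; `end` is exclusive.
    """
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the sequence or where the value changes.
        if i == len(values) or values[i] != values[start]:
            yield values[start], start, i
            start = i


sorted_ids = ["a", "a", "a", "b", "b", "b"]
unsorted_ids = ["a", "b", "a", "b", "a", "b"]
print(len(list(same_value_ranges(sorted_ids))))    # 2: one range per video
print(len(list(same_value_ranges(unsorted_ids))))  # 6: one range per row
```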

vsc2022/vsc/descriptor_eval_lib.py

Line 39 in 3afe07a

num_candidates = int(AGGREGATED_CANDIDATES_PER_QUERY * len(query_features))

The number of candidates to generate for the query descriptors, which is proportional to `len(query_features)`, is then more than an order of magnitude larger than we intend. When our exponential iterator exhaustively searches for and returns this many candidates, it returns increasingly large copies of matrices until we run out of memory.
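A worked example of the blow-up, using the formula quoted above. The constant's value and the dataset sizes below are hypothetical, chosen only to show the scaling: if each query video contributes 60 descriptor rows and no two adjacent rows share a `video_id`, `len(query_features)` grows 60-fold, and so does `num_candidates`.

```python
# Hypothetical value for illustration; not the repository's actual constant.
AGGREGATED_CANDIDATES_PER_QUERY = 25


def num_candidates(num_query_entries: int) -> int:
    # Same shape as the quoted line: scales linearly with len(query_features).
    return int(AGGREGATED_CANDIDATES_PER_QUERY * num_query_entries)


num_videos = 1_000   # hypothetical number of query videos
rows_per_video = 60  # hypothetical descriptor rows per video

intended = num_candidates(num_videos)                   # one VideoFeature per video
inflated = num_candidates(num_videos * rows_per_video)  # one VideoFeature per row
print(intended, inflated)  # 25000 1500000
```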

facebook-github-bot · 2024-05-18T01:05:15Z

Hi @edpizzi!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

facebook-github-bot added the CLA Signed label Dec 19, 2022

edpizzi requested review from mdouze, chrisjkuch and gkordo December 19, 2022 20:21

chrisjkuch reviewed Dec 20, 2022
