Make loading features from storage robust to order. #9

Open
wants to merge 1 commit into
base: main

edpizzi
Contributor

@edpizzi commented Dec 19, 2022

The current `load_features` implementation relies on features from each video (same `video_id`) being in a contiguous block. This matches how `store_features` organizes feature files.

Update `load_features` to accept descriptors in any order by sorting by `video_id` (then by start timestamp) before constructing `VideoFeature` structures. Also change `store_features` to sort by `video_id` before storing features.
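
As a rough illustration of the sort-then-group approach described above, here is a minimal sketch. It is not the code in this PR: `Descriptor`, `GroupedFeature`, and `group_descriptors` are hypothetical stand-ins (the real `VideoFeature` and `load_features` likely work on arrays rather than Python objects), and the field names are assumptions.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter
from typing import List, Tuple


@dataclass
class Descriptor:
    # Hypothetical flat record: one stored descriptor row.
    video_id: int
    start: float
    value: Tuple[float, ...]


@dataclass
class GroupedFeature:
    # Hypothetical stand-in for the project's VideoFeature.
    video_id: int
    starts: List[float]
    values: List[Tuple[float, ...]]


def group_descriptors(rows: List[Descriptor]) -> List[GroupedFeature]:
    # Sort by (video_id, start) so each video forms one contiguous run,
    # regardless of the order in which rows were stored.
    ordered = sorted(rows, key=attrgetter("video_id", "start"))
    grouped = []
    for video_id, run in groupby(ordered, key=attrgetter("video_id")):
        items = list(run)
        grouped.append(
            GroupedFeature(
                video_id=video_id,
                starts=[d.start for d in items],
                values=[d.value for d in items],
            )
        )
    return grouped
```

With interleaved input rows, this yields one group per distinct `video_id`, with each group's descriptors ordered by start timestamp.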

@facebook-github-bot added the CLA Signed label Dec 19, 2022
restored = load_features(f.name)

features.sort(key=lambda x: x.video_id)
restored.sort(key=lambda x: x.video_id)

Should we be testing that `restored` is already properly sorted when it comes back from `load_features`? I'm not sure we should sort it here.
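
One way to act on this suggestion would be to assert the ordering of the loaded result directly instead of sorting it in the test. The sketch below is only an illustration: `is_sorted_by_video_id` is a hypothetical helper, not part of the repository, and it assumes each loaded item exposes a `video_id` attribute.

```python
from typing import Iterable


def is_sorted_by_video_id(features: Iterable) -> bool:
    # True when video_id values are in non-decreasing order.
    ids = [f.video_id for f in features]
    return all(a <= b for a, b in zip(ids, ids[1:]))
```

In the test, `assert is_sorted_by_video_id(restored)` could replace `restored.sort(...)`, so that a regression in the ordering produced by `load_features` fails the test rather than being masked.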

@chrisjkuch
Contributor

For completeness, we also tracked down what we believe caused the memory error.

for video_id, start, end in same_value_ranges(video_ids):

In `load_features`, we iterate over the output of `same_value_ranges`. For an unsorted array of video ids, this produces a list of `VideoFeature`s whose length is close to or exactly the length of the array, rather than the number of videos.
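
To make the fragmentation concrete, here is a small sketch of a run-detection helper. It assumes `same_value_ranges` yields `(value, start, end)` for each run of equal adjacent values, which matches how it is used above; `same_value_ranges_sketch` is a hypothetical pure-Python stand-in, not the project's implementation.

```python
from typing import Iterator, List, Tuple


def same_value_ranges_sketch(values: List[int]) -> Iterator[Tuple[int, int, int]]:
    # Yield (value, start, end) for each maximal run of equal adjacent values,
    # with end exclusive.
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            yield values[start], start, i
            start = i


interleaved = [1, 2, 1, 2, 1, 2]  # two videos, descriptors stored out of order
print(len(list(same_value_ranges_sketch(interleaved))))          # 6
print(len(list(same_value_ranges_sketch(sorted(interleaved)))))  # 2
```

Two videos produce six ranges when their descriptors are interleaved, and two ranges once the ids are sorted.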

num_candidates = int(AGGREGATED_CANDIDATES_PER_QUERY * len(query_features))

The number of query candidates computed for a given set of query descriptors is then more than an order of magnitude larger than we intend. When we exhaustively search for and return this many candidates in our exponential iterator, we return increasingly large copies of matrices until we run out of memory.
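
The inflation can be shown with simple arithmetic. The numbers below are made up for illustration: the constant's value, the video count, and the descriptors-per-video figure are all assumptions, not values taken from the codebase.

```python
# Assumed value for illustration only; the real constant may differ.
AGGREGATED_CANDIDATES_PER_QUERY = 10


def num_candidates(num_query_features: int) -> int:
    # Mirrors the expression quoted above.
    return int(AGGREGATED_CANDIDATES_PER_QUERY * num_query_features)


videos = 100                # hypothetical number of query videos
descriptors_per_video = 30  # hypothetical descriptors stored per video

# Sorted input: one feature entry per video.
intended = num_candidates(videos)
# Fully interleaved input: roughly one feature entry per descriptor.
inflated = num_candidates(videos * descriptors_per_video)

print(intended, inflated, inflated // intended)  # 1000 30000 30
```

Under these assumptions the candidate count grows by the number of descriptors per video, which is consistent with the order-of-magnitude blowup described above.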

@facebook-github-bot

Hi @edpizzi!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

3 participants