CSUP: Allow multiple objects #5507

Merged
merged 1 commit into from
Dec 3, 2024


mattnibs
Collaborator

This PR changes the protocol for CSUP files so that a single file can contain multiple vng Objects. When writing a CSUP file, a new object is created every 120,000 values.
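The splitting rule described above can be illustrated with a minimal Go sketch. It is not the code from this PR: `objectWriter`, `objectSizes`, and `maxObjectValues` are hypothetical names, and the sketch only counts values rather than encoding them. It also assumes that a trailing partial object is flushed on close, which the description does not state.

```go
package main

import "fmt"

// maxObjectValues mirrors the 120,000-value threshold from the PR description.
const maxObjectValues = 120_000

// objectWriter is a hypothetical stand-in for a CSUP writer. It only counts
// values and records how many values ended up in each finished object.
type objectWriter struct {
	limit   int   // values per object before a new one is started
	current int   // values in the object currently being built
	sizes   []int // value counts of the objects finished so far
}

// Write adds one value and finishes the current object once it reaches the limit.
func (w *objectWriter) Write() {
	w.current++
	if w.current >= w.limit {
		w.flush()
	}
}

// flush finishes the current object if it holds any values.
func (w *objectWriter) flush() {
	if w.current > 0 {
		w.sizes = append(w.sizes, w.current)
		w.current = 0
	}
}

// Close flushes the trailing partial object (an assumption of this sketch)
// and returns the size of every object written.
func (w *objectWriter) Close() []int {
	w.flush()
	return w.sizes
}

// objectSizes writes n values and reports the size of each resulting object.
func objectSizes(n, limit int) []int {
	w := &objectWriter{limit: limit}
	for i := 0; i < n; i++ {
		w.Write()
	}
	return w.Close()
}

func main() {
	// 300,000 values become two full objects and one partial object.
	fmt.Println(objectSizes(300_000, maxObjectValues)) // prints [120000 120000 60000]
}
```

Because each object is self-contained in this model, a reader could in principle process the objects of one file independently.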

@mattnibs mattnibs requested review from a team and removed request for a team November 26, 2024 20:13
@mattnibs mattnibs force-pushed the csup-multiple-objects branch from 85b4740 to b1f99bd Compare November 26, 2024 20:28
@mattnibs mattnibs requested a review from a team November 26, 2024 20:29
@mattnibs
Collaborator Author

I should say there's an issue with this PR in that vcache.Cache expects a file to have a single object, which in turn has a single vector. I'm not sure what to do about this, but since we're not really worrying about lakes at this point, I think we should punt on this area for the time being.

@mattnibs mattnibs force-pushed the csup-multiple-objects branch from b1f99bd to 178acba Compare November 26, 2024 21:00
@philrz
Contributor

philrz commented Nov 26, 2024

Note to self: Per a heads up from @mattnibs, I'll need to recreate the CSUP files used for mgbench once this merges so we can take advantage of parallel reads.

@mattnibs mattnibs force-pushed the csup-multiple-objects branch from 178acba to 46a0f43 Compare November 27, 2024 17:48
@mccanne
Collaborator

mccanne commented Nov 28, 2024

> I should say there's an issue with this PR in that vcache.Cache expects a file to have a single object, which in turn has a single vector. I'm not sure what to do about this, but since we're not really worrying about lakes at this point, I think we should punt on this area for the time being.

We just need to name objects as ksuid:obj# and we can keep everything the same for now. Later we can add metadata that spans the objects in a file, but that is still keyed on ksuid, so I think it all hangs together.
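As a rough illustration of the `ksuid:obj#` naming idea, the sketch below builds and parses such keys in Go. It is an assumption-laden example, not the vcache implementation: `objectKey` and `parseObjectKey` are hypothetical helpers, the object number is assumed to be a decimal index, and the KSUID is treated as an opaque string (KSUIDs are base62 text, so they contain no `:`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// objectKey builds a key of the form "<ksuid>:<object index>".
// The decimal index format is an assumption of this sketch.
func objectKey(id string, obj int) string {
	return id + ":" + strconv.Itoa(obj)
}

// parseObjectKey splits a key back into the file's KSUID and the object index.
func parseObjectKey(key string) (string, int, error) {
	i := strings.LastIndexByte(key, ':')
	if i < 0 {
		return "", 0, fmt.Errorf("malformed object key %q", key)
	}
	n, err := strconv.Atoi(key[i+1:])
	if err != nil {
		return "", 0, fmt.Errorf("malformed object index in %q: %w", key, err)
	}
	return key[:i], n, nil
}

func main() {
	id := "exampleFileKSUID" // placeholder, not a real KSUID
	for obj := 0; obj < 3; obj++ {
		key := objectKey(id, obj)
		file, n, err := parseObjectKey(key)
		if err != nil {
			panic(err)
		}
		fmt.Println(key, file, n)
	}
}
```

With keys shaped like this, a cache that maps one key to one object would not need to change, and any per-file metadata added later could still be looked up by the KSUID prefix.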

Collaborator

@mccanne mccanne left a comment

cool!

@mattnibs mattnibs merged commit 26269cc into main Dec 3, 2024
3 checks passed
@mattnibs mattnibs deleted the csup-multiple-objects branch December 3, 2024 17:04
3 participants