Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mito: Null value should also contribute to memtable usage #3221

Closed
evenyag opened this issue Jan 23, 2024 · 4 comments · Fixed by #3722
Closed

mito: Null value should also contribute to memtable usage #3221

evenyag opened this issue Jan 23, 2024 · 4 comments · Fixed by #3722
Assignees
Labels
A-storage Involves code in storage engines good first issue Good for newcomers

Comments

@evenyag
Copy link
Contributor

evenyag commented Jan 23, 2024

What type of enhancement is this?

Other

What does the enhancement do?

Now the ValueRef always considers null as zero size

pub fn data_size(&self) -> usize {
match *self {
ValueRef::Null => 0,

The memtable uses ValueRef::data_size() to compute the allocated size, which is incorrect if there are too many nulls.

allocated += fields.iter().map(|v| v.data_size()).sum::<usize>();

The arrow array allocates space for nulls.

Implementation challenges

We might assume nulls occupy spaces or find a new way to estimate the memory used by array builders.

@evenyag evenyag added the A-storage Involves code in storage engines label Jan 23, 2024
@evenyag evenyag self-assigned this Jan 23, 2024
@evenyag evenyag added this to mito2 Jan 23, 2024
@killme2008
Copy link
Contributor

I wonder if we can do a quick fix.

@evenyag evenyag moved this to Todo in mito2 Jan 30, 2024
@tisonkun tisonkun added the good first issue Good for newcomers label Apr 15, 2024
@tisonkun
Copy link
Collaborator

@evenyag what will be the size of a single Null in memtable? Is it a constant or it can be dynamic?

@etolbakov
Copy link
Collaborator

was just passing by and noticed this conversation
@tisonkun @evenyag
how do you folks feel about the following?

ValueRef::Null => std::mem::size_of::<*const u8>(),

@evenyag
Copy link
Contributor Author

evenyag commented Apr 16, 2024

It depends on the data type and the arrow array's implementation. Primitive arrays always push a default value into the data buffer so it occupies the same space as the native type. BinaryArray and StringArray also push an offset into the offset buffer so the size is the size of the offset type. I ignore the size of the null bitmap and buffer padding for simplicity.

It's hard to maintain the correct memory usage of the memtable via accumulating ValueRef::data_size(). Maybe we can use a constant for a quick fix? Maybe 8, the size of int64. At least, it is better than 0.

@etolbakov etolbakov self-assigned this Apr 16, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in mito2 Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Involves code in storage engines good first issue Good for newcomers
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants