How to estimate memory usage? #43

Open
okkez opened this issue Jun 29, 2020 · 4 comments

Comments

@okkez
Contributor

okkez commented Jun 29, 2020

We have recently been running columnify as part of the fluent-plugin-s3 compressor (msgpack to Parquet).
However, columnify fails with out-of-memory errors in some environments.
So I would like to estimate columnify's memory usage.
Alternatively, is there a way to keep memory usage constant regardless of the file size?

From my measurements, memory usage is proportional to file size.
Large files need about 5 to 6 times their size in memory.
For example, a large msgpack file (223MB) consumes about 1.3GB of memory (RSS as reported by the ps command).
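
As a rough sanity check, the 5x to 6x observation above can be turned into a back-of-the-envelope estimate. This is only a sketch: `estimateRSS` is a hypothetical helper, not part of columnify, and the factors come from the measurement in this comment rather than from any guarantee.

```go
package main

import "fmt"

// estimateRSS returns a rough lower and upper bound, in bytes, on resident
// memory for a given input file size. The 5x and 6x factors are assumptions
// taken from the observation in this issue, not from columnify itself.
func estimateRSS(fileSize int64) (low, high int64) {
	const lowFactor, highFactor = 5, 6
	return fileSize * lowFactor, fileSize * highFactor
}

func main() {
	const mb = 1 << 20
	low, high := estimateRSS(223 * mb)
	// prints "estimated RSS: 1115 MB to 1338 MB"
	fmt.Printf("estimated RSS: %d MB to %d MB\n", low/mb, high/mb)
}
```

For the 223MB file this gives a range of 1115MB to 1338MB, which is consistent with the observed RSS of about 1.3GB.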

@syucream
Contributor

syucream commented Jul 7, 2020

I think the part that consumes the memory is most likely the row data formatted by FormatToMap(), so we can estimate memory usage by summing the sizes of that row data.
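
One way to count row data sizes could look like the sketch below. It assumes each formatted row is a `map[string]interface{}`, which is a guess about the intermediate representation, and `approxPayloadSize` is a hypothetical helper. It counts only payload bytes (string contents, numeric widths, map keys) and ignores Go's per-value headers and map overhead, so the result is a lower bound on real usage.

```go
package main

import "fmt"

// approxPayloadSize walks a decoded value and sums the payload bytes it holds.
// It ignores interface, slice, and map headers, so it underestimates the
// memory the Go runtime actually needs for the value.
func approxPayloadSize(v interface{}) int {
	switch x := v.(type) {
	case string:
		return len(x)
	case []byte:
		return len(x)
	case bool:
		return 1
	case int32, uint32, float32:
		return 4
	case int, int64, uint64, float64:
		// int is assumed to be 8 bytes (64-bit platform).
		return 8
	case []interface{}:
		total := 0
		for _, e := range x {
			total += approxPayloadSize(e)
		}
		return total
	case map[string]interface{}:
		total := 0
		for k, e := range x {
			total += len(k) + approxPayloadSize(e)
		}
		return total
	default:
		// Unknown types are not counted.
		return 0
	}
}

func main() {
	row := map[string]interface{}{
		"name": "alice",
		"age":  int64(30),
		"tags": []interface{}{"a", "bc"},
	}
	// Keys: 4+3+4 = 11 bytes; values: 5+8+3 = 16 bytes.
	fmt.Println(approxPayloadSize(row)) // prints 27
}
```

Summing this value over all rows held in memory at once gives a floor for the estimate; the gap between that floor and the observed RSS hints at how much is overhead from the representation itself.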

@syucream
Contributor

syucream commented Jul 7, 2020

I will try to report the estimation results. I'm thinking of adding something that writes estimation logs to stderr when columnify is called with a -verbose flag or similar.
On the other hand, I guess we can reduce memory consumption by rethinking the intermediate representation.

@syucream
Contributor

syucream commented Jul 7, 2020

Hmm, this looks harder than I expected... The current parquet package depends heavily on parquet-go, and to fit its API we do a redundant conversion in parquet.MarshalMap(), which will consume a lot of memory...

@syucream
Contributor

syucream commented Jul 7, 2020

First, I will dig into the problem further using pprof.
#44
