Memory usage in createrepo_c > 1 #443

Open
kwizart opened this issue Nov 22, 2024 · 7 comments
Labels
Triaged Someone on the DNF team has read the issue and determined the next steps to take



kwizart commented Nov 22, 2024

Current createrepo_c uses a lot more memory than previous versions. We need to find out why, and whether the increase is legitimate.

Since our koji infra (at rpmfusion.org) migrated to createrepo_c 1+, we have experienced a lot of OOM errors on koji regen-repo tasks. Our builders only have 4 GB of RAM and could previously handle 4 mergerepo_c (<1) tasks at the same time (one per arch). With createrepo_c > 1, they can only handle 3 such tasks without hitting OOM.

We have found a workaround by tuning the koji task weight, but we need to figure out why these tasks use a lot more memory (about 1.1G each) than with previous createrepo_c versions.

Side note: we are using createrepo_c from updated Fedora f39/f40 (not the Fedora infra-tags version, when relevant).

Author

kwizart commented Nov 22, 2024

In order to reproduce:

mkdir /tmp/koji ; cd /tmp/koji
curl -LO https://koji.rpmfusion.org/kojifiles/repos/f42-free-multilibs-build/83898/x86_64/blocklist
curl -LO https://koji.rpmfusion.org/kojifiles/repos/f42-free-multilibs-build/83898/groups/comps.xml
/usr/bin/mergerepo_c --koji -b blocklist -a x86_64 -o /var/tmp/koji/tasks/564/660564/repo -g comps.xml -r http://koji.rpmfusion.org/buildsys-override/f42-free/x86_64/ -r http://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Everything/x86_64/os/ -r http://codecs.fedoraproject.org/openh264/42/x86_64/os/

Using valgrind --tool=massif /usr/bin/mergerepo_c ... seems to output a final memory usage of about:

  • 1.11G with f40 userspace and createrepo_c 1.1.4
  • 1.01G with f39 userspace and createrepo_c-0.21.1-4.fc39.3.x86_64 (from fedora koji infra-tags repository)

So it's about a 10% increase. (I'm not sure how to read the results, or how to better characterize the memory usage of createrepo_c.)
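
As a cross-check on the massif figure, a minimal sketch (not something used in this thread) that reports the peak resident set size of a command through Python's standard `resource` module. The helper name `peak_rss_kib` is made up for illustration. Note that massif profiles heap allocations by default, so its total will not necessarily match a peak RSS number.

```python
import resource
import subprocess


def peak_rss_kib(cmd):
    """Run cmd to completion and return the peak RSS of waited-for children.

    On Linux, ru_maxrss is reported in KiB. The value is the maximum over
    every child this interpreter has already waited for, so start a fresh
    interpreter for each command you want to measure.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```

It would be called with the full reproducer command as a list, for example `peak_rss_kib(["/usr/bin/mergerepo_c", "--koji", ...])` with the remaining arguments from the command above.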

@kontura kontura self-assigned this Nov 25, 2024
Contributor

kontura commented Nov 25, 2024

I did some testing using the reproducer and I think this kind of difference could be caused by the change of default compression. I am getting roughly the same difference just by switching from zstd back to gz.

Can you try with --compress-type=gz?

While it takes more resources to compress, in my testing the zstd metadata is 68.9 MiB while the gz metadata is 87.3 MiB.
zstd should also be faster to decompress but I haven't tested it in this particular case.
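
To put the numbers quoted so far side by side, here is a small arithmetic sketch. It only uses figures already posted in this thread; the helper name `relative_change_percent` is made up for illustration.

```python
def relative_change_percent(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100


# massif totals: createrepo_c 0.21.1 (1.01G) vs 1.1.4 (1.11G)
memory_change = relative_change_percent(1.01, 1.11)

# metadata size: gz (87.3 MiB) vs zstd (68.9 MiB)
size_change = relative_change_percent(87.3, 68.9)

print(f"memory: {memory_change:+.1f}%  metadata size: {size_change:+.1f}%")
```

So the trade-off described here is roughly 10% more memory during the merge for metadata that is about 21% smaller.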

Contributor

kontura commented Jan 20, 2025

@kwizart ^

Author

kwizart commented Jan 20, 2025

@kontura thanks for the reminder. And yes, compression looks like a good candidate for the 10% increase.
Hopefully I will be able to reproduce by the end of the week.

Still, another side of the question is: why so much memory? (~1G, if the valgrind test is relevant at all.)
Beyond compression, can createrepo_c use less?

Contributor

kontura commented Jan 21, 2025

> Still, another side of the question is: why so much memory? (~1G, if the valgrind test is relevant at all.)
> Beyond compression, can createrepo_c use less?

As usual, the problem is mostly with filelists: they are fully loaded into memory and account for more than 60% of all usage.
I don't think they are needed for the merging (and neither are the changelogs in other.xml), so we could load them on demand, only when dumping the merged repo. The createrepo_c libs have a fairly new API for parsing packages one at a time.

I can think of only one issue: the packages are sorted when merged, and if the input repos were sorted differently we might have to parse more packages to get to the one we need (at worst we might have to parse the full metadata, but that is the current situation anyway). However, repos created by createrepo_c are sorted, so I guess we could do better by around 80%.
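
The idea of merging already-sorted inputs one package at a time can be sketched in plain Python. This is only an illustration of the technique, not mergerepo_c's implementation or the createrepo_c API: the `Pkg` type, `read_repo`, `merge_sorted_repos`, and the "first repo wins on equal names" rule are all assumptions made for the example, and the real `--koji` rules are more involved.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class Pkg(NamedTuple):
    name: str  # sort key; the inputs are assumed to be sorted by it
    repo: int  # index of the source repo; lower index wins on ties


def read_repo(names: Iterable[str], repo_index: int) -> Iterator[Pkg]:
    """Hypothetical stand-in for a parser that yields one package at a time."""
    for name in names:
        yield Pkg(name, repo_index)


def merge_sorted_repos(repos: List[Iterable[str]]) -> Iterator[Pkg]:
    """Lazily merge sorted repos, keeping the first repo's copy of each name.

    heapq.merge holds only one pending package per input, so memory stays
    proportional to the number of repos, not the number of packages.
    """
    streams = [read_repo(names, i) for i, names in enumerate(repos)]
    last_name = None
    for pkg in heapq.merge(*streams):
        if pkg.name != last_name:  # skip duplicates from later repos
            last_name = pkg.name
            yield pkg


for pkg in merge_sorted_repos([["a", "c"], ["a", "b", "d"]]):
    print(pkg.name, "from repo", pkg.repo)
```

If an input is not sorted the same way, this approach no longer works as written, which is the caveat raised above: the merger would have to read ahead (in the worst case the whole metadata) to find the package it needs.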

@kontura kontura added the Triaged Someone on the DNF team has read the issue and determined the next steps to take label Jan 21, 2025
Author

kwizart commented Jan 27, 2025

I have tried to reproduce with f41 createrepo_c 1.20 x86_64 (dnf5), and here I'm seeing a 10x improvement in memory usage compared to createrepo_c < 1.20. (I'm still unsure about the numbers, but it's roughly 90M with tar.gz versus 103M with the default compression.)

--compress-type=gz also gives a better result on time (it is faster).

Still trying to reproduce with an older createrepo_c to compare against...

Author

kwizart commented Jan 27, 2025

Also (unrelated): while trying to run mergerepo_c against rawhide, I wonder whether mergerepo_c isn't reading both compressed variants when more than one is available for a given kind of metadata.

Today's rawhide seems to be generated with zstd only, but the f41 repos use both zstd and tar.gz compression...
Using only one or the other would be enough... (maybe that could also explain a rise in memory usage?)
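
One way to check which metadata files a repo actually advertises is to read its repodata/repomd.xml. A minimal sketch using only the Python standard library follows; the function name `list_metadata_files` is made up for illustration, and it only lists what the repo publishes, it says nothing about which of those files mergerepo_c ends up loading.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# XML namespace used by repomd.xml
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def list_metadata_files(repomd_xml):
    """Map each <data type="..."> entry of a repomd.xml text to its hrefs."""
    root = ET.fromstring(repomd_xml)
    found = defaultdict(list)
    for data in root.findall("repo:data", REPO_NS):
        location = data.find("repo:location", REPO_NS)
        if location is not None:
            found[data.get("type")].append(location.get("href"))
    return dict(found)
```

Running it on the repomd.xml of rawhide and of f41 and comparing the file extensions in the hrefs would show whether a repo ships more than one compressed form of the same metadata.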
