Cleanup ~/.cache/fus metadata before fus exits. #71

Open
jankaluza opened this issue Aug 13, 2019 · 14 comments · May be fixed by #72

Comments

@jankaluza

Hi,

I have found out that fus stores cached metadata in ~/.cache/fus. This is quite OK, but it should try to remove the metadata once it finishes using it. I understand that in case of a crash or some unpredictable issue the metadata can stay there, but I think generally fus should remove it.

So far I have had to add the following cron job on the composer machine we use to clean that directory:

find ~/.cache/fus -type f -mtime +1 -exec rm {} \;
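
For readers who prefer a script over a cron one-liner, the same kind of cleanup can be sketched in Python. This is only an illustration, not part of fus or Pungi: the function name `prune_old_files` and its parameters are made up here. It also uses a plain age cutoff, whereas GNU find rounds file age down to whole days, so `-mtime +1` only matches files that are at least two days old.

```python
import time
from pathlib import Path


def prune_old_files(cache_dir, max_age_days=1.0, now=None):
    """Delete regular files under cache_dir older than max_age_days.

    Returns the list of removed paths. Directories are left in place,
    as with `find -type f`. (Hypothetical helper, not a fus API.)
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return removed
    # Materialize the listing first so deletions do not disturb the walk.
    for path in list(root.rglob("*")):
        # is_file() follows symlinks, so skip symlinks explicitly.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Calling `prune_old_files("~/.cache/fus")` would then remove every cached file that has not been modified within the last day.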

@jankaluza
Author

This is related to #61 which introduced the caching.

@r4f4
Collaborator

r4f4 commented Aug 13, 2019

What if you want to keep making queries to the cached metadata? In that case we can't just remove it before exiting. What is the definition of "fus has finished using this metadata"?
@ignatenkobrain any opinion on this?

@r4f4
Collaborator

r4f4 commented Aug 13, 2019

Of course one possible solution is to add an option --clear-cache-on-exit for those who are running only one query on the metadata.

@jankaluza
Author

The issue is that every time the repository metadata changes, fus downloads a new set of metadata and the previously cached metadata stays on the filesystem forever. For me, this is 1 GB of metadata every day. I generate 3 composes using Pungi every hour as CI.

I also think that when executing fus in Pungi, the repository passed to fus is actually different on every Pungi run. @lubomir probably knows that for sure.

It is OK for me to remove metadata older than 1 day with that cron job, but I'm afraid that sooner or later every fus (or at least Pungi) user will have to do the same, and therefore I was thinking it might be better to address this in fus directly.

@ignatenkobrain
Member

Well, the cache exists for a reason. If we removed it, we might as well not create it in the first place :)

What I think we should do is encode filenames in the cache by the name of the repo and not by URL. @r4f4 could you make such a patch?

@r4f4
Collaborator

r4f4 commented Aug 14, 2019

@ignatenkobrain but we never used the URL. The current layout of the cache is:

~/.cache/fus/$(repo name used in cmdline)/repodata/repomd.xml
~/.cache/fus/$(repo name used in cmdline)/repodata/$(repo-checksum)-{primary,comps,filelists}.xml.gz
~/.cache/fus/$(repo name used in cmdline)/$(repo-checksum).solv

So your suggestion is to use $(repo name used in cmdline) instead of $(repo-checksum)?

@ignatenkobrain
Member

yes

@r4f4
Collaborator

r4f4 commented Aug 14, 2019

Ok, will do. Thanks.

@lubomir
Collaborator

lubomir commented Aug 14, 2019

Pungi reuses the same repo name, but there are no guarantees the names will point to similar repos (particularly in @hanzz's use case in ODCS). Caching by repo name instead of checksum should help, since that should result in the files being overwritten instead of new ones always being added.
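
The difference between the two keying schemes can be illustrated with a small simulation. This is a sketch under assumptions, not the fus implementation: the `cache_path` and `simulate` helpers, the use of SHA-256, and the `repo-0` name are all hypothetical and only mimic the layouts described in this thread.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_path(cache_root, repo_name, metadata, key_by_checksum):
    """Return where one metadata blob would be stored (hypothetical layout)."""
    repodata = Path(cache_root) / repo_name / "repodata"
    if key_by_checksum:
        # Checksum-keyed: every distinct snapshot gets its own filename.
        digest = hashlib.sha256(metadata).hexdigest()
        return repodata / f"{digest}-primary.xml.gz"
    # Name-keyed: the filename depends only on the metadata type.
    return repodata / "primary.xml.gz"


def simulate(key_by_checksum, snapshots):
    """Store each snapshot in turn and return how many files remain."""
    with tempfile.TemporaryDirectory() as root:
        for data in snapshots:
            path = cache_path(root, "repo-0", data, key_by_checksum)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        return sum(1 for p in Path(root).rglob("*") if p.is_file())


snapshots = [b"metadata v1", b"metadata v2", b"metadata v3"]
print(simulate(True, snapshots))   # one file per distinct snapshot
print(simulate(False, snapshots))  # a single file, overwritten each time
```

With checksum-based names the file count grows with every metadata change, while with name-based files it stays constant per repo name.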

@jankaluza
Author

That's how the cache directory looks for me currently:

https://paste.fedoraproject.org/paste/Sr41jYTwmLnJnUGzjDLOfg

@lubomir
Collaborator

lubomir commented Aug 14, 2019

As I understand the proposed change, the files in the repodata subdir will use the lookaside-X or repo-X name, and thus there won't be that many of them.

@r4f4
Collaborator

r4f4 commented Aug 14, 2019

@lubomir exactly. If that helps, I'll implement it today.

@jankaluza
Author

That would help me a lot.

r4f4 added a commit to r4f4/fus that referenced this issue Aug 14, 2019
In some cases (e.g in Pungi), the metadata is changing all the time, fus
is run multiple times a day and the cache just grows. So instead of
using checksum we use the reponame passed in the command line invocation
and the metadata type to create a filename so that only one copy exists
for that reponame. Therefore the cache layout now is:

$CACHEDIR/fus/$reponame/repodata/repomd.xml
$CACHEDIR/fus/$reponame/repodata/primary.xml.gz
$CACHEDIR/fus/$reponame/repodata/modules.xml.gz
$CACHEDIR/fus/$reponame/repodata/group_gz.x86_64.xml.xz
$CACHEDIR/fus/$reponame/repodata/filelists.xml.gz

Fixes fedora-modularity#71

Signed-off-by: Rafael Fonseca <[email protected]>
r4f4 linked a pull request Aug 14, 2019 that will close this issue
r4f4 added a commit to r4f4/fus that referenced this issue Aug 14, 2019
In some cases (e.g in Pungi), the metadata is changing all the time, fus
is run multiple times a day and the cache just grows. So instead of
using checksum we use the reponame passed in the command line invocation
and the metadata type to create a filename so that only one copy exists
for that reponame. Therefore the cache layout now is:

$CACHEDIR/fus/$reponame/$chksum.solv
$CACHEDIR/fus/$reponame/$chksum.solvx
$CACHEDIR/fus/$reponame/repodata/repomd.xml
$CACHEDIR/fus/$reponame/repodata/primary.xml.gz
$CACHEDIR/fus/$reponame/repodata/modules.xml.gz
$CACHEDIR/fus/$reponame/repodata/group_gz.x86_64.xml.xz
$CACHEDIR/fus/$reponame/repodata/filelists.xml.gz

Fixes fedora-modularity#71

Signed-off-by: Rafael Fonseca <[email protected]>
@lubomir
Collaborator

lubomir commented Jan 22, 2020

FYI, for the Pungi use case this issue is no longer a problem, since Pungi now uses a temporary directory as the cache and only shares it between runs with the same repos. Afterwards the directory gets cleaned up by Pungi.
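
The general pattern of a throwaway cache shared between a few runs can be sketched as follows. This is not Pungi's actual code: the helper name `run_with_temp_cache` is hypothetical, and the sketch assumes the tool being run honors the `XDG_CACHE_HOME` environment variable when choosing its cache location, which this thread does not confirm for fus.

```python
import os
import subprocess
import sys
import tempfile


def run_with_temp_cache(commands):
    """Run each command with XDG_CACHE_HOME set to one shared temp directory.

    The directory is removed when the block exits, even if a command fails.
    Returns the captured stdout of each command.
    (Assumption: the commands respect XDG_CACHE_HOME.)
    """
    outputs = []
    with tempfile.TemporaryDirectory(prefix="fus-cache-") as cache_dir:
        env = dict(os.environ, XDG_CACHE_HOME=cache_dir)
        for cmd in commands:
            result = subprocess.run(
                cmd, env=env, check=True, capture_output=True, text=True
            )
            outputs.append(result.stdout)
    return outputs


# Stand-in command that only reports which cache directory it was given.
show_cache = [sys.executable, "-c",
              "import os; print(os.environ['XDG_CACHE_HOME'])"]
first, second = run_with_temp_cache([show_cache, show_cache])
print(first == second)  # both runs saw the same temporary cache directory
```

Runs that belong together share one cache, and nothing is left behind in ~/.cache once they finish.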
