Countme should report system age, not repository age #1611
Probably the "best" thing to do is find the oldest countme file (including disabled repos). Hacky, but maybe more accurate: does dnf create any other files in /var or /etc at install time that would likely have a corresponding file date which could be used? Both of these will probably cause "jumps" in my data, but I'm okay with that, really.
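The "oldest cookie file" heuristic can be sketched in Python as follows. The layout (one subdirectory per repo under a common base directory, each possibly holding a file named `countme`) is an assumption for illustration, not a verified description of dnf's persistdir.

```python
from pathlib import Path
from typing import Optional


def oldest_countme_mtime(repos_dir: Path, cookie_name: str = "countme") -> Optional[float]:
    """Return the oldest mtime among per-repo countme cookie files, or None.

    Assumes (hypothetically) one subdirectory per repo under `repos_dir`.
    Disabled repos are included simply because their directories stay on disk.
    """
    mtimes = [
        cookie.stat().st_mtime
        for cookie in repos_dir.glob(f"*/{cookie_name}")
        if cookie.is_file()
    ]
    # No cookie files at all means there is nothing to infer an age from.
    return min(mtimes, default=None)
```

As noted in the comment, this still only approximates the system age: a cookie is created on the first countme-enabled metadata refresh, not at installation.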
I think we could use transaction ID 1 in the DNF history database, which, I believe, represents the fresh install through Anaconda. The transaction record contains the timestamp. On the CLI, you can check that with:
That way, we wouldn't need to store the "epoch" in the cookie file, and would just always use the above timestamp for that.
Thinking about it more, the first-ever transaction may not be a reliable indicator of the system age for ephemeral systems that are not installed through Anaconda but from an image (e.g. Podman containers). So we may need a different strategy (for those).
There are more systems that are not installed through Anaconda (the ARM version often gets installed from an image, and so do virtual machines at cloud providers, etc.), so I wouldn't special-case it :)
Thanks, that's a useful data point to have 😄
Just FTR, @james-antill suggested in a chat that one solution would also be keeping per-repo countme files but storing them in directories named after the repo ID only (not a hash).
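To illustrate why the naming scheme matters, this Python sketch contrasts a hash-based directory name with a repo-ID-only one. The hash inputs and format in `hashed_dirname` are invented for illustration (libdnf's real persistdir hash covers several repository properties); the point is only that a name derived from a `$releasever`-dependent value changes on upgrade, while a repo-ID-only name does not.

```python
import hashlib


def hashed_dirname(repoid: str, baseurl: str, releasever: str) -> str:
    """Hypothetical hash-based persistdir name (not libdnf's real scheme).

    The hash covers a repo property that contains $releasever, so the
    resulting name changes whenever the system is upgraded.
    """
    expanded = baseurl.replace("$releasever", releasever)
    digest = hashlib.sha256(expanded.encode("utf-8")).hexdigest()
    return f"{repoid}-{digest[:16]}"


def id_only_dirname(repoid: str) -> str:
    """Proposed alternative: name the persistdir after the repo ID only."""
    return repoid


if __name__ == "__main__":
    url = "https://mirror.example.org/fedora/$releasever/"
    # Same repo, two releases: two different directories, hence two cookies.
    print(hashed_dirname("fedora", url, "39"))
    print(hashed_dirname("fedora", url, "40"))
    # The ID-only name is the same before and after the upgrade.
    print(id_only_dirname("fedora"))
```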
Just as FYI, here is the implementation in rpm-ostree that does not have this issue: https://github.com/coreos/rpm-ostree/blob/main/rust/src/countme/cookie.rs
Is there any movement on this? What is the implementation like in DNF 5? |
I've just checked it; it's basically a clone of the dnf4 implementation.
We'll discuss it with leadership and the team in the following days and provide feedback soon.
Ah, bug-for-bug compatibility. :) Am I possibly currently getting double-counts from people using both, or using e.g. GNOME Software + dnf5 in f39?
If dnf4 and dnf5 each use a different repo "persistdir", then yep, we're likely double-counting already. This is really silly and needs to be fixed ASAP. Since I wrote that code (and still remember how it works, kinda), it just makes sense for me to have a closer look, then... So I'll do just that, assigning to myself now.
Good news, I guess. I've just checked, and dnf5 uses the same persistent directories (
TL;DR: A simple fix is underway. I'll be on PTO next week, so expect silence here until I'm back.

Having thought about this more, we do need to continue tracking the countme timestamps ("cookie" files in However, what we do want to change is to make the timestamps independent of the I have a working (one-line) patch for that locally, as well as an updated

The tricky part is to ensure that the cookie is not reset when existing systems upgrade to the fixed libdnf version (once released). Since the directory name changes, libdnf would think that the system doesn't yet have a cookie file and thus 1) would start over, with age set to 1 (

To prevent that, the cookie file needs to stay the same when you upgrade libdnf to the fixed version, as well as if you decide to downgrade to the old version for some reason. The easiest solution to that seems to be the following:

This way, the same cookie file would be reused after upgrading to the new libdnf version as well as after downgrading it. What the scriptlet needs to decide, though, is which directory to choose as the symlink target if there are multiple; that can happen easily, such as if I think it should choose the one that corresponds to the running Fedora version, e.g. by looking at

In fact, I also have a draft scriptlet locally which works as described above; we just need to decide which repositories to "migrate". I'd think "fedora" and "updates" should suffice, but please let me know otherwise.

So, that's it for a status update. I've decided to dump my thoughts here because I'll be on vacation next week and might otherwise forget the details 😄 Any feedback is of course welcome in the meantime. Just know that I'll only be able to respond when I'm back.
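The symlink idea can be illustrated with a short Python sketch (a real RPM scriptlet would likely be written differently). All names are hypothetical, and picking `old_dirname` when several candidate directories exist is deliberately left to the caller, since that is exactly the open question in the comment.

```python
from pathlib import Path


def link_persistdir(repos_dir: Path, repoid: str, old_dirname: str) -> Path:
    """Make a repo-ID-named path resolve to an existing hashed persistdir.

    The old libdnf keeps using `old_dirname`, the fixed one would use `repoid`;
    both then resolve to the same directory and thus the same countme cookie.
    All names and the layout are hypothetical.
    """
    new_dir = repos_dir / repoid
    # Leave any existing directory or symlink (even a dangling one) alone,
    # so running the migration twice is harmless.
    if not new_dir.exists() and not new_dir.is_symlink():
        # A relative target is resolved against repos_dir, the link's parent.
        new_dir.symlink_to(old_dirname, target_is_directory=True)
    return new_dir
```

Using a relative link target keeps the link valid if the whole base directory is moved or bind-mounted elsewhere.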
Actually use the system's installation time (if known) as the reference point, instead of the first-ever countme event recorded for the given repo. This is what the dnf.conf(5) man page has been saying all along; the code just never lived up to it.

This fixes the following issues:

1. Systems that only reach out to the repos after an initial period of time after their installation appear "younger" than they really are.
2. Prebuilt OS images may include repo persistdirs with countme cookies in them that were created at build time, making all instances spawned from those images (physical machines, VMs or containers) appear much "older" than they really are.
3. System upgrades cause the bucket to be effectively reset to 1 due to the fact that a changed $releasever value causes a new persistdir to be created.

Use the machine-id(5) file's mtime as the single source of truth. This file is typically tied to the system's installation or first boot where it's populated by an installer tool or init system, respectively, and is never changed afterwards.

Keep the "relative" epoch (first countme event) as a fallback method, though. This is useful on those systems that don't have a machine-id file (such as OCI containers) but are still used long-term. In those cases, system upgrades aren't really a thing, so the above point 3 does not apply.

Some containers may also choose to bind-mount the machine-id file from the host (such as what toolbox(1) does), in which case their age will be the same as that of the host. Conveniently, that's also what we want, since the purpose of such containers is to blend with the host as much as possible.

Fixes: rpm-software-management#1611
Actually use the system's installation time (if known) as the reference point, instead of the first-ever countme event recorded for the given repo. This is what the dnf.conf(5) man page always said about the countme option; the code just never lived up to that.

This makes bucket calculation more accurate:

1. System upgrades will no longer reset the bucket to 1 (this used to be the case due to a new persistdir being created whenever $releasever changed).
2. Systems that only reach out to the repos some time after being installed will no longer appear younger than they really are.
3. Prebuilt OS images that happen to include countme cookies created at build time will no longer cause all the instances spawned from those images (physical machines, VMs or containers) to appear older than they really are.

Use the machine-id(5) file's mtime to infer the installation time. This file is semantically tied to the system's lifetime since it's typically populated at installation time or during the first boot by an installer tool or init system, respectively, and remains unchanged.

The fact that it's a well-defined file with clear semantics ensures that OS images won't accidentally include a prepopulated version of this file with a timestamp corresponding to the image build, unlike our own cookie files (see point 3 above).

In some cases, such as in OCI containers without an init system running, the machine-id file may be missing or empty, even though the system is still used long-term. To cover those, keep the original, relative epoch as a fallback method. System upgrades aren't really a thing for such systems, so the above point 1 doesn't apply here.

Some containers, such as those created by toolbox(1), may also choose to bind-mount the host's machine-id file, thus falling into the same bucket as their host. Conveniently, that's what we want, since the purpose of such containers is to blend with the host as much as possible.

Fixes: rpm-software-management#1611
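The reference-point selection described in this commit message can be sketched in a few lines of Python. This is not the actual libdnf change (libdnf is C++), and the function name is hypothetical; it only mirrors the stated rule: machine-id mtime if the file exists and is non-empty, otherwise the relative epoch.

```python
from pathlib import Path


def countme_epoch(machine_id: Path, fallback_epoch: int) -> int:
    """Pick the reference point for the countme age bucket.

    Prefer the machine-id file's mtime as an estimate of the installation
    time; if the file is missing (or cannot be stat()ed) or is empty, as in
    some OCI containers, fall back to the relative epoch, i.e. the first
    countme event recorded for the repo.
    """
    try:
        st = machine_id.stat()
    except OSError:
        return fallback_epoch
    if st.st_size == 0:
        return fallback_epoch
    return int(st.st_mtime)
```

On a typical system the caller would pass `Path("/etc/machine-id")`, the location documented in machine-id(5). A toolbox container that bind-mounts the host's file would automatically get the host's timestamp here.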
Not dnf, but there's the

So, scratch my above ponderings about changing the persistdir naming scheme. Instead, I've submitted #1662, which switches age counting to the machine-id file's timestamp. Here's an updated BDD feature file which demonstrates the new logic (see the Examples table at the bottom of the Scenario Outline): https://github.com/rpm-software-management/ci-dnf-stack/blob/ab365d2bad19f69e188fb449fb6bcdd8834f5815/dnf-behave-tests/dnf/countme.feature#L44
Currently, when we compute the system's age bucket (1 through 4) to report in the weekly `countme` flag, we do that relative to the first-ever metadata refresh (called the epoch) of the respective repository. However, the original proposal intended it to be the absolute age bucket, that is, the age since installation.

This happens because we store the cookie files (containing the timestamps) in per-repository directories (`persistdir`) whose names contain hashes derived from various repository properties, including the `releasever` value. As a result, the system's age bucket is effectively reset on each Fedora system upgrade, which is not what we want.

To fix this, we should simply keep a single cookie file for the entire system and use that to determine the system's age bucket.
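To make the difference concrete, here is a small Python sketch of the bucket computation. The one-week window and the bucket edges at 2, 5 and 25 weeks are assumptions chosen to match the four buckets dnf.conf(5) describes (first week, first month, first six months, older); libdnf's source is the authority on the exact constants.

```python
WEEK = 7 * 24 * 60 * 60    # assumed length of one countme window, in seconds
BUCKET_EDGES = (2, 5, 25)  # assumed bucket boundaries, in whole weeks


def age_bucket(epoch: int, now: int) -> int:
    """Map the time elapsed since `epoch` to an age bucket from 1 to 4."""
    weeks = (now - epoch) // WEEK
    bucket = 1
    for edge in BUCKET_EDGES:
        if weeks >= edge:
            bucket += 1
    return bucket


# A system installed 30 weeks ago whose persistdir was recreated by an
# upgrade 1 week ago:
now = 30 * WEEK
print(age_bucket(0, now))          # absolute epoch (installation): 4
print(age_bucket(29 * WEEK, now))  # relative epoch (new persistdir): 1
```

The same machine lands in bucket 4 or bucket 1 depending solely on which epoch is used, which is the skew this issue describes.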
There's a second countme implementation in rpm-ostree (here's why) which reportedly does the right thing. Looking at the code, they do appear to store only one cookie file per system (at `/var/lib/rpm-ostree-countme/cookie`), as it should be. I think we should just do the same.

To avoid skewing the metrics, the fix should probably include a check for an old, repo-specific cookie file: if it exists, load the values from it and then remove the file. The new values stored at the end of the `addCountmeFlag()` function should then already go into the system-wide cookie file. That way, systems that upgrade to the fixed DNF version would simply continue where they left off, instead of being reset to age 1. Note that this may need special care in case repositories are fetched in parallel.
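The proposed migration can be sketched in Python as below. The cookie contents are treated as an opaque string (the real cookie format is not modelled), and both paths are hypothetical parameters; the only system-wide path mentioned in this issue is rpm-ostree's.

```python
from pathlib import Path
from typing import Optional


def load_countme_cookie(system_cookie: Path, repo_cookie: Path) -> Optional[str]:
    """Load countme state, migrating a legacy per-repo cookie on first use.

    Cookie contents are an opaque string here. Not safe for concurrent callers.
    """
    if system_cookie.exists():
        # Already migrated (or created by the fixed version): use it.
        return system_cookie.read_text()
    if repo_cookie.exists():
        # Legacy per-repo cookie: carry its values over so the age
        # bucket continues instead of restarting at 1, then drop it.
        data = repo_cookie.read_text()
        system_cookie.parent.mkdir(parents=True, exist_ok=True)
        system_cookie.write_text(data)
        repo_cookie.unlink()
        return data
    # No cookie at all: the caller starts a fresh epoch.
    return None
```

Writing the system-wide file during migration, rather than only at the end of `addCountmeFlag()`, is a simplification. With parallel repo fetching, a lock around this function (not shown) would be one way to provide the special care mentioned above. If several repos carry legacy cookies, this sketch keeps whichever is migrated first and leaves the others on disk.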