Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support repo's Appstream data download and install #1844

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mcrha
Copy link
Contributor

@mcrha mcrha commented Nov 8, 2024

Repositories can provide Appstream data for the packages they contain. This Appstream data is consumed by applications like gnome-software or KDE Discover, thus the users can see the packages (apps) in them.

This is to be in pair with PackageKit, which does download and install the repo's Appstream data.

As this adds a dependency on the appstream library, there is also a CMake option WITH_APPSTREAM to be able to turn the support off. The support is enabled by default.


That is, the /var/cache/libdnf5/$REPO_NAME/repodata/ directory can have downloaded also files with appstream in their name and, when installed, the data is stored under /var/cache/swcatalog/xml/.

CC @m-blaha @jan-kolarik

@mcrha mcrha force-pushed the wip/appstream-support branch from c067fd1 to a261aa3 Compare November 8, 2024 11:46
@mcrha
Copy link
Contributor Author

mcrha commented Nov 8, 2024

Hrm, the CI either needs to add appstream-devel (or better pkgconfig(appstream)>=1.0) dependency or turn off build of these bits with -DWITH_APPSTREAM=OFF to the cmake command. No idea how to do it, but if you could guide me I'll change it either way.

libdnf5/repo/repo.cpp Outdated Show resolved Hide resolved
@jan-kolarik jan-kolarik self-requested a review November 13, 2024 10:27
@mcrha mcrha force-pushed the wip/appstream-support branch from a261aa3 to 10f8055 Compare November 13, 2024 12:32
@Conan-Kudo
Copy link
Member

We do not want AppStream data being downloaded by default, as it's quite huge, so only callers that can do something with that data should request it.

@mcrha
Copy link
Contributor Author

mcrha commented Nov 13, 2024

We do not want AppStream data being downloaded by default, as it's quite huge, so only callers that can do something with that data should request it.

It's not that you (the caller) only can work with it, the Appstream data is installed system wide (to /var/cache/swcatalog/...), where any Appstream-capable app can use it. Preparing it beforehand makes sense, no?

Or do you mean, in the "we" distro, when there's no gnome-software nor KDE Discover, there's also no PacakgeKit (which downloads and installs the appstream data unconditionally)? Note these are only the GUI apps I know of, which consume the appstream data. The appstreamcli reads it too, to name one on the command line.

@mcrha
Copy link
Contributor Author

mcrha commented Nov 13, 2024

Hrm, the CI...

I looked around, and please correct me if I'm wrong, because I can be wrong, it seems to me the CI is located in a separate project, in the https://github.com/rpm-software-management/ci-dnf-stack . I looked briefly into it, but I realized it's a blackbox for me, I even do not see the build parameters to be applied there. I'm sorry.

@Conan-Kudo
Copy link
Member

We do not want AppStream data being downloaded by default, as it's quite huge, so only callers that can do something with that data should request it.

It's not that you (the caller) only can work with it, the Appstream data is installed system wide (to /var/cache/swcatalog/...), where any Appstream-capable app can use it. Preparing it beforehand makes sense, no?

Or do you mean, in the "we" distro, when there's no gnome-software nor KDE Discover, there's also no PacakgeKit (which downloads and installs the appstream data unconditionally)? Note these are only the GUI apps I know of, which consume the appstream data. The appstreamcli reads it too, to name one on the command line.

Right, If PackageKit-DNF5 or GNOME Software through dnf5daemon want it, they can ask for it to be downloaded. But the regular dnf5 and dnf5daemon CLI can't do anything with it, so it's not useful to download.

@mcrha mcrha force-pushed the wip/appstream-support branch from 10f8055 to bdaa04d Compare November 13, 2024 13:36
@mcrha
Copy link
Contributor Author

mcrha commented Nov 13, 2024

While I agree, I also do not think it would be helpful. It's the same as if you would want from the PackageKit to not download and install the appstream data from the repository metadata - it cannot be done. This is better than PackageKit, because with dnf and PackageKit on one machine you've duplicated the repo data in the local cache, which is bad (no PackageKit => half repo data stored on the machine).

Could you point me to a repo, which provides that large appstream data, please? I've been in an impression that the distros provide its appstream data in certain packages, not as the repo metadata. Fedora has appstream-data package, RPMFusion has rpmfusion-*-appstream-data (rpmfusion-free-appstream-data and so on). I do not know what package OpenSUSE uses, but for example the GNOME repo there does not provide appstream data as the repo metadata, while that repository contains GUI apps, which do provide their .metainfo.xml files. For example RPMFusion's Steam repo provides the Appstream data, see it here.

Is it possible we are talking about different things?

@mcrha
Copy link
Contributor Author

mcrha commented Nov 13, 2024

If the download & install of the repo's appstream metadata is a per-session option, then the refresh of the repository needs to run differently, because when something updates the repo without the appstream data and then another app will need the appstream data, then a simple "is repo up-to-date" check is not simple anymore, one would need two checks, one for the repo and one for the appstream data. Handling of a half-updated repo sounds complicated on its own, the more for the maintenance of the code.

@kontura
Copy link
Contributor

kontura commented Nov 14, 2024

I agree with @Conan-Kudo, I really don't think we should download it by default.

If the download & install of the repo's appstream metadata is a per-session option, then the refresh of the repository needs to run differently, because when something updates the repo without the appstream data and then another app will need the appstream data, then a simple "is repo up-to-date" check is not simple anymore, one would need two checks, one for the repo and one for the appstream data. Handling of a half-updated repo sounds complicated on its own, the more for the maintenance of the code.

I don't think this would be a problem, by default we already download just primary, comps and updateinfo everything else is downloaded only when needed. Typically in different sessions (different dnf invocations) like changelogs (other) or filelists.

In my option this could fit well into the optional_metadata_types config option, then any client can configure it. It could even be set by a distribution in dnf config file.

@kontura
Copy link
Contributor

kontura commented Nov 14, 2024

Hrm, the CI...

I looked around, and please correct me if I'm wrong, because I can be wrong, it seems to me the CI is located in a separate project, in the https://github.com/rpm-software-management/ci-dnf-stack . I looked briefly into it, but I realized it's a blackbox for me, I even do not see the build parameters to be applied there. I'm sorry.

Yes, it is located in ci-dnf-stack but the build is done according to the dnf5.spec file in this repository. You would need to add the dependency there.

Though I think we should discuss if it should be a dependency of libdnf, that is if the install should be done by libdnf. This is related to the other comments but since libdnf is not using the metadata it would make more sense to me if the install was done by the client that requested it.

@mcrha
Copy link
Contributor Author

mcrha commented Nov 14, 2024

This is related to the other comments but since libdnf is not using the metadata it would make more sense to me if the install was done by the client that requested it.

Okay.

In that case I'd need access to the metadata itself from the client (through the dnf5daemon). Either to the repomd.xml filenames for each repository (doing N D-Bus calls, one for each of the N configured and enabled repos, sounds boring and waste of time and resources to me, thus a single "give me repomd filenames for each configured & enabled repo" would be better), with another API to get URL of the repo metadata (which cannot be done in bulk then), to avoid code duplication and avoid expectations which can change in the future, like about where the files are stored and such, which the libdnf knows, but the client having only the repomd.xml does not. Then the client would be able to download the data from the provided URL, though you do more than just the download, you also verify the content, right?

Alternatively, a new API to list provided metadata by respective repo, then a new API to download chosen types, which would result in a list of full path names for the places where it had been downloaded. Ideally allow download in bulk, a list of the types to be downloaded resulting in a list of the filenames.

Alternatively, anything you think would be better and would work for you.

Let me know which it is, please, and I can try to update the merge request accordingly, or pass it to someone whom knows the internals of the dnf5 better than me (I know basically nothing). Thanks in advance.

@kontura
Copy link
Contributor

kontura commented Nov 14, 2024

Alternatively, anything you think would be better and would work for you.

I was thinking the client would just add something like "appstream" to the optional_metadata_types config option before metadata sync, libdnf would download/sync all the requested metadata as usual and then the client would call libdnf5::repo::Repo::get_metadata_path("appstream") or some daemon equivalent to get the actual file path(s).

This would probably needed additional work to handle the fact that its multiple files with unknown type with prefix appstream/appdata.

This is just my idea, I will bring it up on our team meeting today.

@mcrha
Copy link
Contributor Author

mcrha commented Nov 14, 2024

That would work for me. It's close to the second suggestion, but not exactly the same.

@mcrha
Copy link
Contributor Author

mcrha commented Nov 14, 2024

This would probably needed additional work to handle the fact that its multiple files with unknown type with prefix appstream/appdata.

Maybe, to not need from the libdnf to decide whether it's prefix/suffix/wildcard/regex/..., to just list downloaded (or all) metadata types from the repomd.xml file, which the caller can "filter" as it needs to and then it'll ask for the paths to those interesting to it.

@mcrha
Copy link
Contributor Author

mcrha commented Nov 21, 2024

I realized the list of the downloaded appstream data is useless, because the client usually runs in the user space and cannot write to the /var/cache/..., that can be done only by a privilege processes. Saving the data per-user would be a waste of disk space.

Nonetheless, I went with the config approach. The client can ask to use the repo's appstream data by the optional_metadata_types config key (named appstream also for the appdata* types) and it'll be downloaded & installed only if requested, not by default. The appstream support can be still disabled in the configure time of the project, of course.

The just added commit is not complete, it doesn't work, I'll be looking on it. It seems like I cannot just pick "everything appstream*/appdata*" easily (the added change is about that, see the commit message), thus I might end up with a list of the possible types to be read, which is sad.

@mcrha mcrha changed the title Support repo's Appstream data download and install Draft: Support repo's Appstream data download and install Nov 21, 2024
@Conan-Kudo
Copy link
Member

At least this will make the code simpler for the PackageKit DNF5 backend too, I don't have to manually download the appstream data. 😄

@mcrha mcrha force-pushed the wip/appstream-support branch from 4d87555 to 86d6203 Compare November 21, 2024 12:40
@Conan-Kudo Conan-Kudo marked this pull request as draft November 21, 2024 12:40
@Conan-Kudo Conan-Kudo changed the title Draft: Support repo's Appstream data download and install Support repo's Appstream data download and install Nov 21, 2024
@mcrha mcrha force-pushed the wip/appstream-support branch from 86d6203 to 1f64c2d Compare November 21, 2024 12:48
@mcrha
Copy link
Contributor Author

mcrha commented Nov 21, 2024

I've another go on this. This time:

  • the appstream data is downloaded only if the client has set in the optional_metadata_types also appstream
  • there is a hard-coded list of the expected appstream-related types; I would read it from the repomd.xml, but that file content is not available in the RepoDownloader::init_local_handle() and when it's created in the RepoDownloader::load_local(), it's too late to add it into the LRO_YUMDLIST (yeah, I tried to re-run the result, but then it fails with an error about missing appstream data or what). It's a pita to have the hard coded list of the types, I hoped this can do better than PackageKit, but... I do not know how to do it.

I tested that this works as expected also for a case when another client reads the repository data without the appstream (virtual) type first. The client with this type causes load (and install) of this data even if the repo did not change, which is great.

P.S.: I do not see in this confusing GitHub interface how to set a Draft flag on this merge request; could anyone unset it, please?

@m-blaha m-blaha marked this pull request as ready for review November 21, 2024 13:04
@m-blaha
Copy link
Member

m-blaha commented Nov 21, 2024

P.S.: I do not see in this confusing GitHub interface how to set a Draft flag on this merge request; could anyone unset it, please?

There is "Ready for review" button down under the checks. I've pressed it :)

@mcrha mcrha force-pushed the wip/appstream-support branch from 1f64c2d to c46d964 Compare November 21, 2024 13:08
@mcrha
Copy link
Contributor Author

mcrha commented Nov 21, 2024

There is "Ready for review" button down under the checks. I've pressed it :)

Hmm, I guess there might be another button to let me move it to draft again (which I do not want to), but I do not see it there. Either it does not look like a button, or it's named somehow oddly, or I'm stupid. I'm definitely not blind (yet) ;)

@Conan-Kudo
Copy link
Member

The move it to draft link is at the top right under the list of Reviewers.

@mcrha
Copy link
Contributor Author

mcrha commented Nov 21, 2024

The move it to draft link is at the top right under the list of Reviewers.

I see, a tiny grey link, on a very different place than the other button. Confusing from the start. Anyway, thanks for the pointers.

libdnf5/repo/repo.cpp Outdated Show resolved Hide resolved
libdnf5/repo/repo.cpp Outdated Show resolved Hide resolved
libdnf5/repo/repo_downloader.cpp Outdated Show resolved Hide resolved
libdnf5/repo/repo_downloader.cpp Outdated Show resolved Hide resolved
libdnf5/repo/repo_downloader.cpp Outdated Show resolved Hide resolved
@mcrha mcrha force-pushed the wip/appstream-support branch 3 times, most recently from 12f0f72 to 6fb904c Compare November 28, 2024 08:55
@jan-kolarik
Copy link
Member

The only last thing I am thinking about is the appstream dependency. Because IIUC this is something not needed for pure console users and it could bloat the minimal environments with unnecessary binary. We could consider turning this into plugin, need to discuss this with team next week.

@ppisar
Copy link
Contributor

ppisar commented Dec 2, 2024

These are the dependencies of appstream library:

# pkgconf --libs appstream
-lappstream -lgio-2.0 -lgobject-2.0 -lglib-2.0
# scanelf -n /usr/lib64/libappstream.so
 TYPE   NEEDED FILE 
ET_DYN libglib-2.0.so.0,libgobject-2.0.so.0,libcurl.so.4,libxmlb.so.2,libgio-2.0.so.0,libxml2.so.2,libyaml-0.so.2,libsystemd.so.0,libzstd.so.1,libstemmer.so.0,libgcc_s.so.1,libc.so.6 /usr/lib64/libappstream.so

@mcrha
Copy link
Contributor Author

mcrha commented Dec 2, 2024

The libxmlb.so.2 libstemmer.so.0 libappstream.so are the new dependencies for sure. Whether also libyaml-0.so and libzstd.so.1 I do not know. The others are "always there", I believe.

rpm -qi appstream libstemmer libxmlb | grep Size

says 3678623 + 614269 + 295603 = ~4.376MB . Hrm, that can be considered a lot for the installation media.

@mcrha
Copy link
Contributor Author

mcrha commented Dec 5, 2024

Would it be acceptable if I add a new method to the IPlugin, something like:

    /// The repo_loaded hook.
    /// It is called when a single repository had been loaded (in Impl).
    virtual void repo_loaded(Repo * repo);

which I'd call in the place where repo->install_appstream() is called in this merge request?

There is that IPlugin::repos_loaded(), but seeing how many things are considered before the repo is added into the repos_for_processing, I'd prefer to avoid the loop and code duplication and I'd just move the body of the Repo::install_appstream() into the new plugin, using this new IPlugin method.

The alternative of traversing all repos (enabled/disabled/broken/...) and check whether the cache has the appstream data and install it if exists is also possible and I can go by that route as the starter, it only feels performance heavy on the first look.

@@ -4,3 +4,7 @@ set(CMAKE_C_VISIBILITY_PRESET hidden)
add_subdirectory("actions")
add_subdirectory("python_plugins_loader")
add_subdirectory("rhsm")

if(WITH_APPSTREAM)
add_subdirectory("appstream")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works, the Appstream data is installed as expected, with "one little caveat": when a session ends (on the dnf5daemon-server side), you also destroy the plugin class (which is fine), but you also unload the module (.so file). It breaks glib internals, because when another session opens and loads the plugin and calls the appstream library function, the glib aborts the dnf5daemon-server with:

(process:49835): GLib-GObject-CRITICAL **: 15:47:52.469: cannot register existing type 'AsMetadata'

(process:49835): GLib-GObject-CRITICAL **: 15:47:52.469: cannot add private field to invalid (non-instantiatable) type '<invalid>'

(process:49835): GLib-CRITICAL **: 15:47:52.469: g_once_init_leave_pointer: assertion 'result != 0' failed

(process:49835): GLib-GObject-CRITICAL **: 15:47:52.469: g_object_new_with_properties: assertion 'G_TYPE_IS_OBJECT (object_type)' failed
Segmentation fault (core dumped)

This does not exhibit when there's at least one session always loaded, which keeps the plugin module loaded in the memory, thus the glib type system is left in tact.

Is the plugin module unload really needed? I guess it can break more than just glib, it can break anything what stores some global data.

Copy link
Member

@jan-kolarik jan-kolarik Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So this issue is related to the code that loads and unloads shared libraries here.

Could we simply solve this by passing RTLD_NODELETE during the library's dlopen to prevent unloading the libraries for the duration of the daemon's lifetime? Alternatively, should this be conditional for specific plugins only? Another approach could be to implement custom logic to unload libraries when the daemon terminates. @m-blaha @jrohel, could you please share your thoughts on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I'm not asked here, but if I can add my opinion, then I believe the RTLD_NODELETE is a good thing to add. As the plugins are loaded only inside the deamon, they are freed together with the deamon too. Not within the session object the deamon creates, which is good and intentional.

I gave it a try and it works as expected, no more crash of the dnf5-daemon process and no runtime warnings either. See the added (second) commit. I cleaned up the commits too, the merge request is ready for the final review now.

@mcrha mcrha force-pushed the wip/appstream-support branch from ad24ee0 to a575c38 Compare December 5, 2024 15:02
dnf5.spec Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
@jan-kolarik jan-kolarik self-assigned this Dec 19, 2024
@mcrha mcrha force-pushed the wip/appstream-support branch from a575c38 to 7bbd7c5 Compare January 7, 2025 16:58
mcrha added 2 commits January 7, 2025 18:45
Repositories can provide Appstream data for the packages they contain.
This Appstream data is consumed by applications like the GNOME Software
or the KDE Discover, thus the users can see the packages (apps) in them.

This is to be in pair with PackageKit, which does download and install
the repo's Appstream data.

The plugin is built by default, but the Appstream data is not downloaded
unless "optional_metadata_types" config option contains "appstream".

The plugin adds a dependency on the `appstream` library. It can be disabled
with a WITH_PLUGIN_APPSTREAM CMake option set to OFF.

Closes rpm-software-management#1564
Some plugins can use libraries, which have saved global data and unloading
such library can break this global data in a way which can cause a crash
of the daemon. One such library is glib, where this unload can break its
type system, with this shown on the dnf5daemon process console when it happens:

```
    (process:49835): GLib-GObject-CRITICAL **: 15:47:52.469: cannot register existing type 'AsMetadata'

    (process:49835): GLib-GObject-CRITICAL **: 15:47:52.469: cannot add private field to invalid (non-instantiatable) type '<invalid>'

    (process:49835): GLib-CRITICAL **: 15:47:52.469: g_once_init_leave_pointer: assertion 'result != 0' failed

    (process:49835): GLib-GObject-CRITICAL **: 15:47:52.469: g_object_new_with_properties: assertion 'G_TYPE_IS_OBJECT (object_type)' failed
    Segmentation fault (core dumped)
```
@mcrha mcrha force-pushed the wip/appstream-support branch from 7bbd7c5 to cde45bc Compare January 7, 2025 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants