Stop using Xerces-C #99

Open
ryandesign opened this issue Nov 14, 2024 · 7 comments

Comments

@ryandesign

Enigma requires Xerces-C:

checking for current Xerces release 3.0... not found
checking for old Xerces release >=2.4... configure: error: Xerces >= 2.4 not found.

The developer of Xerces-C gives notice that it should not be used anymore.

Can Enigma switch to a different XML library that is still maintained, such as libxml2 or expat?

@sidney
Contributor

sidney commented Nov 14, 2024

I agree. From my standpoint of trying to keep Enigma building on macOS, Xerces-C has always been a problem that keeps getting worse.

I propose libxml2 rather than expat. In searching for discussions in open source projects choosing between the two, the only ones I found leaning towards expat were quite old. The only more recent discussion I found (2022) lays out reasons for libxml2 in detail here: https://lists.apache.org/thread/r8gvbsxr22go29z0z8tzbpj56f5qxfb1

In particular, this quote: "expat has been difficult to upgrade to newer versions due to compiler issues. Since expat only implements a small subset of libxml2's features, could we replace it with libxml2?"

In addition, this link is an automated comparison of the two libraries: https://cpp.libhunt.com/compare-expat-vs-libxml2
It shows that libxml2 has much higher popularity and usage, that expat was last updated 7 years ago, while libxml2 is still under active development.

@ryandesign
Author

ryandesign commented Nov 14, 2024

expat was last updated 7 years ago

I'm not familiar with Enigma's needs in an XML library nor how the features of the various libraries compare, so I can't suggest which library would be best. However, I would like to refute this statement. Like libxml2, expat remains under active development; there have been five releases this year. See http://www.libexpat.org.

The comparison page you linked to appears to be getting its data from expat's old home on SourceForge. It has since moved to GitHub.

I'm the maintainer of both of these ports in MacPorts and they compile fine on the past fifteen years' worth of macOS versions for which we have automated build machines. Xerces-C also builds fine in MacPorts on these macOS versions.

There's nothing wrong with choosing libxml2; I just want to make sure you're not discounting expat for the wrong reasons. In fact, I'm sure there are many other libraries besides these three. But choosing one that is popular is probably a good safeguard against it becoming unmaintained.

One point in libxml2's favor might be that at the top of expat's readme it says:

Caution

Expat is understaffed and without funding. There is a call for help with details at the top of the Changes file.

On the other hand, libxml2 has raised over $10,500 on opencollective so they seem to have a source of funding.

@sidney
Contributor

sidney commented Nov 15, 2024

Thanks for the correction. Note that the link to libexpat you gave is incorrect, as the host name www.libexpat.org is a CNAME to SourceForge that doesn't work and there is no DNS host record for libexpat.org. I found their repo at https://github.com/libexpat/libexpat

So expat is not inactive, but that call for help is concerning, as is the fact that their libexpat.org domain has been allowed to languish. If migrating to libxml2 is not significantly harder than migrating to expat, then I would still be in favor of libxml2. Those do seem to be the two most popular and safe choices.

@ryandesign
Author

http://www.libexpat.org works fine for me. It redirects to https://libexpat.github.io. Nothing has languished.

I inadvertently used an https URL in my comment originally, which doesn't work; I've corrected it.

@alochmann
Contributor

Hi, and many thanks for that link and notice and your suggestions!

Daniel and I needed years to port Enigma from SDL 1 to 2. I needed months to switch from gettext to tinygettext, and two more years to get rid of zipios. Xerces is used in so, so many different files and methods of Enigma ... switching to another XML lib ... I don't know if I could do that at all. I don't know if I would have the strength to do it! I certainly don't have the knowledge, and would need to learn it from the beginning, just like with SDL, gettext and the zip file format. And it would not move Enigma forward.

Sidney, I really, really value your work and opinion! Are you sure this is the way to go? Xerces is part of the Apache Software Foundation ... they sure have someone to take over a library so central to their core, right? libxml2 is active now, yes, but will it still be active in 10 years' time?

@sidney
Contributor

sidney commented Dec 9, 2024

I've dug deeper into the issue that Ryan linked to, browsed the commit history and mailing lists of Xerces-C, and looked at what I could find about the libxml2 project. I've changed my mind about what I think is going on and what we should do.

ASF projects are all independent, each run by the members of its Project Management Committee (PMC), though they are required to follow ASF guidelines. ASF provides resources such as those guidelines, technical infrastructure, and some oversight that is almost always hands-off. The Xerces-C project will stand or fall on its own actions. ASF will not appoint someone to replace an essential developer who chooses to stop being active. The project, which means the other existing Xerces-C developers, has to find people to step up. After reading all the comments in that issue and looking at the commit history of the project, I have some conclusions. First, though, I should say that Ryan has more experience with all three XML libraries on macOS than any of us, and his opinions should have weight.

Xerces-C has had one person on the project doing almost all the commit activity for the past few years. Last year he announced that he expected to stop doing that. In this more recent issue he said he no longer does anything on Mac and he also said he expects to be gone from project development around mid-2025. However, to counter his doom and gloom prediction, another maintainer on the project jumped in to say that he is willing and able to do what is necessary to make sure the project keeps going. In addition, this maintainer works for a company as the developer of an open source product that uses Xerces-C. Their web site advertises a long customer list for that product. This makes it quite likely that the project will stay alive. The doom and gloom developer seems concerned that there are undiscovered security problems, but that seems to be just fears of new exploits to be found with processing of untrusted input. That is not a problem for people who use Xerces-C to parse their own generated files. Am I correct that is the only use case for the library in Enigma, there is nothing that tries to handle untrusted XML input?

Any project that has only one active developer is fragile. But a project only needs one active developer at a time if there isn't much activity needed. XML itself is not changing, having been pretty much replaced in future spec development by JSON and YAML. For our purposes, it would be fine if the only development that happens in Xerces-C is whatever it needs to keep building on newer OS versions and compiler runtimes, plus any important security updates. That appears to be all that has been done to it for the past few years.

As for libxml2, having much more development activity is a mixed bag. Yes, bugs are more likely to be fixed, but also there is a higher likelihood that new bugs will be created, or that the project will come up with an API breaking change that we would have to deal with. Also, I don't know how much connection the libxml2 project has with the GNOME project. It started with them and is under their umbrella, but I don't know what that means from a development-culture perspective. The GNOME project has a reputation for making development decisions that cause breaking changes, removing functionality that users want, and making excuses instead of fixing problems that are pointed out to them. I haven't seen the same things said about libxml2, but I would not want to find out the hard way in a few years.

So judgement call: I'm not worried about the imminent demise of Xerces-C. If someone has time, it might be a good idea to start writing a layer for XML processing so that we end up with just one file of well-specified functions that are implemented using Xerces-C and can have a thorough test suite. Then someone can come up with an implementation that uses libxml2 and one for expat. But none of that seems urgent to me.
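
To make the idea of such a layer concrete, here is a minimal sketch of what "one file of well-specified functions" might look like. Everything in it is an assumption for illustration: the `XmlElement` interface, the `MemoryElement` stand-in backend, the `levelTitles` caller, and the `level`/`title` names are hypothetical and are not taken from Enigma's source. The stand-in backend keeps the sketch runnable without any XML library; a real backend would wrap a Xerces-C, libxml2, or expat parse result behind the same interface.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: the only XML API the rest of the program would see.
class XmlElement {
public:
    virtual ~XmlElement() = default;
    virtual std::string name() const = 0;
    // Writes the attribute value to 'out' and returns true if the attribute exists.
    virtual bool attribute(const std::string& key, std::string& out) const = 0;
    // Returns the direct children whose element name equals 'childName'.
    virtual std::vector<const XmlElement*> children(const std::string& childName) const = 0;
};

// In-memory stand-in backend (an assumption for this sketch, not a parser).
class MemoryElement : public XmlElement {
public:
    explicit MemoryElement(std::string name) : name_(std::move(name)) {}

    MemoryElement& setAttribute(const std::string& key, const std::string& value) {
        attributes_[key] = value;
        return *this;
    }

    // Appends a child element and returns a reference to it.
    MemoryElement& addChild(std::string childName) {
        children_.push_back(
            std::unique_ptr<MemoryElement>(new MemoryElement(std::move(childName))));
        return *children_.back();
    }

    std::string name() const override { return name_; }

    bool attribute(const std::string& key, std::string& out) const override {
        auto it = attributes_.find(key);
        if (it == attributes_.end()) return false;
        out = it->second;
        return true;
    }

    std::vector<const XmlElement*> children(const std::string& childName) const override {
        std::vector<const XmlElement*> result;
        for (const auto& child : children_) {
            if (child->name() == childName) result.push_back(child.get());
        }
        return result;
    }

private:
    std::string name_;
    std::map<std::string, std::string> attributes_;
    std::vector<std::unique_ptr<MemoryElement>> children_;
};

// Example caller written only against the interface, so it would not need to
// change if the backend were swapped.
std::vector<std::string> levelTitles(const XmlElement& root) {
    std::vector<std::string> titles;
    for (const XmlElement* level : root.children("level")) {
        std::string title;
        if (level->attribute("title", title)) titles.push_back(title);
    }
    return titles;
}
```

A test suite for the layer could be written once against `XmlElement` and then run against each backend, which is the main benefit of this design; the cost is the extra indirection and the work of routing every existing call site through it.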

@ryandesign
Author

I agree with most of @sidney's summary and I'll add a little more:

I should say that Ryan has more experience with all three XML libraries on macOS than any of us, and his opinions should have weight.

You give me too much credit. I have nearly no experience with Xerces-C. I maintain libxml2 and expat for MacPorts so I have experience with their build systems and with fielding bug reports about build failures caused by updates of those libraries. I don't recall any expat updates causing build failures in other ports but it has definitely happened after libxml2 updates.

Here's my side of the story. Xerces-C 3.3.0 was recently released and someone committed the update to MacPorts, since the maintainers of the port, one of whom is the developer of Xerces-C, had not done so. Because the soname of the library changed (by its major version changing from 3.2 to 3.3), all software that linked with the library had to be rebuilt. Many of those rebuilds failed. I investigated and found the problem (the developer forgot to change 3_2 to 3_3 in one source file), and I reported it to the developer along with the fix, which I applied to MacPorts, and I re-scheduled the failed MacPorts builds. Prior to this issue, I had not paid attention to or involved myself with Xerces-C.

The developer's response was that the problem had already been reported (a month prior but it had not been fixed until after I filed my bug) and to ask me to remove all software from MacPorts that depended on Xerces-C. That seemed like a severe overreaction to me, so instead I created a ticket in MacPorts, assigned to the maintainers of all ports that use Xerces-C, asking them to investigate the feasibility of no longer using it in those ports. I myself investigated those ports that had no designated maintainer, including Enigma. When I found that Enigma's use of Xerces-C was not optional but required, I filed this issue here.

Xerces-C has no CI system that might have caught this regression before the release shipped. I offered to set up GitHub Actions for them, but the developer didn't want it unless I agreed to stick around to maintain it forever, which I won't do. As a manager of MacPorts, I loosely oversee tens of thousands of packages, not to mention our infrastructure, and I don't have the bandwidth to be that closely involved in individual projects. I suggested that the build system could be changed to insert the correct version number into this file at build time; this suggestion was ignored.

The developer of Xerces-C believes he has clearly communicated the demise of Xerces-C for the past five years and that anyone unaware of the situation hasn't paid attention. I countered that there is no evidence on the repository or web site of his position on this issue. He explained that he is not allowed to make such changes without approval from the Apache Project Management Committee and that they have not so far given such approval. I suggested he should notify projects that use Xerces-C that they should use something else, and he said he doesn't consider that to be his responsibility. So hopefully the MacPorts ticket on the matter will get some traction and the port maintainers will begin a dialog with the developers of those ports about moving away from Xerces-C or at least making sure they are aware of the situation.

As for libxml2, having much more development activity is a mixed bag. Yes, bugs are more likely to be fixed, but also there is a higher likelihood that new bugs will be created, or that the project will come up with an API breaking change that we would have to deal with.

Some particular pain points with recent libxml2 updates have been:

https://gitlab.gnome.org/GNOME/libxml2/-/issues/622 (constness of some parameters has changed)

https://gitlab.gnome.org/GNOME/libxml2/-/issues/642 (some headers no longer include other headers)

https://gitlab.gnome.org/GNOME/libxml2/-/issues/751 (breaking changes were made without changing the soname)

If someone has time, it might be a good idea to start writing a layer for XML processing so that we end up with just one file of well-specified functions that are implemented using Xerces-C and can have a thorough test suite. Then someone can come up with an implementation that uses libxml2 and one for expat.

Writing an XML library abstraction layer seems like a lot of unnecessary work. I think picking one XML library, be it Xerces-C or something else, and using that directly is perfectly reasonable.

That is not a problem for people who use Xerces-C to parse their own generated files. Am I correct that is the only use case for the library in Enigma, there is nothing that tries to handle untrusted XML input?

XML itself is not changing, having been pretty much replaced in future spec development by JSON and YAML.

If Enigma only uses Xerces-C to process its own XML files, and if people now use JSON or YAML instead of XML, is it conceivable that Enigma might be changed to use a JSON or YAML library and files instead of XML?
