Please version the ABI for C shared libraries #320

Open
musicinmybrain opened this issue Nov 15, 2024 · 10 comments

Comments

@musicinmybrain
Contributor

🤔 What's the problem you're trying to solve?

On Linux, system-wide shared libraries are expected to be versioned via SONAME.

✨ What's your proposed solution?

Set SOVERSION in c/CMakeLists.txt; this can be as simple as:

set_target_properties(gherkin PROPERTIES SOVERSION 1)

Commit to incrementing the value of SOVERSION each time there is an ABI-incompatible change.

⛏ Have you considered any alternatives or workarounds?

Downstream .so versioning is an option.

📚 Any additional context?

It looks like the C++ shared library already does this correctly:

set_target_properties(
    cucumber_gherkin_lib
    PROPERTIES
        CXX_STANDARD 17
        VERSION 0.1.0
        SOVERSION 0.1
        EXPORT_NAME gherkin
        OUTPUT_NAME cucumber_gherkin
)

@mpkorstanje
Contributor

mpkorstanje commented Nov 15, 2024

I don't think we can commit to only incrementing the SOVERSION on ABI-incompatible changes.

Would it be possible to use the version number from c/VERSION? It is incremented more often: the downside of having 13 languages in one repository that are released in lock step is that individual languages get spurious releases.

@chybz, @jenisys, @ursfassler I'd welcome your opinion here too.

@musicinmybrain
Contributor Author

I guess using VERSION (or just the major component of VERSION, if you don’t plan to ever break the ABI in minor or patch releases) is an option. Technically the minimum commitment is to always (rather than only) increment the SOVERSION on ABI-incompatible changes.

The VERSION approach would be better than not versioning the library at all or missing an SOVERSION bump that should have happened, but it’s not quite harmless: unnecessary ABI version bumps create extra downstream packaging work and can prevent otherwise-eligible updates from shipping in stable releases, especially considering that it looks like the C ABI has rarely changed so far.

What about the C++ ABI version in cpp/src/lib/gherkin/CMakeLists.txt, currently at 0.1? Is there a strategy for making sure that it’s updated when the C++ ABI changes?
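A minimal sketch of the "major component of VERSION" option discussed above. It assumes the release version is a plain X.Y.Z string and that the library would be named libgherkin on Linux. The helper name `soname_from_version` is hypothetical, and in the real build this mapping would live in CMake rather than in C.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: build "lib<lib>.so.<major>" from a release
 * version string of the form "X.Y.Z".
 * Returns 0 on success, -1 on a malformed version or a too-small buffer.
 */
int soname_from_version(const char *lib, const char *version,
                        char *out, size_t out_size)
{
    char *end = NULL;
    unsigned long major;
    int written;

    /* Reject signs, whitespace and anything else strtoul would accept. */
    if (!isdigit((unsigned char)version[0]))
        return -1;

    major = strtoul(version, &end, 10);

    /* The major component must be followed by '.' or the end of string. */
    if (*end != '.' && *end != '\0')
        return -1;

    written = snprintf(out, out_size, "lib%s.so.%lu", lib, major);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

With this mapping every major release yields a new SONAME, whether or not the ABI actually changed, which is exactly the extra downstream rebuild cost mentioned above.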

@chybz
Contributor

chybz commented Nov 15, 2024

While I'm totally in for having this automated and consistent across the entire repository, we can't just set VERSION and SOVERSION to the same values (please see here and here).

It's on my internal todo list of things to have this fixed, though, but we have two things here:

  • a version of the specification/language that tools and libraries conform to
  • a "technical" version of shared libraries, mostly related to ABI and link/call/compile compatibility

@musicinmybrain The C++ ABI version is automatically updated (all CMakeLists.txt are generated files), based on the value in cpp/project.yaml.

My suggestion would be to have a repository wide VERSION that each individual language can refer to and adjust/set "internal" values to (such as SOVERSION for those who produce shared libraries). I can certainly do that easily for the C++ parts.

@mpkorstanje
Contributor

mpkorstanje commented Nov 15, 2024

Would it be possible to have an automated check that detects ABI incompatible changes? If we can catch this during a PR it will be much easier to flag the need for an ABI version change.

Would it be correct to assume that an ABI version change is the same as a major version change under Semver?

@chybz
Contributor

chybz commented Nov 15, 2024

Sorry if you already know all of the following, but I need to ensure it was said.

TLDR - The gory details

(maybe better described here)

I don't think so and frankly it's even more (cu)cumbersome than that.
As an example, let's pretend we have a function int foo(int a) taking an integer argument and returning another integer.
In C and C++ (and compiled languages in general), each int has a very specific storage size, which can differ from one platform/OS to another.
For x86_64 you are fairly safe to assume a 4-byte (32-bit) integer.
A single change to int foo(float a) would break the ABI (but still compile OK, though with warnings), because while float and int occupy the same storage space (4 bytes), they're not binary compatible (the B in ABI): they have different bit layouts and may be passed to the function differently.
--> There are heaps of other cases (!)
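To make the int/float point concrete, here is a small sketch that compares the raw bits of the two types. It assumes a 32-bit int and a 32-bit IEEE 754 float, which holds on x86_64 Linux but is not guaranteed everywhere. The helper names `int_bits` and `float_bits` are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: both types are 32 bits wide on the target platform. */
_Static_assert(sizeof(int) == sizeof(uint32_t), "expects a 32-bit int");
_Static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Return the raw object representation of an int. */
uint32_t int_bits(int value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Return the raw object representation of a float. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Under those assumptions `int_bits(1)` is 0x00000001 while `float_bits(1.0f)` is 0x3F800000, so a caller built against the old prototype hands the library bytes it will misread. On x86_64 System V the mismatch is worse still: integer arguments travel in general-purpose registers and float arguments in SSE registers, so the callee would not even look in the right place.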

Detecting ABI changes

Could be feasible, but it would be a vastly disproportionate effort and would certainly involve caching the previous build and either calling every function or inspecting mangled symbols inside compiled code to look for potential changes. I doubt anyone sane would want to work on that.

Developer responsibility

It has been this way for ages: every developer working in compiled languages that produce shared objects (.so, .dll, .dylib, ...) knows that changing function signatures and data types may have a great impact on backward compatibility (the ABI stuff).
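Data type changes are just as easy to get wrong as signature changes. The sketch below uses two hypothetical versions of a struct (`token_v1` and `token_v2` are invented names, not types from the Gherkin C library) to show how inserting a field moves the ones after it.

```c
#include <stddef.h>

/* Hypothetical struct as shipped in one release. */
struct token_v1 {
    int line;
    int column;
};

/* The same struct in a later release, with a field inserted in the middle. */
struct token_v2 {
    int line;
    long id;      /* new field */
    int column;
};

/* Offset at which already-compiled callers expect to find `column`. */
size_t column_offset_v1(void)
{
    return offsetof(struct token_v1, column);
}

/* Offset at which the rebuilt library actually stores `column`. */
size_t column_offset_v2(void)
{
    return offsetof(struct token_v2, column);
}
```

Source code that only writes `token.column` compiles unchanged against either header, but a program built against the old header and run with the new library reads the wrong bytes and allocates too little memory for the struct.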

Seen in the field

Some people choose to completely ignore the original numbers and their meaning (from the linked documents in my previous comment) and directly map Semver of the software to SOVERSION, and mostly don't care about ABI (in the sense of the linked document). If you take great care it can be viable, but with multiple languages this is going to be tough (what version X.Y.Z of the Cucumber specification has to do with ABI of the C and C++ implementations is unknown to me).
Some people use some combination of library name/version and SOVERSION, for example libcucumber_message14.so.2.7.1 to say "this is Cucumber specification 1.4", with the 2.7.1 suffix derived from the libtool current, revision and age numbers.

My 2 cents

That libtool convention is followed less and less, so the simple Semver --> VERSION mapping might do the trick and people would be happy with it. I used to enforce the ("more correct" at the time [last century]) libtool scheme, but nobody seems to care much these days.
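For readers who have not met that libtool convention, here is a rough sketch of how its `-version-info current:revision:age` triple turns into a file suffix on Linux, as far as I understand libtool's behaviour there: the first number is current minus age, followed by age, then revision. Other platforms map the triple differently, and the helper name `libtool_linux_suffix` is hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the Linux file suffix libtool derives from
 * -version-info current:revision:age, i.e. "<current-age>.<age>.<revision>".
 * Returns 0 on success, -1 if age > current or the buffer is too small.
 */
int libtool_linux_suffix(unsigned current, unsigned revision, unsigned age,
                         char *out, size_t out_size)
{
    int written;

    /* age counts how many earlier interfaces are still supported,
       so it can never exceed current. */
    if (age > current)
        return -1;

    written = snprintf(out, out_size, "%u.%u.%u",
                       current - age, age, revision);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Only the first number (current minus age) ends up in the SONAME, so a release that adds interfaces without removing any bumps current and age together and keeps the same SONAME. That is the property a plain Semver mapping gives up.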

If some contributors with an idea on this could share it, that'd be cool.

@mpkorstanje
Contributor

mpkorstanje commented Nov 15, 2024

Thanks for the explanation. I know just enough C.* to shoot myself in the foot, reliably.

The challenge that we (the maintainers of this repository) have to solve, is accepting and releasing contributions for thirteen Gherkin implementations with enough responsiveness so as to not discourage contributions.

With the added constraint that nobody can know the intricacies of thirteen different languages, frameworks, distribution systems, etc. And while I'm happy that I can currently ask for advice, this too varies over time and per language as people's interests and availability wax and wane.

So the current process is optimized to reduce the time and knowledge demands on the maintainers. It doesn't produce optimal results, far from it, but it keeps things going.

Currently we ask contributors about the type of their changes, which seems to work well enough to keep us from breaking Semver for the different languages. I reckon we can update that to include some words about breaking an ABI. Then that will trigger a new major release.

I'll leave some time for others to comment but I think I can summarize the current consensus as somewhat favourable towards using the release version. "better to have something than nothing".

@musicinmybrain
Contributor Author

Detecting ABI changes

Could be feasible, but it would be a vastly disproportionate effort and would certainly involve caching the previous build and either calling every function or inspecting mangled symbols inside compiled code to look for potential changes. I doubt anyone sane would want to work on that.

There are tools that can help with this. In Fedora, we tend to use abidiff from libabigail, usually via abipkgdiff or fedabipkgdiff.

Interpreting the results still takes a bit of knowledge, especially for C++ where the rules for ABI compatibility are a lot messier than in C, but it’s a lot better than just trying to audit source code for ABI changes.

@chybz
Contributor

chybz commented Nov 15, 2024

Yes, some helpers exist here and there, but no good cross platform support.
My point wasn't to confuse anyone but to explain that a Semver change might not change ABI and vice-versa, and that ABI only matters on compiled deliverables (such as OS packages) since only the one compiling software has the exact knowledge of platform and compiler flags that may indeed modify the ABI.

And keep in mind that a small change in C or C++ could break the ABI and thus require a Semver change.

So if you ask me, we should not complicate things with ABI (the old way) and should just use the project Semver X.Y.Z for SOVERSION (that's what CMake does).

So if there's a global VERSION file somewhere, I can use that in the C++ part and maybe in the C part, if the maintainer is busy elsewhere.

What do you think?

@mpkorstanje
Contributor

So if there's a global VERSION file somewhere, I can use that in the C++ part and maybe in the C part, if the maintainer is busy elsewhere.

The file you can use is cpp/VERSION, same as c/VERSION. The release process will update it once I add it to polyglot-release.

@chybz
Contributor

chybz commented Nov 15, 2024

cool. I'll try to make a PR tomorrow (it's late here). thank you.
