Tool to compare public API surface with Lucene #1022

Open
1 task done
paulirwin opened this issue Nov 15, 2024 · 3 comments
Assignees
Labels
is:task A chore to be done pri:normal

Comments

@paulirwin
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Task description

We should create a tool that compares the public API surface of Lucene (at a specified version) to Lucene.NET. See #1018 (comment) for context.

ChatGPT suggested comparing the metadata generated by DocFX and Javadoc. Another alternative might be creating a Java tool to export JSON or XML of the public API surface via reflection, and then creating a .NET tool that compares that via .NET reflection to Lucene.NET's assemblies.
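
A minimal sketch of the reflection-export idea, assuming a hand-rolled JSON shape. The class name `ApiSurfaceSketch`, the method names, and the `type`/`methods` keys are made up for illustration and are not taken from any existing tool; a real exporter would also cover fields, constructors, and modifiers, and would use a JSON library for escaping.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: dumps the public methods of one class as a small JSON document. */
final class ApiSurfaceSketch {

    /** Returns sorted "name(paramTypes)" signatures for public methods declared on the type itself. */
    static List<String> publicMethodSignatures(Class<?> type) {
        List<String> signatures = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            // Skip non-public members and compiler-generated (synthetic/bridge) methods.
            if (!Modifier.isPublic(m.getModifiers()) || m.isSynthetic()) {
                continue;
            }
            StringBuilder sb = new StringBuilder(m.getName()).append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(params[i].getTypeName());
            }
            signatures.add(sb.append(')').toString());
        }
        Collections.sort(signatures); // stable output makes diffs reproducible
        return signatures;
    }

    /** Serializes the type name and its signatures. Assumes no characters that need JSON escaping. */
    static String toJson(Class<?> type) {
        StringBuilder json = new StringBuilder();
        json.append("{\"type\":\"").append(type.getName()).append("\",\"methods\":[");
        List<String> signatures = publicMethodSignatures(type);
        for (int i = 0; i < signatures.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append('"').append(signatures.get(i)).append('"');
        }
        return json.append("]}").toString();
    }
}
```

Calling `ApiSurfaceSketch.toJson(Runnable.class)` yields `{"type":"java.lang.Runnable","methods":["run()"]}`.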

This will require mapping Java naming conventions to .NET, amongst other challenges. We'd likely need the ability to create a manual mapping/exclusions file to handle discrepancies. But this will help us confirm the public API of Lucene.NET matches Lucene, as well as aid future porting efforts.
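
To illustrate the naming-convention and exclusions idea, here is a rough sketch that covers only one convention (camelCase Java method names becoming PascalCase in .NET) plus a plain set of excluded names. `MethodNameConventionSketch` and its methods are hypothetical, and the exclusions would presumably come from a file rather than an in-memory set.

```java
import java.util.Set;

/** Hypothetical sketch of one Java-to-.NET naming convention plus a manual exclusions list. */
final class MethodNameConventionSketch {

    /** Maps a camelCase Java method name to the PascalCase name expected in .NET. */
    static String toDotNetName(String javaMethodName) {
        if (javaMethodName.isEmpty()) {
            return javaMethodName;
        }
        return Character.toUpperCase(javaMethodName.charAt(0)) + javaMethodName.substring(1);
    }

    /**
     * Returns true when a Java method should be reported as missing on the .NET side:
     * it is not excluded, and no .NET method matches its conventional name.
     */
    static boolean isMissing(String javaMethodName, Set<String> dotNetMethodNames, Set<String> exclusions) {
        if (exclusions.contains(javaMethodName)) {
            return false; // a known, accepted discrepancy
        }
        return !dotNetMethodNames.contains(toDotNetName(javaMethodName));
    }
}
```

With a .NET method set containing only `AddDocument`, the Java method `addDocument` is matched by convention, while `close` is reported as missing unless it appears in the exclusions.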

@paulirwin paulirwin added the is:task A chore to be done label Nov 15, 2024
@paulirwin paulirwin added this to the 4.8.0-beta00018 milestone Nov 15, 2024
@paulirwin paulirwin mentioned this issue Nov 15, 2024
4 tasks
@paulirwin paulirwin self-assigned this Nov 17, 2024
@paulirwin
Contributor Author

I wanted to provide an update on this. I've been experimenting with creating a Java tool called lucene-api-extractor, which will live in the Lucene.NET repo, that downloads the specified Lucene jars you wish to extract, loads them and reflects over them, and outputs the API surface as JSON. Then, I've got a new .NET console app (Lucene.Net.ApiCheck) that calls this tool, and loads in the JSON. So far all of that is working. Next up, ApiCheck will load and reflect over the matching .NET assemblies and compare the API surface to what it loaded from Java. This will of course be the hardest part.
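
As a rough illustration of the "download the jars and load them" step, the sketch below builds a jar URL from Maven coordinates using the standard Maven repository layout and wraps it in a `URLClassLoader`. This is not the lucene-api-extractor code: `MavenJarSketch` and its method names are hypothetical, and the actual tool may well resolve artifacts through a Maven resolver library instead of building URLs by hand.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical sketch: locating a jar in a Maven repository and preparing a class loader for it. */
final class MavenJarSketch {

    /** Builds the jar URL using the standard layout: group/path/artifact/version/artifact-version.jar. */
    static String jarUrl(String repoBase, String groupId, String artifactId, String version) {
        String base = repoBase.endsWith("/") ? repoBase : repoBase + "/";
        return base + groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    /** Creates a class loader over the jar; classes are fetched lazily when first loaded. */
    static URLClassLoader loaderFor(String jarUrl) throws MalformedURLException {
        URL url = URI.create(jarUrl).toURL();
        return new URLClassLoader(new URL[] { url });
    }
}
```

For example, `jarUrl("https://repo1.maven.org/maven2", "org.apache.lucene", "lucene-core", "4.8.0")` produces `https://repo1.maven.org/maven2/org/apache/lucene/lucene-core/4.8.0/lucene-core-4.8.0.jar`.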

My current thinking is once the diff between Lucene and Lucene.NET is generated, it will support saving this diff as JSON for programmatic/tool analysis, as well as generating an HTML report from this JSON.
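
A sketch of what generating the diff and saving it as JSON could look like at the type level, assuming both sides have already been reduced to sets of simple type names. `TypeDiffSketch`, the JSON keys, and the sample names in the usage note are illustrative assumptions, not the format ApiCheck actually emits.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical sketch: a two-way set difference over type names, serialized as JSON. */
final class TypeDiffSketch {

    /** Returns the names present in source but absent from target, in sorted order. */
    static SortedSet<String> missingFrom(Set<String> source, Set<String> target) {
        SortedSet<String> missing = new TreeSet<>(source);
        missing.removeAll(target);
        return missing;
    }

    /** Renders names as a JSON array. Assumes the names need no JSON escaping. */
    static String toJsonArray(SortedSet<String> names) {
        return names.stream()
                .map(name -> "\"" + name + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }

    /** Produces both directions of the diff in one JSON object. */
    static String diffJson(Set<String> luceneTypes, Set<String> dotNetTypes) {
        return "{\"missingInLuceneNet\":" + toJsonArray(missingFrom(luceneTypes, dotNetTypes))
                + ",\"missingInLucene\":" + toJsonArray(missingFrom(dotNetTypes, luceneTypes)) + "}";
    }
}
```

Given Lucene types `{IndexWriter, IntField}` and Lucene.NET types `{IndexWriter, Int32Field}` with no name mapping applied, the result is `{"missingInLuceneNet":["IntField"],"missingInLucene":["Int32Field"]}`. An HTML report could then be rendered from that JSON.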

There will be a config file to handle known mapping discrepancies, such as Int32Field vs IntField, along with a justification for each. With enough massaging, this config file will effectively represent the known differences between Lucene and Lucene.NET, and it should be checked into git to evolve and be versioned alongside the code. Many discrepancies can be handled by convention, such as starting interfaces with "I", capitalizing method names, IDisposable vs Closeable, etc. It will be interesting to see how the early results look once I get the diff logic working.
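
One way to picture the interplay between the config file and the conventions is the sketch below: an explicit entry (with its justification) wins, and otherwise the interface "I" prefix convention applies. Everything here is an assumption for illustration, including the `KnownDifference` shape, the in-memory map standing in for the config file, and the justification text used in the usage note.

```java
import java.util.Map;

/** Hypothetical sketch: resolving the expected Lucene type name for a Lucene.NET type. */
final class KnownDifferencesSketch {

    /** One manually recorded discrepancy, with the reason it is acceptable. */
    static final class KnownDifference {
        final String luceneName;
        final String justification;

        KnownDifference(String luceneName, String justification) {
            this.luceneName = luceneName;
            this.justification = justification;
        }
    }

    /** Explicit entries take priority; otherwise fall back to the "I" prefix convention. */
    static String expectedLuceneName(String dotNetName, boolean isInterface,
                                     Map<String, KnownDifference> known) {
        KnownDifference entry = known.get(dotNetName);
        if (entry != null) {
            return entry.luceneName;
        }
        boolean hasInterfacePrefix = isInterface
                && dotNetName.length() > 1
                && dotNetName.charAt(0) == 'I'
                && Character.isUpperCase(dotNetName.charAt(1));
        return hasInterfacePrefix ? dotNetName.substring(1) : dotNetName;
    }
}
```

With a single entry mapping `Int32Field` to `IntField` (justification, for example, ".NET numeric type naming"), `Int32Field` resolves to `IntField`, an interface named `IBits` resolves to `Bits`, and a class named `IndexWriter` is left unchanged because it is not an interface.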

@NightOwl888
Contributor

Thanks for putting this tool together. I am glad to see you picking up the torch and running with it. It will definitely help with the long-term maintenance of the project. We will need an arsenal of automation and this is a good addition to our war chest.

@paulirwin
Contributor Author

paulirwin commented Feb 21, 2025

An update on this, since it's a big item and taking a while. So far, the following are implemented:

  • A Java tool to load Lucene packages from Maven and extract their public API to a JSON file
  • A .NET CLI tool (and PowerShell script to facilitate it) to load the aforementioned JSON file, reflect over the specified Lucene.NET libraries, compare differences, and generate a diff report
  • The generation of both a JSON diff file (for programmatic use, if desired) as well as an HTML report generated using Handlebars (this is my preferred way to use this)
  • Analysis of types in Lucene missing from Lucene.NET
  • Analysis of types in Lucene.NET missing from Lucene
  • Analysis of differences in modifiers of types between Lucene and Lucene.NET, accounting for Java vs .NET differences (e.g. final vs sealed, different meanings of static classes)
  • Analysis of differences in base types
  • A new Lucene.Net.Reflection namespace containing attributes and extension methods to make it easier to reflect over Lucene.NET types. This includes:
    • A NoLuceneEquivalent attribute to suppress things in Lucene.NET that aren't in Lucene
    • A LuceneType attribute to map a Lucene.NET type to a Lucene package and type when the mapping can't be inferred by convention
    • A LuceneMavenMapping assembly-level attribute to specify Maven coordinates for the corresponding Lucene artifact
    • A LucenePackageMapping assembly-level attribute to manually override namespace-to-package mappings that can't be inferred by convention (for example, org.apache.lucene maps to Lucene.Net by default, but pluralization differences require this attribute)
    • Extension methods on Assembly to get the package mappings
    • A GetLuceneTypeInfo extension method on Type to get the corresponding Lucene type info (either inferred or manually overridden via those attributes)
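
The namespace-to-package inference described in the list above can be sketched roughly as follows, with a map standing in for the overrides that LucenePackageMapping would supply. This is an illustrative Java rendering of the idea rather than the Lucene.Net.Reflection code; `PackageMappingSketch` and `packageFor` are hypothetical names, and the `Lucene.Net.Documents` override in the usage note is only an example of a pluralization difference.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: inferring a Lucene package from a Lucene.NET namespace. */
final class PackageMappingSketch {

    private static final String DOTNET_ROOT = "Lucene.Net";
    private static final String JAVA_ROOT = "org.apache.lucene";

    /** Manual overrides win; otherwise swap the root and lowercase the remaining segments. */
    static String packageFor(String dotNetNamespace, Map<String, String> overrides) {
        String override = overrides.get(dotNetNamespace);
        if (override != null) {
            return override;
        }
        if (dotNetNamespace.equals(DOTNET_ROOT)) {
            return JAVA_ROOT;
        }
        if (!dotNetNamespace.startsWith(DOTNET_ROOT + ".")) {
            throw new IllegalArgumentException("Not a Lucene.NET namespace: " + dotNetNamespace);
        }
        // The remainder keeps its leading dot, e.g. ".Analysis.Standard".
        String remainder = dotNetNamespace.substring(DOTNET_ROOT.length());
        return JAVA_ROOT + remainder.toLowerCase(Locale.ROOT);
    }
}
```

By convention alone, `Lucene.Net.Analysis.Standard` maps to `org.apache.lucene.analysis.standard`. A namespace such as `Lucene.Net.Documents` would need an override entry pointing at `org.apache.lucene.document`, since lowercasing cannot undo the pluralization.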

The following work remains:

  • Attribute to suppress modifier differences
  • Attribute to suppress base type differences
  • Analysis of differences in interfaces
  • Analysis of differences in fields and enum members
  • Analysis of differences in constructors
  • Analysis of differences in non-property methods
  • Analysis of differences between .NET properties and Java getter/setter methods
  • CI build support for running this utility
  • Determine feasibility of analyzing the Analysis.OpenNLP library (targets a different Lucene version)
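
For the property item in the list above, one possible starting point is to derive a candidate .NET property name from a Java accessor's name and parameter count. The rules below (drop `get`/`set`, keep `is` as an `Is` prefix) are an assumption about the convention, not something the tool is confirmed to do, and `AccessorSketch` is a hypothetical name.

```java
import java.util.Optional;

/** Hypothetical sketch: guessing the .NET property that a Java getter or setter corresponds to. */
final class AccessorSketch {

    /** True when name is the prefix followed by an uppercase letter, e.g. "getMaxDoc" for "get". */
    private static boolean hasAccessorPrefix(String name, String prefix) {
        return name.length() > prefix.length()
                && name.startsWith(prefix)
                && Character.isUpperCase(name.charAt(prefix.length()));
    }

    /** Returns the candidate property name, or empty if the method does not look like an accessor. */
    static Optional<String> propertyNameFor(String methodName, int parameterCount) {
        if (parameterCount == 0 && hasAccessorPrefix(methodName, "get")) {
            return Optional.of(methodName.substring(3));
        }
        if (parameterCount == 0 && hasAccessorPrefix(methodName, "is")) {
            // Boolean getters keep the prefix in PascalCase: isOpen -> IsOpen.
            return Optional.of("Is" + methodName.substring(2));
        }
        if (parameterCount == 1 && hasAccessorPrefix(methodName, "set")) {
            return Optional.of(methodName.substring(3));
        }
        return Optional.empty();
    }
}
```

Under these rules `getMaxDoc()` and `setMaxDoc(x)` both map to `MaxDoc`, `isOpen()` maps to `IsOpen`, and a method such as `search(a, b)` is treated as a non-property method.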

While this new utility certainly could go further than that longer-term (e.g. analyzing attributes/annotations, unit tests, etc.), I think this should be sufficient for our needs to get a feeling for how close our 4.8.0 release will be to Lucene. Future work could be split out as separate issues.

And while I'm adding some manual mappings via attributes as part of testing this work, I do not intend to address every API difference in this PR. Instead, the report will help show what work remains: either adding the remaining mappings or, where things are missing in Lucene.NET, closing the gap in separate issues.

2 participants