Refactor `Linguist::Repository` to isolate Rugged usage #7094

vdye · 2024-10-16T17:37:45Z

Description

The goal of this change is to add flexibility to how repository data is accessed by Linguist::Repository & Linguist::LazyBlob, allowing users to easily configure an alternative to Rugged.

Internally, Linguist::Repository and Linguist::LazyBlob use Rugged to read Git repository data, including diff, attribute, and blob information. While this works for most repositories, it has limits:

- Rugged/libgit2 can lag behind feature support in Git (e.g. reftable, previously SHA-256).
- Rugged is a Git API, which makes using Linguist with other SCMs challenging.

The approach taken here is to replace the Rugged::Repository instance in the Linguist::Repository with a new Linguist::Source::Repository instance. The "source" repository contains functions wrapping what were previously Rugged operations (diff, attribute lookup, etc.). Users can then write their custom implementations of those functions and pass their Linguist::Source::Repository into Linguist::Repository to use them seamlessly.
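
As a rough illustration of this approach, the sketch below defines a stand-in base class whose methods raise NotImplementedError, plus a custom subclass that serves data from plain hashes. The `SourceSketch` module, the `InMemoryRepository` class, and the method names `load_blob` and `attributes_for` are hypothetical names chosen for this example; the real interface is the one defined in the commits on this branch.

```ruby
# Hypothetical stand-in for the generic "source" repository interface.
# Module, class, and method names are assumptions for illustration only.
module SourceSketch
  class Repository
    # Return the contents of the blob identified by blob_id.
    def load_blob(blob_id)
      raise NotImplementedError, "#{self.class} must implement load_blob"
    end

    # Return a hash of attribute name => value for the given path.
    def attributes_for(path, attr_names)
      raise NotImplementedError, "#{self.class} must implement attributes_for"
    end
  end

  # A custom implementation backed by plain hashes instead of Rugged.
  class InMemoryRepository < Repository
    def initialize(blobs, attributes = {})
      @blobs = blobs
      @attributes = attributes
    end

    def load_blob(blob_id)
      @blobs.fetch(blob_id)
    end

    def attributes_for(path, attr_names)
      found = @attributes.fetch(path, {})
      # Attributes that are not set for the path map to nil.
      attr_names.to_h { |name| [name, found[name]] }
    end
  end
end

repo = SourceSketch::InMemoryRepository.new(
  { "abc123" => "puts 'hello'\n" },
  { "lib/a.rb" => { "linguist-vendored" => "true" } }
)
puts repo.load_blob("abc123")
p repo.attributes_for("lib/a.rb", ["linguist-vendored", "linguist-generated"])
```

Raising from the base class means a subclass that forgets to override a method fails loudly at the first call rather than silently returning nothing.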

This isn't intended to be a breaking change, so there are a few extra things done to avoid compatibility issues with existing usage:

- If a Rugged::Repository is passed in as the first argument to either the Linguist::Repository or LazyBlob initializer, it is wrapped in a Linguist::Source::RuggedRepository internally.
- GIT_ATTR_OPTS & GIT_ATTR_FLAGS are Rugged-specific so they're moved to RuggedRepository, but the LazyBlob constants are not removed and instead point to their RuggedRepository counterparts.
- current_tree and read_index don't make sense for non-Rugged repos (the former returns a Rugged tree instance, the latter is specific to how Rugged needs to look up attributes). They raise NotImplementedError with a message referencing deprecation only if called on a non-Rugged repository instance; otherwise they behave the same way as before.
- A method_missing implementation is added to Linguist::Source::RuggedRepository to delegate any unmatched method calls to the internal Rugged::Repository instance (in case users are calling Linguist::Repository#repository directly).
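
The compatibility measures above can be sketched roughly as follows. This is a self-contained approximation, not the code from this branch: `FakeRugged` stands in for Rugged::Repository so the example runs without the rugged gem, and `CompatSketch`, `SourceRepository`, `RuggedSource`, the constant's value, and the `:index_loaded` return value are all hypothetical placeholders.

```ruby
# Stand-in for Rugged::Repository so the sketch runs without the rugged gem.
class FakeRugged
  def path
    "/tmp/example/.git"
  end
end

# Every name in this module is hypothetical and only mirrors the roles
# described in the list above.
module CompatSketch
  class SourceRepository; end

  class RuggedSource < SourceRepository
    # Placeholder value; the Rugged-specific constant now lives here.
    GIT_ATTR_OPTS = { priority: [:index], skip_system: true }.freeze

    def initialize(rugged)
      @rugged = rugged
    end

    # Forward anything this wrapper does not define to the wrapped object.
    def method_missing(name, *args, &block)
      if @rugged.respond_to?(name)
        @rugged.public_send(name, *args, &block)
      else
        super
      end
    end

    def respond_to_missing?(name, include_private = false)
      @rugged.respond_to?(name, include_private) || super
    end
  end

  class Repository
    # Legacy constant kept, pointing at the relocated one.
    GIT_ATTR_OPTS = RuggedSource::GIT_ATTR_OPTS

    attr_reader :repository

    def initialize(repo)
      # Wrap a raw legacy object; accept source repositories unchanged.
      @repository = repo.is_a?(FakeRugged) ? RuggedSource.new(repo) : repo
    end

    # Legacy method: only meaningful for the Rugged-backed source.
    def read_index
      unless repository.is_a?(RuggedSource)
        raise NotImplementedError,
              "read_index is deprecated and only supported for Rugged repositories"
      end
      :index_loaded # placeholder for the legacy behavior
    end
  end
end

legacy = CompatSketch::Repository.new(FakeRugged.new)
puts legacy.repository.class # the wrapper class, not FakeRugged
puts legacy.repository.path  # delegated through method_missing
```

Defining `respond_to_missing?` alongside `method_missing` keeps `respond_to?` truthful for delegated methods, which matters for callers that probe for capabilities before calling.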

The only compatibility issue I can foresee is a user performing an explicit type check on Linguist::Repository#repository (previously it returned a Rugged::Repository, now it returns a Linguist::Source::RuggedRepository). That seems highly unlikely, though, and should be a simple fix if needed.
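
To make the type-check caveat concrete, here is a minimal sketch using hypothetical stand-ins: `OldRepo` plays the role of Rugged::Repository and `Wrapper` plays the role of Linguist::Source::RuggedRepository. Neither class is part of Linguist or Rugged.

```ruby
# Hypothetical stand-in for the previously exposed repository type.
class OldRepo
  def workdir
    "/tmp/example/"
  end
end

# Hypothetical stand-in for the new wrapper that delegates unknown calls.
class Wrapper
  def initialize(inner)
    @inner = inner
  end

  def method_missing(name, *args, &block)
    @inner.respond_to?(name) ? @inner.public_send(name, *args, &block) : super
  end

  def respond_to_missing?(name, include_private = false)
    @inner.respond_to?(name, include_private) || super
  end
end

repo = Wrapper.new(OldRepo.new)

# An explicit class check no longer matches the wrapper...
puts repo.is_a?(OldRepo)
# ...but a capability check still does, because calls are delegated.
puts repo.respond_to?(:workdir)
```

Code that checks for the methods it needs rather than for a specific class keeps working with the wrapper unchanged.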

The commits on this branch are organized to be atomic and incrementally reviewable:

- Commit 1 adds the generic Linguist::Source::Repository and Linguist::Source::Diff interfaces, with all methods raising NotImplementedError to ensure they are overridden by a subclass implementation.
- Commit 2 adds a Rugged implementation of Linguist::Source::Repository matching existing usage in compute_stats and Linguist::LazyBlob.
- Commit 3 updates Linguist::Repository to use a Linguist::Source::Repository instead of a Rugged::Repository to read repository content.
- Commit 4 adds the method_missing implementation to RuggedRepository.

Checklist:

Add interfaces representing a generic "Repository" and "Diff", containing functions currently handled by the Rugged repository instance in 'Linguist::Repository'. Inheriting from these interfaces will allow for alternative implementations of the functions used to traverse and analyze a repository, e.g. using a different API for Git storage or a different SCM altogether. For now, the interfaces are unused.

Signed-off-by: Victoria Dye <[email protected]>

Add Rugged implementations of the 'Repository', 'Diff', and 'Diff::Delta' interfaces matching existing usage in 'Linguist::Repository' & 'Linguist::LazyBlob'. In a subsequent commit, this will allow us to substitute an instance of the 'Repository' interface for what is currently direct usage of a 'Rugged::Repository'.

Signed-off-by: Victoria Dye <[email protected]>

Change the 'repository' argument to 'Linguist::Repository' and 'Linguist::LazyBlob' from a 'Rugged::Repository' to an instance of 'Linguist::Source::Repository'. This will allow users of Linguist to easily configure and use a custom repository interface.

There are two methods that don't have a clear or useful parallel in a generic repository interface and are more specific to Rugged: 'read_index' and 'current_tree'. For both of these methods, raise a 'NotImplementedError' if they are called on a repository instance that is not a 'RuggedRepository', and return the legacy value if it is one.

Also for backward-compatibility purposes, users can still initialize a 'Linguist::Repository' or 'Linguist::LazyBlob' with a 'Rugged::Repository'; it will be wrapped in a 'Linguist::Source::RuggedRepository' in the initialization method.

Finally, update 'test_repository.rb' to test both a 'RuggedRepository' instance and a mocked always-empty repository.

Signed-off-by: Victoria Dye <[email protected]>

Add a 'method_missing' implementation to 'RuggedRepository' to delegate all unmatched methods to its internal 'Rugged::Repository'. This is done for backward-compatibility purposes; users of Linguist can access the 'repository' member of 'Linguist::Repository', so they may rely on interacting directly with the 'Rugged::Repository'. The 'method_missing' delegate ensures that those interactions will generally continue to work (the main exception being explicit type checking performed on 'Linguist::Repository#repository').

Signed-off-by: Victoria Dye <[email protected]>

vdye added 4 commits October 15, 2024 18:55

vdye marked this pull request as ready for review October 18, 2024 15:14

vdye requested a review from a team as a code owner October 18, 2024 15:14

lildude approved these changes Nov 25, 2024
