Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add a property to limit plugin to a list of files #876

Conversation

mathieu
Copy link
Contributor

@mathieu mathieu commented Jan 7, 2025

The idea is to be able to use this plugin from a pre-commit hook (only run on a set of files)

usage: mvn license:check -Dlicense.filesToCheck="src/main/org/acme/MyClass.java,src/main/org/acme/MyEnum.java"

ref #536

…iles

usage: mvn license:check -Dlicense.filesToCheck="src/main/org/acme/MyClass.java,src/main/org/acme/MyEnum.java"
@mathieu mathieu marked this pull request as draft January 7, 2025 08:54
@mathieucarbou mathieucarbou marked this pull request as ready for review January 7, 2025 08:59
@mathieucarbou
Copy link
Owner

Thanks! We'll look!

@mathieucarbou mathieucarbou linked an issue Jan 7, 2025 that may be closed by this pull request
@mathieucarbou
Copy link
Owner

mathieucarbou commented Jan 7, 2025

From a quick reading I see that the property impacts all goals so I would rename the property to something like: globalFileFilter or an equivalent name not bound to any goal but more to the fact that this actually replaces the file scan.

Also, after reading #536, it feels like this is much more a big fix ? Right @hazendaz ?

Passing properties as proposed here #536 (comment) should work... It is also documented in the plugin javadoc (from memory)

@mathieu
Copy link
Contributor Author

mathieu commented Jan 8, 2025

From a quick reading I see that the property impacts all goals so I would rename the property to something like: globalFileFilter or an equivalent name not bound to any goal but more to the fact that this actually replaces the file scan.

As you understood, the intent is to replace the file scan by giving a list of files to run the plugin goals on. The property should reflect this, it is not a filter per se.

Also, after reading #536, it feels like this is much more a big fix ? Right @hazendaz ?

Passing properties as proposed here #536 (comment) should work... It is also documented in the plugin javadoc (from memory)

The proposed solutions did not work for me

@mathieucarbou
Copy link
Owner

mathieucarbou commented Jan 8, 2025

The property should reflect this, it is not a filter per se.

It's neither a "check" (ref: filesToCheck)...

Like said: or an equivalent name not bound to any goal but more to the fact that this actually replaces the file scan.

The proposed solutions did not work for me

That is my real concerne here.... Before merging this PR, I would like to understand why this is not working and whether we should make it work... These properties being documented, I would expect them to work when passed as parameter. They were also working before the move to a list structure.

@mathieu
Copy link
Contributor Author

mathieu commented Jan 8, 2025

That is my real concerne here.... Before merging this PR, I would like to understand why this is not working and whether we should make it work... These properties being documented, I would expect them to work when passed as parameter. They were also working before the move to a list structure.

I'll try to create a test to reproduce this but this is the debug redacted output of a local license:check configuration

[DEBUG] Loading mojo com.mycila:license-maven-plugin:4.6:check from plugin realm ClassRealm[plugin>com.mycila:license-maven-plugin:4.6, parent: jdk.internal.loader.ClassLoaders$AppClassLoader@76ddd772]
[DEBUG] Configuring mojo execution 'com.mycila:license-maven-plugin:4.6:check:default-cli' with basic configurator -->
[DEBUG]   (f) aggregate = true
[DEBUG]   (f) concurrencyFactor = 1.5
[DEBUG]   (f) defaultBasedir = [...]
[DEBUG]   (f) defaultProperties = {inceptionYear=2018, license.git.maxCommitsLookup=1, license.warnIfShallow=false}
[DEBUG]   (f) defaultUseDefaultExcludes = true
[DEBUG]   (f) dependencyEnforce = false
[DEBUG]   (f) dependencyExceptionMessage = Some licenses were denied by policy:
[DEBUG]   (f) dependencyPolicies = []
[DEBUG]   (f) dependencyScopes = [runtime]
[DEBUG]   (f) dryRun = false
[DEBUG]   (f) encoding = UTF-8
[DEBUG]   (f) errorMessage = Some files do not have the expected license header. Run license:format to update them.
[DEBUG]   (f) failIfMissing = true
[DEBUG]   (f) failIfUnknown = true
[DEBUG]   (f) legacyConfigExcludes = []
[DEBUG]   (f) legacyConfigIncludes = []
[DEBUG]   (f) inlineHeader = [...]
[DEBUG]   (f) includes = [**/src/**, **/cli-gen/**, **/i18n/**, *.sh]
[DEBUG]   (f) excludes = [**/src/main/resources/**/*.key, **/src/main/resources/**/*.crt, **/src/test/resources/**, **/src/main/resources/selectors/**, **/src/main/resources/*.avsc, **/src/main/resources/*.sql]
[DEBUG]   (f) licenseSets = [com.mycila.maven.plugin.license.LicenseSet@855f5953]
[DEBUG]   (f) mapping = {bats=SCRIPT_STYLE, conf=DOUBLESLASH_STYLE, jvm.options=SCRIPT_STYLE, less=DOUBLESLASH_STYLE, ltpa.keys=SCRIPT_STYLE, proto=DOUBLESLASH_STYLE, scss=SLASHSTAR_STYLE, ts=SLASHSTAR_STYLE, tsx=SLASHSTAR_STYLE}
[DEBUG]   (f) nThreads = 0
[DEBUG]   (f) prohibitLegacyUse = false
[DEBUG]   (f) project = MavenProject: [...]/LATEST-SNAPSHOT @ [...]/pom.xml
[DEBUG]   (f) quiet = false
[DEBUG]   (f) reportLocation = [...]/target/site/license-plugin-report.xml
[DEBUG]   (f) reportSkipped = false
[DEBUG]   (f) session = org.apache.maven.execution.MavenSession@a1e84b7a
[DEBUG]   (f) settings = org.apache.maven.execution.SettingsAdapter@7315678a
[DEBUG]   (f) skip = false
[DEBUG]   (f) skipExistingHeaders = true
[DEBUG]   (f) strictCheck = true
[DEBUG]   (f) useDefaultMapping = true
[DEBUG]   (f) warnIfShallow = true
[DEBUG] -- end configuration --

This line is triggering: [DEBUG] (f) licenseSets = [com.mycila.maven.plugin.license.LicenseSet@855f5953]

The call was made with -DlicenseSets.licenseSet.includes=pom.xml

@mathieucarbou
Copy link
Owner

Right, there is no toString() representation sadly - one should be added, which displays the object content.

@mathieu
Copy link
Contributor Author

mathieu commented Jan 8, 2025

@mathieucarbou, I've created a new PR with a failing IT test: #878, finding a way to make this test work would eliminate the need for this PR.

@mathieu
Copy link
Contributor Author

mathieu commented Jan 10, 2025

Proposition for the name of the property to set the files to replace the scan of the baseDir:

  1. files, simple and straightforward, indicating a collection of files.
  2. filePaths, indicates the list holds the file paths of the files
  3. fileList, Clearly communicates that this is a list of files
  4. filesToCheck, Already discarded because it is not a check
  5. globalFileFilter, Misleading as this property holds list of files not a filter
  6. other propositions

@mathieucarbou and @hazendaz, do you have any preferences ?

@hazendaz
Copy link
Collaborator

@mathieu I haven't looked at this yet but seems like something we are looking for in mybatis. We are looking to add the git commit hook plugin for any formatting tools because its too frequent that folks use the IDE to make changes, test them, etc but then never run the full build and miss formatting only to make junk commits later. I've done extensive profiling on more than 100 plugins and this by far when messing with the copy right dates is the slowest plugin that exists. That said, this seems interesting to get working so it can work off a diff only as no reason to check all files constantly. Another PR currently out to better handle the slow git commands seems like a double punch to make this plugin go from last to somewhere reasonable. I'm going to try to review all you have up including the alternative this weekend but may take a little longer to fully understand all the changes.

I think 3 in your list seems appropriate in case I'm thinking but before doing any more, give us some time to review fully so we better understand and test this against mybatis where we already have an open PR to add in the git commit hook plugin (without license right now). Currently doing that there ends up taking 8 minutes on the contributors machine, mine takes far less but I have a bulked up machine that is serious overkill for this. At work though, I'm on a dev ops team supporting some 2k+ repos and due to performance on this plugin, we have not really promoted its usage but want to. I think if all these current pull requests work out, it will allow us to further promote that. It may not be a silver bullet but hopefully its enough to make it viable for using a hook to address as code is committed without killing developer experience.

@mathieucarbou
Copy link
Owner

@mathieu @hazendaz : I was thinking about this feature this morning.

Like @mathieu said, the goal is to globally pre-filter the set of files that the plugin will use to work with (pre-commit, etc), which is current done by replacing the scanner.

I think we can do better.

The main idea is to reduce the selection on which the plugin will work, but the way to define this selection, we already have something we could re-use in terms of plugin: the concept of includes/excludes with ant-like patterns, which is even more flexible than defining specific file paths.

This concept is also hidden in the name proposals here: #876 (comment):

In maven, we already have the concepts in other plugins (i.e. fileSets and source for assembly - https://maven.apache.org/plugins/maven-assembly-plugin/assembly.html). So why not re-use this concept and introduce something like this.

It cannot be a global <includes> or <excludes> since they already exist and have been deprecated.

But it could be something really conveying the fact that:

  1. this is filtering globally the sources / workspace
  2. this is using nat patterns (which exclude file proposals because they are not files but patters)

If set, then, we can use the scanner as usual to list the files we should work with, but this list, which forms a virtual workspace, then has to be filtered again using the include/exclude of each license set.

Some thing like (adapting an example from the project and using <workspace>)

  <build>
    <plugins>
      <plugin>
        <groupId>com.mycila</groupId>
        <artifactId>license-maven-plugin</artifactId>
        <version>@project.version@</version>
        <configuration>

          <workspace>
            <includes>
                <include>specific/file.java</include>
                <include>all/in/this/folder/.java</include>
            <includes>
            <excludes>
                <exclude>all/in/this/folder/DoNotTouch.java</exclude>
            <excludes>
          <workspace>
          
          </fileSets>
          <licenseSets>
            <licenseSet>
              <header>mock-license-1.txt</header>
              <excludes>
                <exclude>invoker.properties</exclude>
                <exclude>pom.xml</exclude>
                <exclude>*.groovy</exclude>
                <exclude>**/*.bak</exclude>
                <exclude>*.log</exclude>
                <exclude>mock-license-*</exclude>
                <exclude>**/Unformatted2.java</exclude>
                <exclude>**/Unformatted3.java</exclude>
              </excludes>
            </licenseSet>
            <licenseSet>
              <header>mock-license-2.txt</header>
              <excludes>
                <exclude>invoker.properties</exclude>
                <exclude>pom.xml</exclude>
                <exclude>*.groovy</exclude>
                <exclude>**/*.bak</exclude>
                <exclude>*.log</exclude>
                <exclude>mock-license-*</exclude>
                <exclude>**/Unformatted1.java</exclude>
                <exclude>**/Unformatted3.java</exclude>
              </excludes>
            </licenseSet>
            <licenseSet>
              <header>mock-license-3.txt</header>
              <includes>
                <include>**/Unformatted3.java</include>
              </includes>
            </licenseSet>
          </licenseSets>
        </configuration>
      </plugin>
    </plugins>
  </build>

@mathieucarbou mathieucarbou marked this pull request as draft January 11, 2025 15:22
@mathieu
Copy link
Contributor Author

mathieu commented Jan 11, 2025

So using this proposal from pre-commit would mean something like this ?

mvn license:check -Dlicense.workspace.includes=src/file1.txt,src/file2.txt

@mathieucarbou
Copy link
Owner

mathieucarbou commented Jan 11, 2025

So using this proposal from pre-commit would mean something like this ?

mvn license:check -Dlicense.workspace.includes=src/file1.txt,src/file2.txt

Yes! it would be easier to use it like that and also with patterns.

Also, if you look at #878, it works actually, but more a pain to setup in a commit hook.

I think there is value in being able to reduce the scope of the global workspace, I just don't know yet how to efficiently implement that considering that there are 2 scans involved or 1 big scan and a post-scan filtering.

@mathieucarbou
Copy link
Owner

@hazendaz : what do you think ? Should we release with the latest pr merged or we wait to have the work for this one done ?
I like the idea of constraining the work to a more limited workspace.

But on the other end, as shown, there is currently a way to use a commit hook to only format the committed files, although not efficient.

@hazendaz
Copy link
Collaborator

@hazendaz : what do you think ? Should we release with the latest pr merged or we wait to have the work for this one done ? I like the idea of constraining the work to a more limited workspace.

But on the other end, as shown, there is currently a way to use a commit hook to only format the committed files, although not efficient.

I think we should wait for this. I'm also refocusing over here to do some other cleanup. Since we are going into our first version 5 release on java 11 now its already going to show as a breaking change in terms of the 4.x stream so multiple bigger feature improvements seems appropriate in this particular case even if we are mostly sticking as close to underlying current api as possible.

@mathieucarbou
Copy link
Owner

Since we are going into our first version 5 release on java 11

You're right! Forgot about that!

@mathieucarbou
Copy link
Owner

mathieucarbou commented Jan 13, 2025

@mathieu : are you willing to update your PR to try the proposal about with the workspace idea ?

Also, defaultBasedir could be moved into this new workspace object and the existing one be deprecated and its defaukt value removed.

          <workspace>
            <baseDir></baseDir>
            <includes>
                <include>specific/file.java</include>
                <include>all/in/this/folder/.java</include>
            <includes>
            <excludes>
                <exclude>all/in/this/folder/DoNotTouch.java</exclude>
            <excludes>
          <workspace>

@mathieu
Copy link
Contributor Author

mathieu commented Jan 13, 2025

@mathieucarbou , I'll need some more context on your vision of the <workspace>. What does it do compared to LicenceSet's <includes> and <excludes> ?

@mathieucarbou
Copy link
Owner

mathieucarbou commented Jan 13, 2025

@mathieucarbou , I'll need some more context on your vision of the <workspace>. What does it do compared to LicenceSet's <includes> and <excludes> ?

workspace would define the global workspace to work one

  • workspace.baseDir == project root directory
  • workspace.includes defaulted to null
  • workspace.excludes defaulted to null

This concept exists already but is implicit:

  • the workspace.baseDir is currently defaultBasedir,
  • we do not constrain to any include (the whole workspace is visible)
  • and we do not constrain to some excludes (nothing is excluded by default).

The idea is not make it explicitly configurable.

LicenceSet are used to "split" the workspace into several groups of file and assigning them a license. So they inherit the workspace baseDir, includes, excludes. licensSet.baseDir can be overridden but must be a sub directory of the workspace (@hazendaz : I don't know if we make sure of that somewhere, but with the workspace concept we will be)

As I see things for includes/excludes:

  • if workspace.includes is defined in POM, then it will be added to all licenseSet.includes
  • if workspace.excludes is defined in POM, then it will be added to all licenseSet.excludes
  • if workspace.baseDir is defined in POM, then it will be used to set licenseSet.baseDir (if not set) - like what happens now with defaultBasedir, and if licenseSet.baseDir is set, we have to check that it is same or a subfolder of workspace.baseDir.

For backward compat', defaultBasedir must be marked deprecated, without any default now ,and if set by the user, its value will fill "workspace.baseDir". The property license.basedir should be moved to workspace.baseDir

In the case of a commit hook, then the only property to add would be:

-D license.workspace.includes=<list of file paths>

This behind will be transferred to licenseset.includes and will have at the end the same effect of my comment in this PR: #878 (comment)

Also, this "design" does not change the scanner logic, which is still taking place but based on a refinement of the baseDir / includes / excludes.

@mathieucarbou @hazendaz @frawa : let me know what you think of that ;-)

@mathieu
Copy link
Contributor Author

mathieu commented Jan 13, 2025

In other words, we do not bypass the DirectoryScanner anymore ?

@mathieucarbou
Copy link
Owner

mathieucarbou commented Jan 13, 2025

In other words, we do not bypass the DirectoryScanner anymore ?

No because if you bypass it, you cannot support ant patterns.

Example: -D license.workspace.includes=specific-dir/** for a commit hook that one would like to always apply to only a specific folder whatever the files committed.

And it is also less code to not bypass it and reuse the include / exclude mechanism.

Also, if the file list is given as complete path, the directory scanner anyway will be pretty fast.

@mathieucarbou mathieucarbou linked an issue Jan 13, 2025 that may be closed by this pull request
@mathieu
Copy link
Contributor Author

mathieu commented Jan 15, 2025

I'll close this PR and open an issue (#887) for the workspace feature. I'll open a new PR with updates for it

@mathieu mathieu closed this Jan 15, 2025
@mathieu mathieu deleted the feature/536-property-for-files-to-check branch January 15, 2025 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow plugin to run only on changed files Command Line flag to alter the list of included files to check
3 participants