Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed cache misses caused by inspectRuntimeClasspath tasks. #918

Merged
merged 6 commits into from
Jan 30, 2024

Conversation

ribafish
Copy link
Contributor

@ribafish ribafish commented Jan 17, 2024

Fixed cache misses caused by inspectRuntimeClasspath tasks.

The issue was that when the runtimeClasspath input was defined just as @InputFiles, the files had bytewise differences because of timestamps. When using @Classpath annotation, the files contents are used for cache fingerpriting - one caveat is that with @Classpath, the order of files also matters, that's why I also sorted the files before adding them to the runtimeClasspath.

These issues were affecting the cacheability of all users of these plugins, including the micronaut-data project.

import org.gradle.api.tasks.PathSensitive;
import org.gradle.api.tasks.PathSensitivity;
import org.gradle.api.tasks.TaskAction;
import org.gradle.api.tasks.*;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this in accordance with the Micronaut project style?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this should be restored as individual imports

task.getRuntimeClasspath().from(project.getConfigurations().getByName(JavaPlugin.RUNTIME_CLASSPATH_CONFIGURATION_NAME));
// Sort the runtimeClasspath before setting it since the order doesn't actually matter for inspection, however it does matter for caching.
task.getRuntimeClasspath().from(project.getConfigurations()
.getByName(JavaPlugin.RUNTIME_CLASSPATH_CONFIGURATION_NAME).resolve()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this cause eager dependency configuration? that could be bad for performance.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ This has indeed to be changed, configurations should never be resolved eagerly. For this check it doesn't really matter anyway to have sorted entries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, however order matters for calculating the task inputs fingerprint, which is used for caching. I'll fix this ofc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the order is different then it means that the graph is different, so to me it makes sense that the task is out of date.

import org.gradle.api.tasks.PathSensitive;
import org.gradle.api.tasks.PathSensitivity;
import org.gradle.api.tasks.TaskAction;
import org.gradle.api.tasks.*;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this should be restored as individual imports

task.getRuntimeClasspath().from(project.getConfigurations().getByName(JavaPlugin.RUNTIME_CLASSPATH_CONFIGURATION_NAME));
// Sort the runtimeClasspath before setting it since the order doesn't actually matter for inspection, however it does matter for caching.
task.getRuntimeClasspath().from(project.getConfigurations()
.getByName(JavaPlugin.RUNTIME_CLASSPATH_CONFIGURATION_NAME).resolve()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ This has indeed to be changed, configurations should never be resolved eagerly. For this check it doesn't really matter anyway to have sorted entries.

@ribafish ribafish requested review from melix and runningcode January 18, 2024 10:24
Copy link
Collaborator

@melix melix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think this solution doesn't work either. The initial javadoc comment says it all: the only thing this task cares about is the name of the files on classpath. It doesn't care about the content of the files.

Therefore, I think a better way to fix it would be to make the getRuntimeClasspath method @Internal, then have an @Input property which is actually computed from the list of files.

@ribafish
Copy link
Contributor Author

Actually I think this solution doesn't work either. The initial javadoc comment says it all: the only thing this task cares about is the name of the files on classpath. It doesn't care about the content of the files.

Therefore, I think a better way to fix it would be to make the getRuntimeClasspath method @Internal, then have an @Input property which is actually computed from the list of files.

In order to get the names of files on classpath, that classpath and its dependencies need to be resolved. With the current implementation that happens in execution time. We can't simply reference the files themselves, as this would have the same problem as explained in the PR description because file metadata (namely timestamp) will change. If we were to have it be a String input with a list of filenames we'd need to resolve the dependencies in configuration time.

As such, I see only two solutions:

  1. Have the input defined as @Classpath - current solution
  2. Make the input accept a list of (unresolved) dependencies and search those for snakeyaml. We could get those by Configuration.getAllDependencies(), which doesn't resolve the configuration, with the downside of not getting transitive dependencies.

@melix
Copy link
Collaborator

melix commented Jan 25, 2024

n order to get the names of files on classpath, that classpath and its dependencies need to be resolved. With the current implementation that happens in execution time.

Yes, but we should change that to happen at configuration time, using a new @Input which uses the configuration resolved component api (which should be compatible with config cache too).

@ribafish
Copy link
Contributor Author

n order to get the names of files on classpath, that classpath and its dependencies need to be resolved. With the current implementation that happens in execution time.

Yes, but we should change that to happen at configuration time, using a new @Input which uses the configuration resolved component api (which should be compatible with config cache too).

Why would we want to have dependency resolution happen in configuration time? What are we trying to solve with this suggestion?

@melix
Copy link
Collaborator

melix commented Jan 25, 2024

Why would we want to have dependency resolution happen in configuration time? What are we trying to solve with this suggestion?

Correctness. With runtime classpath normalization, the task will be out of date if any if the files change. We don't care that they change, we only need the file names.

@ribafish
Copy link
Contributor Author

Why would we want to have dependency resolution happen in configuration time? What are we trying to solve with this suggestion?

Correctness. With runtime classpath normalization, the task will be out of date if any if the files change. We don't care that they change, we only need the file names.

That wouldn't make the build incorrect though.

By forcing configuration time dependency resolution you will do that on every build of every project that uses the plugin, even if the task is not executed. Having the input remain a classpath will only re-execute the task when there's a change in the classpath, which is comparatively quite rare.

@melix
Copy link
Collaborator

melix commented Jan 25, 2024

By forcing configuration time dependency resolution you will do that on every build of every project that uses the plugin, even if the task is not executed

That is not correct. It would be done only if the task is on the task graph, thanks, hopefully, to lazy task configuration. If a task is on the task graph, then it needs to be executed hence configured. If the task is configured and not executed, then it's an error which needs fixing.

@ribafish
Copy link
Contributor Author

ribafish commented Jan 25, 2024

By forcing configuration time dependency resolution you will do that on every build of every project that uses the plugin, even if the task is not executed

That is not correct. It would be done only if the task is on the task graph, thanks, hopefully, to lazy task configuration. If a task is on the task graph, then it needs to be executed hence configured. If the task is configured and not executed, then it's an error which needs fixing.

I worded that badly - dependency resolution in configuration time will run every time the task is on the task graph, even if it doesn't need to be executed due to UP-TO-DATE checks or being retrieved FROM-CACHE. The task itself is attached to all Test tasks, so anytime any test is executed, the task will be on the task graph. Most of the time, the classpath wouldn't be changed, but the classpath would still be resolved in config time, even if the task execution could be/ is avoided.

@melix
Copy link
Collaborator

melix commented Jan 25, 2024

I don't see how this changes anything, but I may be missing something. Today we have:

    @InputFiles
    @PathSensitive(PathSensitivity.NAME_ONLY)
    public abstract ConfigurableFileCollection getRuntimeClasspath();

In order to know if the task is up-to-date, or cached, getRuntimeClasspath() has to be called, because it's an input and because it references the runtimeClasspath configuration, it will be resolved at configuration time. This will not change if you use @Classpath.

Using runtime classpath snapshotting will not prevent the task from being configured if it's in the task graph.

ribafish and others added 2 commits January 29, 2024 12:35
…pathNames are calculated and retreived as inputs.
…nalInput

Make runtimeclasspath an internal input for InspectClasspath
@ribafish ribafish requested review from melix and runningcode January 30, 2024 07:56
@ribafish
Copy link
Contributor Author

ribafish commented Jan 30, 2024

@melix, I updated the PR as requested.

On a side-note, I was thinking that perhaps the best solution for this task would be to make it not cacheable. The execution is super fast since it just goes over the filenames.

For reference, this is a real execution on ge.micronaut.io - a lot of the time (relatively speaking) is spent dealing with remote cache and so all the tasks together take 14s.

@melix
Copy link
Collaborator

melix commented Jan 30, 2024

Interesting numbers, indeed since most of the time will be spent locally dealing with dependency resolution, it's not worth caching.

melix added a commit that referenced this pull request Jan 30, 2024
@melix melix merged commit 97bbef8 into micronaut-projects:master Jan 30, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants