Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvement: use pc for finding references of local symbols and when semanticdb is missing #5940

Merged
merged 18 commits into from
May 22, 2024

Conversation

kasiaMarek
Copy link
Contributor

@kasiaMarek kasiaMarek commented Dec 13, 2023

semanticdb only:
time: found 459 references to symbol 'scala/Boolean#`&&`().' in 0.25s
bloom filters + pc (that's a bit nonsensical, since bloom filters are build based on semanticdb):
time: found 429 references to symbol 'scala/Boolean#`&&`().' in 6.47s

Note: This won't work for definitionOrReferences if semanticdb is missing. But it requires a bit more effort to make it work reasonably, so I'd rather leave it as a followup.
connected to: #5759

@kasiaMarek kasiaMarek force-pushed the pc-references branch 5 times, most recently from a4d5d77 to af2ba5a Compare December 20, 2023 11:46
@kasiaMarek kasiaMarek force-pushed the pc-references branch 3 times, most recently from ec87650 to 64412d6 Compare December 21, 2023 09:50
@kasiaMarek kasiaMarek marked this pull request as ready for review December 21, 2023 15:34
Copy link
Member

@jkciesluk jkciesluk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!
I have some questions, but nothing major

set.foreach(bloom.put)
}

private def collectIdentifiers(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? Isn't the referencesProvider().addIdentifiers(source, identifiers) in Indexer.scala enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It updates identifiers on didSave.

Copy link
Contributor

@tgodzik tgodzik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is some great work, would be awesome if we really just allow user only a bit slower experience when semanticdb is missing.

/**
* Returns the references of the symbol under the current position in the target files.
*/
public CompletableFuture<java.util.List<DefinitionResult>> references(OffsetParams params, java.util.List<VirtualFileParams> targetFiles, boolean includeDefinition) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add an additional ReferenceResults class just for the sake of clarity. And also we could have MultipleFileOffsetParams or ReferenceOffsetParams, so we would not need more params


case class IndexEntry(
id: BuildTargetIdentifier,
bloom: BloomFilter[CharSequence],
)
val index: TrieMap[Path, IndexEntry] = TrieMap.empty
val tokenIndex: TrieMap[Path, IndexEntry] = TrieMap.empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an idea how much more memory this takes up? And maybe how much longer indexing takes? Or are the indexing times negligible since we already tokenize the files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check both. Indexing should be similar but there may be additional cost to building the bloom filter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In metals repo:
memory: identifier index using 1.37M (77,536 elements)
Benchmarks:

Benchmark                                                   Mode  Cnt  Score   Error  Units
MetalsBench.alltoplevelsScalaIndex                            ss   10  0.419 ± 0.003   s/op
MetalsBench.alltoplevelsScalaIndexWithCollectIdents           ss   10  0.499 ± 0.011   s/op
MetalsBench.alltoplevelsScalaIndexWithBuildIdentifierIndex    ss   10  0.627 ± 0.016   s/op

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be fine then, that's not a lot 👍

The only issue I can think of is anything that is not named like apply methods, but we can try and handle that later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I guess the best way to go about it would be not to send targetUris right away but have pc resolve the name. We could also use untyped trees. Anyway it will quickly get hairy for things like:

case class Foo() {
  def apply()
 }
 val f = Foo()
 val h = f()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also look at synthetic symbols as we already have some logic for it (or at least the apply one) and we search for apply, it should be found properly

}
if targetFiles.nonEmpty
} yield compilers
.references(params, targetFiles, EmptyCancelToken)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to forward an actual cancel token so if it takes longer to find references this can actually be cancelled.

class PcReferencesProvider(
compiler: MetalsGlobal,
params: OffsetParams,
targetFiles: List[VirtualFileParams],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we shouldn't read the files on demand instead of sending them in the params here. There are cases when we will not need targetFiles to be read at all. So maybe we should just read it here. Also sending too many large files might cause OOM maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it's reasonable to ask for batches instead of giving all at once.

@kasiaMarek kasiaMarek changed the title improvement: use pc for finding references of local symbols improvement: use pc for finding references of local symbols and when sementicdb is missing Jan 18, 2024
@kasiaMarek
Copy link
Contributor Author

Maybe we should also use pc for files with stale semanticDB?

@kasiaMarek kasiaMarek changed the title improvement: use pc for finding references of local symbols and when sementicdb is missing improvement: use pc for finding references of local symbols and when semanticdb is missing Feb 20, 2024
@tgodzik
Copy link
Contributor

tgodzik commented Feb 20, 2024

Maybe we should also use pc for files with stale semanticDB?

That makes sense, would reduce the need to calculate the diff, which isn't 100% sure anyway. We can do that later if needed.

params.params().uri().toString(),
highlight.getRange(),
)
val adjust =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we adjsut things in Compilers as with all other requests?

@@ -1362,3 +1401,11 @@ object Compilers {
case object Default extends PresentationCompilerKey
}
}

case class PcAdjustFileParams(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not add these new params if possible to just adjsut results afterward.


check(
"basic3",
"""|/a/src/main/scala/O.scala
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding a few mroe tests here

@tgodzik tgodzik self-requested a review April 3, 2024 09:31
@@ -703,6 +719,28 @@ class Compilers(
}
}.getOrElse(Future.successful(Nil.asJava))

def references(
params: ReferenceParams,
targetFiles: List[AbsolutePath],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
targetFiles: List[AbsolutePath],
searchInFiles: List[AbsolutePath],

.map(
_.asScala.toList.map { defRes =>
val locations = defRes.locations().asScala.toList
ReferencesResult(defRes.symbol(), locations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we run adjust on locations here?

val request = PcReferencesRequest(
CompilerOffsetParamsUtils.fromPos(pos, token),
params.getContext().isIncludeDeclaration(),
targetFiles.map(_.toURI).asJava,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if instead, we first run references on the main file, which then returns the symbol and then we use that symbol on each of the other files? I think we were talking about it at some point, but it seems much less added mental overhead.

We would need to maybe create ReferenceParams that contain:
Either[Position, Symbol]

If it's a position then we search at the specific position and all references, otherwise we search for symbol. We don't need to have the additional Buffers etc.

Comment on lines 569 to 590

private val compilers: Compilers = register(
val compilers: Compilers = register(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove private?

Copy link
Contributor

@tgodzik tgodzik May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find it being used currently, could be private

Copy link
Contributor

@tgodzik tgodzik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments, otherwise a see a few things for a follow up:

  • we should close presentation compiler for non active targets, otherwise we might easily go OOM in for example Bazel
  • we don't search for references if we do go to def on definition
  • it seems like sometimes we get duplicated location, where one location is empty, not sure if I can reproduce, but will try

Comment on lines 569 to 590

private val compilers: Compilers = register(
val compilers: Compilers = register(
Copy link
Contributor

@tgodzik tgodzik May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find it being used currently, could be private

import org.eclipse.{lsp4j => l}

trait PcReferencesProvider {
_: WithCompilationUnit with PcCollector[(String, Option[l.Range])] =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally WithCompilationUnit would also be a trait so that we can use normal inheritance, but if possible it can be fixed later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet how to do that nicely.

@kasiaMarek
Copy link
Contributor Author

kasiaMarek commented May 9, 2024

we should close presentation compiler for non active targets, otherwise we might easily go OOM in for example Bazel

👍

we don't search for references if we do go to def on definition

included in the PR description

it seems like sometimes we get duplicated location, where one location is empty, not sure if I can reproduce, but will try

I purposefully return ReferencesResult from pc even if there are no locations (can be only def, that we don't care about), so we get the full list of symbols, that we want to search for (

compilers.references(params, EmptyCancelToken).flatMap { foundRefs =>
). I added a filter to the final result. EDIT: Can't do that actually, is that a problem? Don't results get mapped to locations anyway?

@kasiaMarek kasiaMarek force-pushed the pc-references branch 2 times, most recently from 04c2c9f to 3321128 Compare May 20, 2024 11:03
@kasiaMarek kasiaMarek merged commit d12e21e into scalameta:main May 22, 2024
22 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants