Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul worker concurrency helpers with mill.api.CacheFactory #4001

Merged
merged 12 commits into from
Nov 22, 2024

Conversation

lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Nov 21, 2024

Fixes #3730

The old FixSizedCache/KeyedLockedCache/RandomBoundedCache were all a bit of a mess, and none of them provided one crucial component to long-lived cache entries: the ability to tear them down after you are done with them. This is necessary for a lot of long-lived components you may want to cache: classloaders, subprocesses, open file handles, etc. all need to be torn down after use. Also, their implementation was weird, probably overly convoluted, and not used consistently throughout Mill

This PR introduces a new CachedFactory class meant to replace the old ones. I use it in four places (ZincWorker Scala and Java compilers, ScalaJsWorker, ScalanativeWorker), and added docs so users can discover and use it too. It provides a hook for def teardown, which should hopefully fix some of the classloader leaks I've seen in our prior implementations (these normally manifest as CODE CACHE FULL, DISABLING JIT COMPILER warnings when doing large builds). The last previously-problematic worker defsite VisualizeModule.worker cannot be changed without breaking bincompat so I left it in place for now

CachedFactory also differs from the original implementation is that the cache contains the logic for def setup, rather than relying on the callsite of .withCachedValue to provide the setup and key together. This should be easier to understand and less likely to be mis-used, as the mapping from key => value is kept in one place, whereas previously every withCachedValue could have a different mapping.

Hopefully this new CachedFactory helper class can form the basis of how people define concurrency-friendly stateful workers going forward

Covered by some unit tests and example tests, as well as usage in the main ZincWorker/ScalaJSWorker/ScalaNativeWorker code paths

@lihaoyi lihaoyi changed the title [WIP] Overhaul worker concurrency helpers with mill.api.CacheFactory Overhaul worker concurrency helpers with mill.api.CacheFactory Nov 21, 2024
@lihaoyi lihaoyi requested review from lolgab and lefou and removed request for lolgab November 21, 2024 12:17
@lihaoyi
Copy link
Member Author

lihaoyi commented Nov 21, 2024

CC @jodersky @alexarchambault if you guys want to take a look

@lihaoyi lihaoyi requested a review from lolgab November 21, 2024 12:18
@@ -30,20 +36,25 @@ class CompressWorker(dest: os.Path) {
val cache = collection.mutable.Map.empty[Int, Array[Byte]]
def compress(name: String, bytes: Array[Byte]): Array[Byte] = {
val hash = Arrays.hashCode(bytes)
if (!cache.contains(hash)) {
if (!synchronized(cache.contains(hash))) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive 'user level' complaint:

I'm not a fan the 'synchronized' in an example. Its so easy to get wrong and I'm always having a hard time to convince myself that it is correct.

What I would love instead is having the concurrent map operations on it:
cache.computeIfAbsent(key, (key)=>createTheExpensiveThing())

Note again: This is from a doc/end user perspective and not the internal use.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh...this is a negative example / problem statement =). So,yes, then it makes sense.

@lihaoyi lihaoyi marked this pull request as ready for review November 22, 2024 10:36
@lihaoyi lihaoyi merged commit a900a9b into com-lihaoyi:main Nov 22, 2024
26 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve framework support for cached multi-thread-safe workers
2 participants