Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make StringConverter threadsafe #3157

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

steffenaxer
Copy link
Collaborator

This PR makes the StringConverter threadsafe. There are currently hints that indicate a potential race condition as the same AttributeConverter references are used by all worker threads within the ParallelPopulationReaderMatsimV6

@steffenaxer steffenaxer enabled auto-merge March 11, 2024 22:00
@steffenaxer steffenaxer merged commit 50dfe1a into matsim-org:master Mar 11, 2024
46 checks passed
@@ -29,7 +30,7 @@
* @author mrieser
*/
public class StringConverter implements AttributeConverter<String> {
private final Map<String, String> stringCache = new HashMap<String, String>(1000);
private final Map<String, String> stringCache = new ConcurrentHashMap<>(1000);
@Override
public String convert(String value) {
String s = this.stringCache.get(value);
Copy link
Member

@Janekdererste Janekdererste Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With ConcurrentHashMap you also have to change the insertion logic, as you might have a race condition here, with two processes getting a null value at the same time and then updating the cache. You could use:

return map.computeIfAbsent(value, k -> k);

@Janekdererste
Copy link
Member

In general, I would expect an attribute converter to be stateless. Especially here: Why is there a cache if String convert(String value) returns the same value?

If the state is really necessary, I would think that the workers should have their own instance of these things, instead of sharing references. I.e., taking care of parallelism and independent data structures should probably be solved on the level where parallel execution is introduced.

@mrieser
Copy link
Contributor

mrieser commented Mar 12, 2024

The (Concurrent)HashMap here is used for (manual) deduplication of strings. Reading attributes from XML returns different Strings, even if the string-content is always the same. Having the map ensures we store identical strings only once in memory, typically reducing memory consumption by a large margin.

@Janekdererste
Copy link
Member

Janekdererste commented Mar 12, 2024

Reading attributes from XML returns different Strings

Interesting! I assumed that this was solved via interning, but of course, the XML parser allocates new Strings(bytes) I guess.

Could we use something like the following?

var cached = value.intern();

Then the JVM would take care of this for us. (Just being curious) (Javadoc intern())

@mrieser
Copy link
Contributor

mrieser commented Mar 12, 2024

At least in older versions of Java, the data structure used for the intern() had a fixed size, performing poorly when a large number of different Strings were interned. It was thus usually advised to rather use a ConcurrentHashMap for better performance.

E.g. see https://stackoverflow.com/questions/10624232/performance-penalty-of-string-intern, https://www.baeldung.com/java-string-pool#performance-and-optimizations (look for -XX:StringTableSize)

@Janekdererste
Copy link
Member

I see. Thank you @mrieser!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants