Make StringConverter threadsafe #3157

steffenaxer · 2024-03-11T21:48:20Z

This PR makes the StringConverter thread-safe. There are hints of a potential race condition, because the same AttributeConverter references are shared by all worker threads within the ParallelPopulationReaderMatsimV6.

Janekdererste · 2024-03-12T10:13:16Z

matsim/src/main/java/org/matsim/utils/objectattributes/attributeconverters/StringConverter.java

@@ -29,7 +30,7 @@
 * @author mrieser
 */
 public class StringConverter implements AttributeConverter<String> {
-	private final Map<String, String> stringCache = new HashMap<String, String>(1000);
+	private final Map<String, String> stringCache = new ConcurrentHashMap<>(1000);
 	@Override
 	public String convert(String value) {
 		String s = this.stringCache.get(value);


With ConcurrentHashMap you also have to change the insertion logic, since there may still be a race condition here: two threads can get a null value at the same time and then both update the cache. You could use:

return this.stringCache.computeIfAbsent(value, k -> k);
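To make the suggestion concrete, here is a minimal sketch of what a converter built around computeIfAbsent might look like. The class name DedupStringConverterSketch is hypothetical and stands in for the real StringConverter; it does not implement the MATSim AttributeConverter interface and is not the code that was merged.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the StringConverter discussed in this PR (assumption, not the MATSim class).
class DedupStringConverterSketch {
    private final Map<String, String> stringCache = new ConcurrentHashMap<>(1000);

    // Returns the canonical instance for the given string content.
    String convert(String value) {
        // Lookup and insertion happen as one atomic operation per key,
        // so two threads cannot both register their own copy of the same content.
        return this.stringCache.computeIfAbsent(value, k -> k);
    }
}
```

Note that ConcurrentHashMap rejects null keys, so this sketch throws a NullPointerException for a null value; a real converter would have to decide how to handle that case.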

Janekdererste · 2024-03-12T10:18:07Z

In general, I would expect an attribute converter to be stateless. Especially here: Why is there a cache if String convert(String value) returns the same value?

If the state is really necessary, I think each worker should have its own instance of these things instead of sharing references. That is, parallelism and independent data structures should probably be handled at the level where parallel execution is introduced.
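The per-worker alternative could be sketched roughly as follows. Everything here (PerWorkerConverterSketch, LocalConverter, runWorkers) is hypothetical and only illustrates the idea of confining a non-thread-safe cache to one worker; it is not how ParallelPopulationReaderMatsimV6 is structured.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerWorkerConverterSketch {
    // Hypothetical converter with a plain HashMap; safe only while used by a single thread.
    static class LocalConverter {
        private final Map<String, String> cache = new HashMap<>();

        String convert(String value) {
            String existing = cache.putIfAbsent(value, value);
            return existing != null ? existing : value;
        }
    }

    // Each worker creates its own converter and returns the canonical instance it ended up with.
    static List<String> runWorkers(int workers, int valuesPerWorker) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                futures.add(pool.submit(() -> {
                    LocalConverter converter = new LocalConverter(); // never shared with other tasks
                    String canonical = null;
                    for (int i = 0; i < valuesPerWorker; i++) {
                        // new String(char[]) simulates a freshly parsed attribute value
                        canonical = converter.convert(new String("car".toCharArray()));
                    }
                    return canonical;
                }));
            }
            List<String> result = new ArrayList<>();
            for (Future<String> future : futures) {
                result.add(future.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off of this design is that deduplication only happens within a worker: each worker keeps its own canonical copy of every string, so memory savings are smaller than with one shared cache.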

mrieser · 2024-03-12T10:21:03Z

The (Concurrent)HashMap here is used for (manual) deduplication of strings. Reading attributes from XML returns different Strings, even if the string-content is always the same. Having the map ensures we store identical strings only once in memory, typically reducing memory consumption by a large margin.
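The effect described above can be illustrated with a small single-threaded sketch. The class DedupDemo and its methods are hypothetical; simulateParsedValues only imitates a parser that allocates a new String per attribute value, and a plain HashMap is used because no threads are involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DedupDemo {
    // Imitates parser output: equal content, but a separate String object per value.
    static List<String> simulateParsedValues(int count) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            values.add(new String(new char[] {'c', 'a', 'r'}));
        }
        return values;
    }

    // Replaces every value by the first instance seen with the same content.
    static List<String> deduplicate(List<String> values) {
        Map<String, String> cache = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String value : values) {
            String existing = cache.putIfAbsent(value, value);
            result.add(existing != null ? existing : value);
        }
        return result;
    }

    // Counts objects by reference identity rather than by equals().
    static int countDistinctInstances(List<String> values) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(values);
        return identities.size();
    }
}
```

For 1000 simulated values, countDistinctInstances reports 1000 objects before deduplication and a single object afterwards, which is where the memory saving comes from.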

Janekdererste · 2024-03-12T10:45:32Z

Reading attributes from XML returns different Strings

Interesting! I assumed that this was solved via interning, but of course, I guess the XML parser allocates each value with new String(bytes).

Could we use something like the following?

var cached = value.intern();

Then the JVM would take care of this for us. (Just curious.) See the Javadoc for String.intern().
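For comparison, a minimal sketch of the intern() variant. The class InternSketch is hypothetical; String.intern() itself is standard Java and returns the canonical instance from the JVM's string pool.

```java
// Hypothetical illustration of deduplication via the JVM string pool.
class InternSketch {
    // Returns the pooled instance for the given content.
    static String convert(String value) {
        return value.intern();
    }

    public static void main(String[] args) {
        String a = new String("car".toCharArray());
        String b = new String("car".toCharArray());
        System.out.println(a == b);                   // distinct objects before interning
        System.out.println(convert(a) == convert(b)); // same pooled object afterwards
    }
}
```

This needs no cache field at all, but the pool is global to the JVM and its behaviour depends on the JVM version and settings.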

mrieser · 2024-03-12T13:34:56Z

At least in older versions of Java, the data structure backing intern() had a fixed size and performed poorly when a large number of different Strings were interned. The usual advice was therefore to use a ConcurrentHashMap instead for better performance.

E.g. see https://stackoverflow.com/questions/10624232/performance-penalty-of-string-intern, https://www.baeldung.com/java-string-pool#performance-and-optimizations (look for -XX:StringTableSize)

Janekdererste · 2024-03-12T14:23:45Z

I see. Thank you @mrieser!

Make StringConverter threadsafe

03487d5

steffenaxer enabled auto-merge March 11, 2024 22:00

steffenaxer merged commit 50dfe1a into matsim-org:master Mar 11, 2024
46 checks passed

Janekdererste reviewed Mar 12, 2024
