-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make StringConverter threadsafe #3157
Conversation
@@ -29,7 +30,7 @@ | |||
* @author mrieser | |||
*/ | |||
public class StringConverter implements AttributeConverter<String> { | |||
private final Map<String, String> stringCache = new HashMap<String, String>(1000); | |||
private final Map<String, String> stringCache = new ConcurrentHashMap<>(1000); | |||
@Override | |||
public String convert(String value) { | |||
String s = this.stringCache.get(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With ConcurrentHashMap
you also have to change the insertion logic, as you might have a race condition here, with two processes getting a null
value at the same time and then updating the cache. You could use:
return map.computeIfAbsent(value, k -> k);
In general, I would expect an attribute converter to be stateless. Especially here: Why is there a cache if If the state is really necessary, I would think that the workers should have their own instance of these things, instead of sharing references. I.e., taking care of parallelism and independent data structures should probably be solved on the level where parallel execution is introduced. |
The (Concurrent)HashMap here is used for (manual) deduplication of strings. Reading attributes from XML returns different Strings, even if the string-content is always the same. Having the map ensures we store identical strings only once in memory, typically reducing memory consumption by a large margin. |
Interesting! I assumed that this was solved via interning, but of course, the XML parser allocates Could we use something like the following?
Then the JVM would take care of this for us. (Just being curious) (Javadoc intern()) |
At least in older versions of Java, the data structure used for the E.g. see https://stackoverflow.com/questions/10624232/performance-penalty-of-string-intern, https://www.baeldung.com/java-string-pool#performance-and-optimizations (look for |
I see. Thank you @mrieser! |
This PR makes the StringConverter threadsafe. There are currently hints that indicate a potential race condition as the same AttributeConverter references are used by all worker threads within the ParallelPopulationReaderMatsimV6