-
Notifications
You must be signed in to change notification settings - Fork 414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OAK-10921 : fixed race condition where fullGC database variables gets… #1562
Conversation
@@ -94,7 +94,6 @@ | |||
import static org.junit.Assert.assertNull; | |||
import static org.junit.Assert.assertTrue; | |||
import static org.junit.Assert.fail; | |||
import static org.junit.Assume.assumeThat; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it, cause it was an unused import.
...ment/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollectorIT.java
Outdated
Show resolved
Hide resolved
public Iterable<NodeDocument> getModifiedDocs(long fromModified, long toModified, int limit, @NotNull String fromId, | ||
final @NotNull Set<String> includePaths, final @NotNull Set<String> excludePaths) { | ||
// reset fullGC variables externally while GC is running | ||
store1.getDocumentStore().remove(SETTINGS, SETTINGS_COLLECTION_ID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we have two variations of reset :
- [reset|https://github.com/apache/jackrabbit-oak/blob/65e2ac78fb618915d6783033c61cd724fac54b36/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java#L382-L384]
- [reset fullGC|https://github.com/apache/jackrabbit-oak/blob/65e2ac78fb618915d6783033c61cd724fac54b36/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java#L386-L391]
IIUC then this just tests one of the two. Might be good to actually test both (whether in same or different test method)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the new resetFullGCFromOakRunWhileRunning
is virtually identical to resetGCFromOakRunWhileRunning
except for reset()
vs resetFullGC()
- which was the idea of course - but I would suggest to consider refactoring these two to avoid (test) code duplication - drafted this (without testing) in #1572 to illustrate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, assuming there will be another test case as @stefan-egli already proposed.
@@ -191,7 +190,7 @@ public GCCounts(FullGCMode mode, int deletedDocGCCount, int deletedPropsCount, | |||
public DocumentStoreFixture fixture; | |||
|
|||
@Parameter(1) | |||
public FullGCMode fullGcMode; | |||
FullGCMode fullGcMode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reduced the scope for FullGCMode
to avoid leaking this object.
0c80fd9
to
9f6680c
Compare
SETTINGS_COLLECTION_FULL_GC_TIMESTAMP_PROP, stats.oldestModifiedDocTimeStamp)); | ||
setStringSetting(SETTINGS_COLLECTION_FULL_GC_DOCUMENT_ID_PROP, stats.oldestModifiedDocId); | ||
setVGCSetting(SETTINGS_COLLECTION_OLDEST_TIMESTAMP_PROP, scope.toMs); | ||
updateVGCSetting(new HashMap<>() {{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subclassing HashMap seems heavy, what about using Map.of()
as was done in some other cases?
private void setLongSetting(String propName, long val) { | ||
setLongSetting(of(propName, val)); | ||
private void setVGCSetting(final String propName, final Object val) { | ||
setVGCSetting(new HashMap<>() {{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same as previous)
subclassing HashMap seems heavy, what about using Map.of()
as was done in some other cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Map.of()
doesn't allow null
values and during some test cases, some values happen to be null
and cause the test failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Alternatively, you could of course create the hasmap first, then add the key value. Would be slightly more code but avoid the subclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, didn't know the performance aspect of this. Will make the change.
}); | ||
} | ||
|
||
private void updateVGCSetting(final Map<String, Object> propValMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed a couple issues with this:
- it is not clear in which situation
updateVGCSetting
is used. It seems to be targeted forevaluate
. But that method has a complex (if-) structure and it's not easy to figure out in which casesupdateVGCSetting
is called. Maybe this is more of a readability comment. Perhaps adding a comment would be a good improvement thouhg. VersionGCInitTest.lazyInitialize
currently fails sinceupdateVGCSetting
is invokved but with a condition thatfullGCTimeStamp
andfullGCId
are set. Except it seems those are not initialized. So the initialization of these must somehow be better covered, I'd suggest first in a separate test case perhaps, and then in the code.
… overridden if they are reset by external components
Started rereviewing and must admit that it's perhaps not immediately obvious why we use Then I noticed that the jira as well doesn't much go into details as to what exactly the race condition is. The PR does :
What about adding a comment (or the actual description) to the jira that explains what goes wrong when "the race condition" happens? I guess it is : when you do a reset using oak-run while an oak instance is executing a regular fullGC (not dry run), then the reset would be ignored? And at the same time perhaps why dry run is different from wet run in this regard? Just might make code reading a bit easier. PS: also noticed that due to the force-push I can't review the diff between my last review and this one - so can't follow what has changed since. |
Related to that, currently Perhaps in combination with explaining why dry and wet are different when they race with the different resets, could make the code a bit easier to understand. |
The flow is like this:
To fix this, we need to replace the updateAndCreate API (to update full GC variables after each full GC cycle) with findAndModify, this will ensure that in case they are already removed, they won't get added again. Once the new cycle of GC starts, it will add them again with values of the oldest modified document & its ID. Note: This race condition won't come in case of dry run because dry run is not supported in normal flow. We can dry run with only the oak run command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thx for the explanation in the jira, helpful indeed.
… overridden if they are reset by external components