errors during WABulkReapingCache>>reap can lead to "corrupt" cache in GemStone #848

dalehenrich · 2015-07-16T21:01:06Z

As reported in this GemStone issue an error that occurs while removing the expiredEntries from the dictionary could cause unreapable entries to be stranded in the dictionary. The entry is unregistered when it is added to the expiredEntries, but if the expiredEntries processing isn't completed due to an error, the entries will appear to be expired without having been removed form the dictionary which means that they could be stuck forever ...

I haven't proposed a fix for GsDevKit/Seaside31#68 but I think that the following would be a bit better:

reap
    "Iterate through the cache and remove objects that have expired. Returns the number of expired objects."

    | now platform count expiredEntries |
    now := Time totalSeconds.
    expiredEntries := OrderedCollection new.

    dictionary keysAndValuesDo: [ :key :value |
        value hasBeenAccessedSinceLastReap
            ifTrue: [ value setAccessTime: now ].
        (self isExpired: value now: now)
            ifTrue: [
"move to        self cacheEntryRemoved: value value."
                expiredEntries add: key ] 
        ].

    "In GemStone, this method is performed by a separate maintenance VM, 
     so we are already in transaction (assumed to be running in #autoBegin 
     transactionMode) and do not have to worry about acquiring the TransactionMutex.
    Since we are using reducedConflict dictionaries in the first place, we will remove the keys
    and values from the existing dictionaries without using the mutex."

    count := 0.
    platform := GRPlatform current.
    expiredEntries do:[ :key |
        dictionary removeKey: key.
"here"      self cacheEntryRemoved: value value
        count := count + 1.
        (count \\ 100) isZero ifTrue: [
            platform doCommitTransaction ] ].
    count isZero ifTrue: [
        platform doCommitTransaction ].
    ^ count

That will keep the unregister and dictionary in synch ... didn't look to see if this approach is practical ...

Also, it looks like there might be a missing commit if the count doesn't end on an an even 100 boundary ...

Better would be:

    (count \\ 100) isZero ifFalse: [
        platform doCommitTransaction ].

I think ...

The text was updated successfully, but these errors were encountered:

dalehenrich · 2015-07-16T21:08:50Z

Here's the original (buggy code) for WACache>>gemstoneReap:

gemstoneReap
  "Iterate through the cache and remove objects that have expired."

  "In GemStone, this method is performed by a separate maintenance VM, 
     so we are already in transaction (assumed to be running in #autoBegin 
     transactionMode) and do not have to worry about acquiring the TransactionMutex.
    Since we are using reducedConflict dictionaries in the first place, we will remove the keys
    and values from the existing dictionaries without using the mutex."

  | expired count platform |
  expired := OrderedCollection new.
  objectsByKey
    associationsDo: [ :assoc | 
      (self expiryPolicy isExpiredUpdating: assoc value key: assoc key)
        ifTrue: [ 
          self notifyRemoved: assoc value key: assoc key.
          expired add: assoc ] ].
  count := 0.
  platform := GRPlatform current.
  expired
    do: [ :assoc | 
      count := count + 1.
      objectsByKey removeKey: assoc key.
      keysByObject removeKey: assoc value ifAbsent: [  ].
      count \\ 100 == 0
        ifTrue: [ platform doCommitTransaction ] ].
  count ~~ 0
    ifTrue: [ platform doCommitTransaction ].
  ^ expired size

and in my proposed fix, I move the notifyRemoved:key: into the bottom loop:

gemstoneReap
  "Iterate through the cache and remove objects that have expired."

  "In GemStone, this method is performed by a separate maintenance VM, 
     so we are already in transaction (assumed to be running in #autoBegin 
     transactionMode) and do not have to worry about acquiring the TransactionMutex.
    Since we are using reducedConflict dictionaries in the first place, we will remove the keys
    and values from the existing dictionaries without using the mutex."

  | expired count platform |
  expired := OrderedCollection new.
  objectsByKey
    associationsDo: [ :assoc | 
      (self expiryPolicy isExpiredUpdating: assoc value key: assoc key)
        ifTrue: [ 
          expired add: assoc ] ].
  count := 0.
  platform := GRPlatform current.
  expired
    do: [ :assoc | 
      count := count + 1.
      self notifyRemoved: assoc value key: assoc key.
      objectsByKey removeKey: assoc key.
      keysByObject removeKey: assoc value ifAbsent: [  ].
      count \\ 100 == 0
        ifTrue: [ platform doCommitTransaction ] ].
  count ~~ 0
    ifTrue: [ platform doCommitTransaction ].
  ^ expired size

but I hadn't tested that bugfix either...

marschall · 2015-07-17T08:15:05Z

I just committed Seaside-Core-pmm.861, can you have a look again?

dalehenrich · 2015-07-17T16:29:46Z

Um, you didn't commit it up on github?

dalehenrich · 2015-07-17T16:45:12Z

Thanks @marschall.

I still think the final commit logic is wrong:

        count := count + 1.
        (count \\ 100) isZero ifTrue: [
            platform doCommitTransaction ] ].
    (count \\ 100) isZero ifTrue: [
        platform doCommitTransaction ].
    ^ count

I think it needs to look like this:

        count := count + 1.
        (count \\ 100) isZero ifTrue: [
            platform doCommitTransaction ] ].
+   (count \\ 100) isZero ifFalse: [
-   (count \\ 100) isZero ifTrue: [
        platform doCommitTransaction ].
    ^ count

This (count \\ 100) isZero ifTrue: is the same logic used in loop, so if the loop commits on the last count, the final commit will also commit .... if the loop does not commit on the last count, the final commit will not commit as well.

The final commit is there to ensure that any partial work gets committed, so the lgic should invert the in loop logic...

marschall · 2015-07-18T07:43:00Z

You're correct Seaside-Core-pmm.865

dalehenrich mentioned this issue Jul 16, 2015

Seaside-Core, Seaside-Session and Seaside-Environment have gemstone-specific branches #845

Closed

marschall added this to the Seaside 3.2 milestone Jul 17, 2015

jbrichau modified the milestones: 3.3, 3.2 May 5, 2016

marschall closed this as completed Aug 21, 2016

marschall modified the milestones: 3.2, 3.3 Aug 21, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

errors during WABulkReapingCache>>reap can lead to "corrupt" cache in GemStone #848

errors during WABulkReapingCache>>reap can lead to "corrupt" cache in GemStone #848

dalehenrich commented Jul 16, 2015

dalehenrich commented Jul 16, 2015

marschall commented Jul 17, 2015

dalehenrich commented Jul 17, 2015

dalehenrich commented Jul 17, 2015

marschall commented Jul 18, 2015

errors during WABulkReapingCache>>reap can lead to "corrupt" cache in GemStone #848

errors during WABulkReapingCache>>reap can lead to "corrupt" cache in GemStone #848

Comments

dalehenrich commented Jul 16, 2015

dalehenrich commented Jul 16, 2015

marschall commented Jul 17, 2015

dalehenrich commented Jul 17, 2015

dalehenrich commented Jul 17, 2015

marschall commented Jul 18, 2015