Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

errors during WABulkReapingCache>>reap can lead to "corrupt" cache in GemStone #848

Closed
dalehenrich opened this issue Jul 16, 2015 · 5 comments
Milestone

Comments

@dalehenrich
Copy link
Member

As reported in this GemStone issue an error that occurs while removing the expiredEntries from the dictionary could cause unreapable entries to be stranded in the dictionary. The entry is unregistered when it is added to the expiredEntries, but if the expiredEntries processing isn't completed due to an error, the entries will appear to be expired without having been removed form the dictionary which means that they could be stuck forever ...

I haven't proposed a fix for GsDevKit/Seaside31#68 but I think that the following would be a bit better:

reap
    "Iterate through the cache and remove objects that have expired. Returns the number of expired objects."

    | now platform count expiredEntries |
    now := Time totalSeconds.
    expiredEntries := OrderedCollection new.

    dictionary keysAndValuesDo: [ :key :value |
        value hasBeenAccessedSinceLastReap
            ifTrue: [ value setAccessTime: now ].
        (self isExpired: value now: now)
            ifTrue: [
"move to        self cacheEntryRemoved: value value."
                expiredEntries add: key ] 
        ].

    "In GemStone, this method is performed by a separate maintenance VM, 
     so we are already in transaction (assumed to be running in #autoBegin 
     transactionMode) and do not have to worry about acquiring the TransactionMutex.
    Since we are using reducedConflict dictionaries in the first place, we will remove the keys
    and values from the existing dictionaries without using the mutex."

    count := 0.
    platform := GRPlatform current.
    expiredEntries do:[ :key |
        dictionary removeKey: key.
"here"      self cacheEntryRemoved: value value
        count := count + 1.
        (count \\ 100) isZero ifTrue: [
            platform doCommitTransaction ] ].
    count isZero ifTrue: [
        platform doCommitTransaction ].
    ^ count

That will keep the unregister and dictionary in synch ... didn't look to see if this approach is practical ...

Also, it looks like there might be a missing commit if the count doesn't end on an an even 100 boundary ...

Better would be:

    (count \\ 100) isZero ifFalse: [
        platform doCommitTransaction ].

I think ...

@dalehenrich
Copy link
Member Author

Here's the original (buggy code) for WACache>>gemstoneReap:

gemstoneReap
  "Iterate through the cache and remove objects that have expired."

  "In GemStone, this method is performed by a separate maintenance VM, 
     so we are already in transaction (assumed to be running in #autoBegin 
     transactionMode) and do not have to worry about acquiring the TransactionMutex.
    Since we are using reducedConflict dictionaries in the first place, we will remove the keys
    and values from the existing dictionaries without using the mutex."

  | expired count platform |
  expired := OrderedCollection new.
  objectsByKey
    associationsDo: [ :assoc | 
      (self expiryPolicy isExpiredUpdating: assoc value key: assoc key)
        ifTrue: [ 
          self notifyRemoved: assoc value key: assoc key.
          expired add: assoc ] ].
  count := 0.
  platform := GRPlatform current.
  expired
    do: [ :assoc | 
      count := count + 1.
      objectsByKey removeKey: assoc key.
      keysByObject removeKey: assoc value ifAbsent: [  ].
      count \\ 100 == 0
        ifTrue: [ platform doCommitTransaction ] ].
  count ~~ 0
    ifTrue: [ platform doCommitTransaction ].
  ^ expired size

and in my proposed fix, I move the notifyRemoved:key: into the bottom loop:

gemstoneReap
  "Iterate through the cache and remove objects that have expired."

  "In GemStone, this method is performed by a separate maintenance VM, 
     so we are already in transaction (assumed to be running in #autoBegin 
     transactionMode) and do not have to worry about acquiring the TransactionMutex.
    Since we are using reducedConflict dictionaries in the first place, we will remove the keys
    and values from the existing dictionaries without using the mutex."

  | expired count platform |
  expired := OrderedCollection new.
  objectsByKey
    associationsDo: [ :assoc | 
      (self expiryPolicy isExpiredUpdating: assoc value key: assoc key)
        ifTrue: [ 
          expired add: assoc ] ].
  count := 0.
  platform := GRPlatform current.
  expired
    do: [ :assoc | 
      count := count + 1.
      self notifyRemoved: assoc value key: assoc key.
      objectsByKey removeKey: assoc key.
      keysByObject removeKey: assoc value ifAbsent: [  ].
      count \\ 100 == 0
        ifTrue: [ platform doCommitTransaction ] ].
  count ~~ 0
    ifTrue: [ platform doCommitTransaction ].
  ^ expired size

but I hadn't tested that bugfix either...

@marschall
Copy link
Contributor

I just committed Seaside-Core-pmm.861, can you have a look again?

@dalehenrich
Copy link
Member Author

Um, you didn't commit it up on github?

@dalehenrich
Copy link
Member Author

Thanks @marschall.

I still think the final commit logic is wrong:

        count := count + 1.
        (count \\ 100) isZero ifTrue: [
            platform doCommitTransaction ] ].
    (count \\ 100) isZero ifTrue: [
        platform doCommitTransaction ].
    ^ count

I think it needs to look like this:

        count := count + 1.
        (count \\ 100) isZero ifTrue: [
            platform doCommitTransaction ] ].
+   (count \\ 100) isZero ifFalse: [
-   (count \\ 100) isZero ifTrue: [
        platform doCommitTransaction ].
    ^ count

This (count \\ 100) isZero ifTrue: is the same logic used in loop, so if the loop commits on the last count, the final commit will also commit .... if the loop does not commit on the last count, the final commit will not commit as well.

The final commit is there to ensure that any partial work gets committed, so the lgic should invert the in loop logic...

@marschall
Copy link
Contributor

You're correct Seaside-Core-pmm.865

@jbrichau jbrichau modified the milestones: 3.3, 3.2 May 5, 2016
@marschall marschall modified the milestones: 3.2, 3.3 Aug 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants