
STCOR-895 wait a loooong time for a "stale" rotation request #1547

Merged
merged 1 commit into master on Oct 15, 2024

Conversation

zburke
Member

@zburke zburke commented Oct 14, 2024

As part of the RTR lifecycle, we write a rotation timestamp to local storage when the process starts and then remove it when it ends. This is a cheap way of making the rotation request visible across tabs, because all tabs read the same shared storage.

To avoid the problem of a cancelled request leaving cruft in storage, we inspect that timestamp and consider a request "stale" if it's too old. That was the problem here: our "too old" timeout was too short. On a busy server, on a slow connection, or for a client far from its host (say, in New Zealand), two seconds was not long enough, so the rotation request would still be active when stripes considered it "stale", allowing a second request to go through. But since the first request was just slow, not dead, the backend treated the second one as a token-replay attack and immediately terminated all active sessions for that user account.

Thus, waiting longer is a quick fix. A more robust approach to tracking the rotation request is described in the comments for RTR_MAX_AGE.

Refs STCOR-895
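The mechanism above can be sketched roughly as follows. This is illustrative only: the storage key and helper names are hypothetical, RTR_MAX_AGE stands in for the real constant (with the longer value this PR implies), and an in-memory shim replaces `window.localStorage` so the sketch runs outside a browser.

```javascript
// In-memory stand-in for window.localStorage, for illustration only.
const storage = new Map();
const localStorage = {
  getItem: (k) => (storage.has(k) ? storage.get(k) : null),
  setItem: (k, v) => storage.set(k, String(v)),
  removeItem: (k) => storage.delete(k),
};

const ROTATION_KEY = 'rtr-rotation-timestamp'; // hypothetical key name
const RTR_MAX_AGE = 10000; // ms; the fix is simply making this longer

// Write the timestamp when rotation starts; all tabs share this storage.
function beginRotation(now = Date.now()) {
  localStorage.setItem(ROTATION_KEY, now);
}

// Remove the timestamp when rotation ends.
function endRotation() {
  localStorage.removeItem(ROTATION_KEY);
}

// A rotation is considered in flight if a timestamp exists and is
// younger than RTR_MAX_AGE. Older timestamps are treated as cruft from
// a cancelled request and ignored; when RTR_MAX_AGE was too short,
// a slow-but-live request was misread as cruft, letting a second
// rotation through.
function isRotating(now = Date.now()) {
  const ts = Number(localStorage.getItem(ROTATION_KEY));
  return !!ts && (now - ts) <= RTR_MAX_AGE;
}
```

With a generous RTR_MAX_AGE, a slow rotation on a distant or busy server stays recognized as in flight, while an orphaned timestamp from a cancelled request still ages out eventually.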

@zburke zburke requested review from JohnC-80, ryandberger, aidynoJ and a team October 14, 2024 21:11

github-actions bot commented Oct 14, 2024

Bigtest Unit Test Results

192 tests ±0 (1 suite, 1 file): 187 ✅ ±0, 5 💤 ±0, 0 ❌ ±0, in 6s ⏱️ ±0s

Results for commit 6e6d252. ± Comparison against base commit 0e4d2b4.

♻️ This comment has been updated with latest results.


github-actions bot commented Oct 14, 2024

Jest Unit Test Results

  1 files  ±0   56 suites  ±0   1m 34s ⏱️ +32s
339 tests ±0  339 ✅ ±0  0 💤 ±0  0 ❌ ±0 
343 runs  ±0  343 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 6e6d252. ± Comparison against base commit 0e4d2b4.

♻️ This comment has been updated with latest results.


zburke added a commit that referenced this pull request Oct 15, 2024
@zburke zburke merged commit cc8ef65 into master Oct 15, 2024
26 checks passed
@zburke zburke deleted the STCOR-895 branch October 15, 2024 14:17
zburke added a commit that referenced this pull request Oct 21, 2024
zburke added a commit that referenced this pull request Oct 22, 2024
(cherry picked from commit b2083cc)
zburke added a commit that referenced this pull request Oct 30, 2024
(cherry picked from commit cc8ef65)