
STCOR-895 wait a loooong time for a "stale" rotation request #1547

Merged
merged 1 commit into master on Oct 15, 2024

Conversation

zburke
Member

@zburke zburke commented Oct 14, 2024

As part of the RTR lifecycle, we write a rotation timestamp to local storage when the process starts and then remove it when it ends. This is a cheap way of making the rotation request visible across tabs, because all tabs read the same shared storage.

To avoid the problem of a cancelled request leaving cruft in storage, we inspect that timestamp and consider a request "stale" if it's too old. That was the problem here: our "too old" timeout was too short. On a busy server, on a slow connection, or for a client far from its host (say, in New Zealand), two seconds was not long enough, so the rotation request would still be active when stripes considered it "stale", allowing a second request to go through. But since the first request was just slow, not dead, the backend treated the second one as a token-replay attack and immediately terminated all active sessions for that user account.

Thus, waiting longer is a quick fix. A more robust approach to tracking the rotation request is described in the comments for RTR_MAX_AGE.

Refs STCOR-895
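The mechanism above can be sketched roughly as follows. This is illustrative only: the storage key and helper names are hypothetical, RTR_MAX_AGE stands in for the real constant (with the longer value this PR implies), and an in-memory shim replaces `window.localStorage` so the sketch runs outside a browser.

```javascript
// In-memory stand-in for window.localStorage, for illustration only.
const storage = new Map();
const localStorage = {
  getItem: (k) => (storage.has(k) ? storage.get(k) : null),
  setItem: (k, v) => storage.set(k, String(v)),
  removeItem: (k) => storage.delete(k),
};

const ROTATION_KEY = 'rtr-rotation-timestamp'; // hypothetical key name
const RTR_MAX_AGE = 10000; // ms; the fix is simply making this longer

// Write the timestamp when rotation starts; all tabs share this storage.
function beginRotation(now = Date.now()) {
  localStorage.setItem(ROTATION_KEY, now);
}

// Remove the timestamp when rotation ends.
function endRotation() {
  localStorage.removeItem(ROTATION_KEY);
}

// A rotation is considered in flight if a timestamp exists and is
// younger than RTR_MAX_AGE. Older timestamps are treated as cruft from
// a cancelled request and ignored; when RTR_MAX_AGE was too short,
// a slow-but-live request was misread as cruft, letting a second
// rotation through.
function isRotating(now = Date.now()) {
  const ts = Number(localStorage.getItem(ROTATION_KEY));
  return !!ts && (now - ts) <= RTR_MAX_AGE;
}
```

With a generous RTR_MAX_AGE, a slow rotation on a distant or busy server stays recognized as in flight, while an orphaned timestamp from a cancelled request still ages out eventually.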

@zburke zburke requested review from JohnC-80, ryandberger, aidynoJ and a team October 14, 2024 21:11

github-actions bot commented Oct 14, 2024

Bigtest Unit Test Results

192 tests ±0 (1 suite, 1 file): 187 ✅ ±0, 5 💤 ±0, 0 ❌ ±0, in 6s ⏱️ ±0s

Results for commit 6e6d252. ± Comparison against base commit 0e4d2b4.

♻️ This comment has been updated with latest results.


github-actions bot commented Oct 14, 2024

Jest Unit Test Results

  1 files  ±0   56 suites  ±0   1m 34s ⏱️ +32s
339 tests ±0  339 ✅ ±0  0 💤 ±0  0 ❌ ±0 
343 runs  ±0  343 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 6e6d252. ± Comparison against base commit 0e4d2b4.

♻️ This comment has been updated with latest results.


zburke added a commit that referenced this pull request Oct 15, 2024
@zburke zburke merged commit cc8ef65 into master Oct 15, 2024
26 checks passed
@zburke zburke deleted the STCOR-895 branch October 15, 2024 14:17
zburke added a commit that referenced this pull request Oct 21, 2024
zburke added a commit that referenced this pull request Oct 22, 2024
(cherry picked from commit b2083cc)
zburke added a commit that referenced this pull request Oct 30, 2024
(cherry picked from commit cc8ef65)