puller stuck attempting to sync the same interval and bin from a peer #4337

istae · 2023-09-22T19:58:38Z


Context

Some peers are failing to send the chunks requested as part of the pullsync protocol.
These chunk addresses were offered by the peer, but when the peer is asked to send the chunks, the stream is reset.

This can mean that the localstore of the sending peer is corrupted or, more specifically, that certain reserve storer items in its localstore are in an inconsistent state.

This line is potentially the cause of the issue: https://github.com/ethersphere/bee/blob/master/pkg/pullsync/pullsync.go#L470
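One possible mitigation, in the spirit of the linked fix ("swallow and log process want reserve get errs"), is to skip a wanted chunk that cannot be read instead of failing the whole delivery. The following is only a minimal sketch of that idea: `chunkGetter`, `mapStore`, `errNotFound` and `collectWanted` are hypothetical names for illustration and are not bee's actual types or API.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// chunkGetter is a hypothetical stand-in for the reserve store lookup.
type chunkGetter interface {
	Get(addr string) ([]byte, error)
}

// errNotFound is a hypothetical error for a missing or inconsistent item.
var errNotFound = errors.New("chunk not found")

// mapStore is a toy in-memory store used only for this sketch.
type mapStore map[string][]byte

func (m mapStore) Get(addr string) ([]byte, error) {
	data, ok := m[addr]
	if !ok {
		return nil, errNotFound
	}
	return data, nil
}

// collectWanted returns the chunks that could be read. A failed lookup is
// logged and skipped rather than returned, so one bad item does not abort
// the delivery of the remaining chunks.
func collectWanted(store chunkGetter, wanted []string) [][]byte {
	out := make([][]byte, 0, len(wanted))
	for _, addr := range wanted {
		data, err := store.Get(addr)
		if err != nil {
			log.Printf("skipping wanted chunk %s: %v", addr, err)
			continue
		}
		out = append(out, data)
	}
	return out
}

func main() {
	store := mapStore{"aa": []byte("one"), "cc": []byte("three")}
	// "bb" was offered but cannot be read; it is logged and skipped.
	got := collectWanted(store, []string{"aa", "bb", "cc"})
	fmt.Println(len(got))
}
```

The trade-off of this approach is that the requesting peer receives fewer chunks than it asked for, so the receiving side has to tolerate a partial delivery.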

This issue is causing the node to request the same sync interval over and over again from the same peer.

Notice that the received offer is around 5K per second, because the puller is stuck in a loop.

"time"="2023-09-22 19:47:49.153181" "level"="debug" "logger"="node/pullsync" "msg"="error syncing peer" "peer_address"="8570dbc40de1d4cc18139eeaf00ae418e3111581410240b9ff75a4ee14e13705" "bin"=7 "start"=1 "error"="stream reset"
"time"="2023-09-22 19:47:49.153207" "level"="debug" "logger"="node/puller" "msg"="syncWorker interval failed" "error"="read delivery: stream reset" "peer_address"="8570dbc40de1d4cc18139eeaf00ae418e3111581410240b9ff75a4ee14e13705" "bin"=7 "cursor"=3515 "start"=1 "topmost"=0
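To confirm that a node is looping on the same interval, the debug log can be grouped by peer, bin and start. The sketch below assumes the `"key"="value"` / `"key"=number` line format seen in the two log lines above; `parseFields` and `countFailures` are hypothetical helpers written for this issue, not part of bee.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldRe matches "key"="quoted value" or "key"=bareValue pairs.
var fieldRe = regexp.MustCompile(`"([^"]+)"=(?:"([^"]*)"|(\S+))`)

// parseFields extracts the key/value pairs from one log line.
func parseFields(line string) map[string]string {
	fields := make(map[string]string)
	for _, m := range fieldRe.FindAllStringSubmatch(line, -1) {
		if m[3] != "" {
			fields[m[1]] = m[3] // bare value such as "bin"=7
		} else {
			fields[m[1]] = m[2] // quoted value
		}
	}
	return fields
}

// countFailures counts "error syncing peer" lines per peer/bin/start.
// A high count for a single key suggests the same interval is being retried.
func countFailures(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		f := parseFields(line)
		if f["msg"] != "error syncing peer" {
			continue
		}
		key := f["peer_address"] + "/bin=" + f["bin"] + "/start=" + f["start"]
		counts[key]++
	}
	return counts
}

func main() {
	line := `"time"="2023-09-22 19:47:49.153181" "level"="debug" "logger"="node/pullsync" "msg"="error syncing peer" "peer_address"="8570dbc40de1d4cc18139eeaf00ae418e3111581410240b9ff75a4ee14e13705" "bin"=7 "start"=1 "error"="stream reset"`
	for key, n := range countFailures([]string{line, line}) {
		fmt.Println(key, n)
	}
}
```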

Some general context:

Because of the recently introduced puller epoch changes, peers reset their sync intervals, so all intervals were resynced.
The storage radius increased recently, so many unreserve and evict batch operations occurred; this may be the culprit that introduced the inconsistent states.
There was a retrievalIndexItem migration in the latest version.


istae added the needs-triaging new issues that need triaging label Sep 22, 2023

bee-runner bot added the issue label Sep 22, 2023

istae changed the title ~~General problem description~~ puller stuck attempting to sync the same interval and bin from a peer Sep 22, 2023

istae mentioned this issue Sep 22, 2023

fix(pullsync): swallow and log process want reserve get errs #4339

Merged

4 tasks

istae linked a pull request Sep 25, 2023 that will close this issue

fix(pullsync): swallow and log process want reserve get errs #4339

Merged

4 tasks

istae closed this as completed in #4339 Sep 25, 2023
