puller stuck attempting to sync the same interval and bin from a peer #4337

Closed
istae opened this issue Sep 22, 2023 · 0 comments · Fixed by #4339
Labels: issue, needs-triaging

istae commented Sep 22, 2023

⚠️ Support requests in an issue-format will be closed immediately. For support, go to Swarm's Discord.

Context

Some peers are failing to send the chunks requested as part of the pullsync protocol.
These chunk addresses were offered by the peer, but when the peer is asked to send the chunks, the stream is reset.

This can mean that the localstore of the sending peer is corrupted or, more specifically, that certain reserve storer items in the localstore are in an inconsistent state.

This line is potentially the cause of the issue: https://github.com/ethersphere/bee/blob/master/pkg/pullsync/pullsync.go#L470

This issue is causing the node to request the same sync interval over and over again from the same peer.

Notice that received offers arrive at a rate of around 5K per second, because the puller is stuck in a loop.
image

"time"="2023-09-22 19:47:49.153181" "level"="debug" "logger"="node/pullsync" "msg"="error syncing peer" "peer_address"="8570dbc40de1d4cc18139eeaf00ae418e3111581410240b9ff75a4ee14e13705" "bin"=7 "start"=1 "error"="stream reset"
"time"="2023-09-22 19:47:49.153207" "level"="debug" "logger"="node/puller" "msg"="syncWorker interval failed" "error"="read delivery: stream reset" "peer_address"="8570dbc40de1d4cc18139eeaf00ae418e3111581410240b9ff75a4ee14e13705" "bin"=7 "cursor"=3515 "start"=1 "topmost"=0
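To check whether a node is affected, it can help to count how often the same peer, bin, and start triple fails in the debug logs. The following is a rough sketch that assumes the `"key"="value"` layout shown in the log lines above and does not handle escaped quotes inside values; `parseFields`, `countFailures`, and `intervalKey` are hypothetical helpers, not part of bee.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldRe matches one "key"="quoted value" or "key"=bareValue pair.
var fieldRe = regexp.MustCompile(`"([^"]+)"=(?:"([^"]*)"|(\S+))`)

// parseFields extracts all key/value pairs from a single log line.
func parseFields(line string) map[string]string {
	fields := make(map[string]string)
	for _, m := range fieldRe.FindAllStringSubmatch(line, -1) {
		if m[3] != "" {
			fields[m[1]] = m[3] // unquoted value such as 7
		} else {
			fields[m[1]] = m[2] // quoted value, possibly empty
		}
	}
	return fields
}

// intervalKey identifies one sync attempt target.
type intervalKey struct{ peer, bin, start string }

// countFailures counts "error syncing peer" entries from the node/pullsync
// logger, grouped by peer address, bin, and interval start.
func countFailures(lines []string) map[intervalKey]int {
	counts := make(map[intervalKey]int)
	for _, line := range lines {
		f := parseFields(line)
		if f["logger"] != "node/pullsync" || f["msg"] != "error syncing peer" {
			continue
		}
		counts[intervalKey{f["peer_address"], f["bin"], f["start"]}]++
	}
	return counts
}

func main() {
	line := `"time"="2023-09-22 19:47:49.153181" "level"="debug" "logger"="node/pullsync" "msg"="error syncing peer" "peer_address"="8570dbc40de1d4cc18139eeaf00ae418e3111581410240b9ff75a4ee14e13705" "bin"=7 "start"=1 "error"="stream reset"`
	for key, n := range countFailures([]string{line, line}) {
		fmt.Printf("peer=%s bin=%s start=%s failures=%d\n", key.peer, key.bin, key.start, n)
	}
}
```

A count that keeps growing for a single triple is consistent with the loop described in this issue.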

Some general context:

  1. Because of the recently introduced puller epoch changes, peers reset their sync intervals, so all the intervals were resynced.
  2. The storage radius increased recently, so many unreserve and evict batch operations occurred; this may be the culprit that introduced the inconsistent states.
  3. There was a retrievalIndexItem migration in the latest version.
istae added the needs-triaging label on Sep 22, 2023
bee-runner (bot) added the issue label on Sep 22, 2023
istae changed the title from "General problem description" to "puller stuck attempting to sync the same interval and bin from a peer" on Sep 22, 2023
istae linked a pull request on Sep 25, 2023 that will close this issue