ECONNRESET when uploading a large file. #27

drob · 2014-05-29T19:24:55Z

I'm getting ECONNRESET errors when uploading a 350 MB file with knox-mpu.

In particular:

{
    "part": 1,
    "message": {
        "code": "ECONNRESET",
        "errno": "ECONNRESET",
        "syscall": "read"
    }
}

The part specified is different each time but is always between 1 and 4. (I am using the default batchSize of 4.)

Is there any other info that would be helpful in debugging this?


mikermcneil · 2014-05-31T00:47:46Z

@drob hey man, just a guess, but have you tried:

// Max # of milliseconds client sockets (i.e. for our purposes: **requests**) should be allowed to stay connected to this particular route.
// 0 = infinite
res.setTimeout(0);

(see pillarjs/multiparty#49 (comment) for details)

luccastera · 2014-06-03T22:43:29Z

I'm getting these errors as well.

Could it be caused by S3 rate limiting? See http://blog.blitline.com/post/29157492002/things-to-know-about-s3 or Automattic/knox#199

mikermcneil · 2014-06-07T02:16:41Z

OK- so what I posted before is really the solution for a different issue, involving aborted requests (although you'll want to consider it as well). As for the issue at hand, here's the best of my understanding atm:

ECONNRESET started showing up in Node 0.10; it was suppressed before that. It seems to be improved in 0.11, but it will still sometimes fire. The situation should be greatly improved in Node v0.12, but that doesn't help us now.

Anyways, ECONNRESET originates when a TCP client receives an unexpected RST signal -or- potentially (not sure on this) even if it receives a FIN before an expected ACK from an earlier SYN. Furthermore this seems to be an unavoidable result of dealing with S3, at least for the moment. This very well may be b/c of what @Dambalah just pointed out:

Could it be caused by S3 rate limiting?

So nonetheless, the question becomes "how do we address it?" @sgress454 put together a workaround, for which we're going to send another PR to knox-mpu soon (hopefully by Monday at the latest). We saw promising results in a test of a 160MB file upload, and just need to take it out for a few more spins. Essentially, the reason knox is crashing on ECONNRESET is two-fold:

1. The res stream here needs an .on('error', ...) handler.
2. The existing .on('error', ...) handler for the knox client itself (here) needs a condition variable to make sure the callback to batch (or if you're using @dustMason's fork, async) is called only once.
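
The "called only once" guard from the second point can be sketched roughly as follows. This is an illustration only, not the actual knox-mpu patch: `callOnce` and `fakeUploadPart` are hypothetical names, and the fake upload simply simulates a part that reports success and then a late ECONNRESET.

```javascript
// Hypothetical helper: wraps a Node-style callback so that only the
// first invocation goes through; later invocations are dropped.
function callOnce(callback) {
  var called = false;
  return function () {
    if (called) return; // e.g. a late ECONNRESET after success
    called = true;
    return callback.apply(this, arguments);
  };
}

// Simulated part upload (assumption, not knox-mpu code): it reports
// success first and then fires an error on the same callback.
function fakeUploadPart(done) {
  done(null, { part: 1 });
  done({ code: 'ECONNRESET', syscall: 'read' });
}

var results = [];
fakeUploadPart(callOnce(function (err) {
  results.push(err ? 'error' : 'ok');
}));

console.log(results); // [ 'ok' ]
```

Without the wrapper, the same callback would run twice and the batch (or async) bookkeeping would see one part finish two times.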

Btw, here's some additional background on ECONNRESET in case anyone smarter than me comes along and knows more about what's going on here :)

From http://stackoverflow.com/questions/17245881/node-js-econnreset:

"ECONNRESET" means the other side of the TCP conversation abruptly closed its end of the connection. This is most probably due to one or more application protocol errors. You could look at the API server logs to see if it complains about something.


mikermcneil · 2014-06-07T02:17:40Z

The res stream here needs an .on('error', ...) handler.

The existing .on('error', ...) handler for the knox client itself (here) needs a condition variable to make sure the callback to batch (or if you're using @dustMason's fork, async) is called only once.

@nathanoehlman are you cool w/ merging fixes to those two things?

dustMason · 2014-06-07T23:07:57Z

@mikermcneil Thanks for looking deeply into this one! I think your 2 suggestions are spot on.

sgress454 · 2014-07-07T23:18:17Z

Update: it doesn't appear that adding the .on('error') handler for the response stream prevents the ECONNRESET errors from occurring. However, our workaround involving checking that the callback is only called once was successful in handling the issue.

…x-MPU are only called once. In some cases, even after a part has been successfully uploaded, S3 will send a response error, which currently causes Knox-MPU to consider the part a failure and either retry it or bail completely. This fix causes Knox-MPU to ignore errors that come in on the response stream after it has already received a "success" status code. See nathanoehlman#27 (comment)

drob · 2014-07-25T01:36:13Z

Fwiw, adding a maxRetries setting to my uploads fixed this issue for me. (That option wasn't documented when I first started using knox-mpu.)

I'm not sure how to fix the underlying issue, though, or if there even is one. (If I'm uploading a 350 MB file, it's reasonable for one of the chunks to fail at some point, right?)

Is there a philosophical reason a default maxRetries of 3, e.g., might not be preferable?
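
For readers unfamiliar with the idea behind a retry limit, here is a minimal sketch of per-part retrying. It is not how knox-mpu implements maxRetries; `withRetries` and `flakyUpload` are hypothetical names, and the flaky upload is a stand-in that fails twice with ECONNRESET before succeeding.

```javascript
// Hypothetical helper: runs a Node-style task and retries it up to
// maxRetries additional times if it calls back with an error.
function withRetries(task, maxRetries, done) {
  var retries = 0;
  function run() {
    task(function (err, result) {
      if (err && retries < maxRetries) {
        retries += 1;
        return run();
      }
      // Either success, or the retry budget is exhausted.
      done(err, result, retries);
    });
  }
  run();
}

// Simulated part upload (assumption): fails twice, then succeeds.
var calls = 0;
function flakyUpload(cb) {
  calls += 1;
  if (calls <= 2) return cb({ code: 'ECONNRESET' });
  cb(null, { part: 1 });
}

var outcome;
withRetries(flakyUpload, 3, function (err, result, retries) {
  outcome = { err: err || null, result: result, retries: retries };
});

console.log(outcome.retries); // 2
```

With a budget of 3, the two transient failures are absorbed and the caller only sees the final success; with a budget of 0, the first ECONNRESET would be reported straight away.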

sgress454 · 2014-07-25T01:40:41Z

The underlying issue is that sometimes a chunk will upload successfully, but the connection will later emit an ECONNRESET error anyway. The knox-mpu code handles this by declaring the chunk invalid and retrying it, or by failing altogether, when really the error should just be ignored.
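
A rough sketch of "ignore the error once the part has already succeeded" is below. It is an assumption-laden illustration rather than the real knox-mpu change: `trackPart` is a hypothetical name, and plain EventEmitter objects stand in for the request and response streams.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: watches a request-like emitter and reports the
// outcome of one part exactly once. Errors that arrive after a 200
// response are ignored, since the part is already stored.
function trackPart(req, done) {
  var succeeded = false;
  var finished = false;

  function finish(err, res) {
    if (finished) return;
    finished = true;
    done(err, res);
  }

  function onError(err) {
    if (succeeded) return; // late ECONNRESET after success: ignore
    finish(err);
  }

  req.on('error', onError);
  req.on('response', function (res) {
    res.on('error', onError);
    if (res.statusCode === 200) {
      succeeded = true;
      finish(null, res);
    } else {
      finish(new Error('Unexpected status ' + res.statusCode));
    }
  });
}

// Simulation (assumption): a 200 response followed by a late reset
// on the response stream.
var req = new EventEmitter();
var res = new EventEmitter();
res.statusCode = 200;

var outcomes = [];
trackPart(req, function (err) {
  outcomes.push(err ? 'error' : 'ok');
});

req.emit('response', res);
var lateError = new Error('read ECONNRESET');
lateError.code = 'ECONNRESET';
res.emit('error', lateError);

console.log(outcomes); // [ 'ok' ]
```

An error that arrives before any success status still reaches the callback, so genuine failures can be retried or reported as before.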

mikermcneil · 2014-07-26T12:11:47Z

To add to that, Node <= 0.8 didn't even announce these sorts of TCP errors. It has to do with unexpected packets being received after sending the FIN, e.g. if an ACK is late but still arrives, or S3 tries to give us more data than we wanted and shoots over an extra SYN or whatever.

