This repository has been archived by the owner on Aug 11, 2021. It is now read-only.

skimdb replication timeout since Nov 24, 2014 #216

Closed
dylang opened this issue Dec 16, 2014 · 4 comments

Comments

@dylang
Contributor

dylang commented Dec 16, 2014

Hi, we've been replicating skimdb.npmjs.com since it was created, but on or around Nov 24th, 2014, replication stopped working with a timeout error. We only just noticed this today (whoops, time to add more monitoring).

The following HEAD request times out, maybe that's part of the problem?

curl -i -X HEAD https://skimdb.npmjs.com/registry/

From couchdb:

{
"source": "https://skimdb.npmjs.com/registry",
"target": "registry",
"owner": "npm_mirror",
"_replication_state": "error",
"_replication_state_time": "2014-11-24T15:17:57-05:00",
"_replication_id": "be080068531f59537d57665b92d41620",
"_replication_state_reason": "timeout"
}
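
Since the missing piece here was monitoring, the following is a minimal sketch of a check that flags replication documents like the one above. It assumes the documents have already been fetched (presumably from CouchDB's `_replicator` database) and parsed into dictionaries; the function name `failed_replications` is hypothetical and not part of CouchDB.

```python
import json


def failed_replications(docs):
    """Return a summary for every replication document in the "error" state.

    `docs` is an iterable of dicts shaped like the replication document
    quoted above. Documents in any other state are ignored.
    """
    failures = []
    for doc in docs:
        if doc.get("_replication_state") == "error":
            failures.append({
                "source": doc.get("source"),
                "target": doc.get("target"),
                "reason": doc.get("_replication_state_reason"),
                "since": doc.get("_replication_state_time"),
            })
    return failures


if __name__ == "__main__":
    # The document from this report, used as sample input.
    sample = json.loads("""
    {
      "source": "https://skimdb.npmjs.com/registry",
      "target": "registry",
      "owner": "npm_mirror",
      "_replication_state": "error",
      "_replication_state_time": "2014-11-24T15:17:57-05:00",
      "_replication_id": "be080068531f59537d57665b92d41620",
      "_replication_state_reason": "timeout"
    }
    """)
    for failure in failed_replications([sample]):
        print("replication %(source)s -> %(target)s failed: "
              "%(reason)s (since %(since)s)" % failure)
```

A check like this could run on a schedule and alert whenever the returned list is non-empty; how the documents are fetched and where alerts go is left open.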
@seldo

seldo commented Dec 16, 2014

Our operational logs say we rolled out a new SSL cert to the skimdb on November 24th but other than that we have made very few operational changes to skimdb since it was set up.

You're right that the HEAD request is timing out but I'm not sure that it would ever have worked in our current configuration. We can dig into it more.

To eliminate one possibility, can you try relaxing your SSL requirements by messing with verify_ssl_certificates or maybe ssl_certificate_max_depth to see if it's an SSL error? This must be a fairly isolated problem or we would have had many more than 2 error reports since the 24th.

@nowells

nowells commented Dec 17, 2014

$ curl -i -k -X HEAD https://skimdb.npmjs.com/registry/
HTTP/1.1 200 OK
Server: CouchDB/1.5.0 (Erlang OTP/R16B03)
Date: Wed, 17 Dec 2014 00:07:40 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 259
Cache-Control: must-revalidate

# This never returns

Since the HEAD request never completes, our CouchDB 1.6.1, which uses HEAD requests for replication, times out and cannot continue.
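
One caveat when reproducing this: curl's `-X HEAD` only changes the method name, and curl may still wait for a response body that a HEAD response never sends. curl's own documentation recommends `-I`/`--head` instead, so a hang from the command above may be a client-side artifact rather than proof of a server fault. As a cross-check, here is a minimal sketch of a HEAD probe with an explicit timeout, using only the Python standard library. The function name `head_status` is hypothetical.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10.0):
    """Send a real HEAD request and return the HTTP status code.

    Returns None if the connection fails or no response arrives within
    `timeout` seconds. The client does not wait for a body on HEAD.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with a 4xx/5xx status.
        return error.code
    except OSError:
        # Covers URLError, socket timeouts, and TLS handshake failures.
        return None


if __name__ == "__main__":
    # The URL from this report; it may no longer be reachable.
    print(head_status("https://skimdb.npmjs.com/registry/", timeout=10.0))
```

A `None` result only says that no response arrived. It does not distinguish a TLS handshake failure from a slow server, so it complements the CouchDB logs rather than replacing them.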

@dylang
Contributor Author

dylang commented Dec 17, 2014

> verify_ssl_certificates

It's off.

> or maybe ssl_certificate_max_depth

This is set to 10.

> This must be a fairly isolated problem or we would have had many more than 2 error reports since the 24th.

It took us this long to create this issue because we don't have monitoring for failed replication, so we didn't notice it wasn't working until yesterday. But I agree, more devs should probably have noticed a problem by now.

It would be helpful to see a graph of skimdb replicator count on http://status.npmjs.org/ to see if there was a drop on Nov 24, 2014. 😄

@seldo

seldo commented Dec 19, 2014

According to #215 this is a problem with older versions of Erlang being unable to process SHA256 SSL certs. This seems a very plausible explanation since that was the change we made -- we replaced our expiring cert with a new SHA256 cert -- and it also explains why only a few people were affected, since most people are running newer versions of Erlang.

Traffic on that box has been growing steadily and there was no sudden drop-off on the 24th.

I'm going to close this as I'm pretty sure it's a dupe of #215. Please put further updates over there.

@seldo seldo closed this as completed Dec 19, 2014