bad status #30

Open
childree opened this issue Nov 6, 2012 · 9 comments

@childree

childree commented Nov 6, 2012

The XMLresolution log shows how many times these 500 (bad status) errors have occurred:

$ grep " 500" xmlresolution.log

2012 Oct 14 10:22:11 fclnx30 XmlResolution[9179]:  INFO xmlresolution.fda.fcla.edu: Rack:     128.***.***.*** - - [14/Oct/2012 10:22:11] "GET /ieids/E20110314_AAABJG/ " 500 24 0.0682
2012 Oct 14 14:27:02 fclnx30 XmlResolution[9177]:  INFO xmlresolution.fda.fcla.edu: Rack:     128.***.***.*** - - [14/Oct/2012 14:27:02] "GET /ieids/E20110315_AAAAMP/ " 500 24 ...etc...

Thrown Errors:

Number of Cases: 112
Action(s): GET, POST
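A tally like the one above can be reproduced with a small Ruby script. This is a hypothetical helper, not part of xmlresolution. It assumes every request line follows the Rack access-log layout in the excerpt above: the quoted method and path, then the status code.

```ruby
# Hypothetical helper (not part of xmlresolution): tally HTTP 500 responses
# by request method. Assumes the Rack access-log layout in the excerpt above:
#   ... [14/Oct/2012 10:22:11] "GET /ieids/E20110314_AAABJG/ " 500 24 0.0682
REQUEST_500 = /"(?<method>[A-Z]+) (?<path>\S+)[^"]*" 500 /

# Returns a Hash of method => count, e.g. {"GET" => 3, "POST" => 1}.
def count_500s(lines)
  counts = Hash.new(0)
  lines.each do |line|
    match = REQUEST_500.match(line)
    counts[match[:method]] += 1 if match
  end
  counts
end

# Formats the tally like the summary in this ticket.
def summarize(counts)
  "Number of Cases: #{counts.values.sum}\nAction(s): #{counts.keys.sort.join(', ')}"
end

# Usage, streaming the log one line at a time:
#   counts = File.open("xmlresolution.log") { |f| count_500s(f.each_line) }
#   puts summarize(counts)
```

Because the regex requires the status code to follow the quoted request, a 500 appearing elsewhere in a line (for example in a path) is not counted.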

I'm currently working on more statistical information to include in this ticket that may help troubleshoot the issue.

Thank you,
Jen

@iterman
Contributor

iterman commented Nov 7, 2012

tar_writer.rb is trying to access a deleted schema. Code will be developed to prevent this situation.

@lydiam

lydiam commented Nov 7, 2012

According to Ira, this is a new bug. It occurs when a schema is deleted and the next package that needs that schema tries to read it. It happens with GETs. The solution is to retain the schemas directory and only delete/recreate it when xmlresolution is restarted.
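Below is a minimal Ruby sketch of that proposal, assuming a single schemas directory on local disk. The class name SchemaStore and the file-naming scheme (MD5 of the location URL) are hypothetical and are not taken from the xmlresolution code. The point is only that the directory is wiped once at startup and never pruned while requests are being served.

```ruby
require "digest/md5"
require "fileutils"

# Hypothetical sketch, not the actual xmlresolution code.
class SchemaStore
  attr_reader :dir

  # Runs once at startup (i.e. when xmlresolution is restarted):
  # this is the only place the schemas directory is deleted and recreated.
  def initialize(dir)
    @dir = dir
    FileUtils.rm_rf(dir)
    FileUtils.mkdir_p(dir)
  end

  # Assumption: files are named by the MD5 of the schema's location URL.
  def path_for(location)
    File.join(dir, Digest::MD5.hexdigest(location))
  end

  # Stores (or overwrites) a schema's content.
  def store(location, content)
    File.write(path_for(location), content)
  end

  # Reads a stored schema. There is deliberately no per-request delete,
  # so a schema stored for one package is still there for the next one.
  def read(location)
    File.read(path_for(location))
  end
end
```

The trade-off is disk usage: unreferenced schemas accumulate until the next restart, in exchange for never removing a file that another request may be about to read.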

Jen sees the problem reflected in the logs for both GETs and POSTs. Ira will look for POSTs in the log.

@lydiam

lydiam commented Nov 7, 2012

Carol suggests doing an xmlresolution code review in the near future.

@childree
Author

childree commented Nov 8, 2012

I'm adding further analysis of this issue to the ticket because of the number of errors that occurred today alone.

When this error is thrown, XMLresolution is ONLY ever attempting to GET/POST one of the following schemas:

01490ebdea13c1bc82a17e4783daeeaa
1fadeaf88d4b93ab263f7c59917c26bc
2b2f6040cfc603d5873d7fa0bf976274
42519c72a741cc30e256b99369f1d735
447039d87705b9734e4fad11295eaa0b
534d7d1e9b53ece0bf0f5874444d8bcb
5e0bd6f94ec78a3a88fca2275ab05f9e
712fc5a7750e69f904f61086a997713c
7f0fd51a2a1490bbd68e5a68e7fc1738
99a72e44689e334ea8b851260347cf8e
d90774b02fa694f3b358b4ed828295be

Of this list, I still need to identify the following schemas:

447039d87705b9734e4fad11295eaa0b
534d7d1e9b53ece0bf0f5874444d8bcb
7f0fd51a2a1490bbd68e5a68e7fc1738
99a72e44689e334ea8b851260347cf8e

The remaining schemas have been identified as follows:

01490ebdea13c1bc82a17e4783daeeaa = xlink.xsd 
1fadeaf88d4b93ab263f7c59917c26bc = XMLSchema.xsd
2b2f6040cfc603d5873d7fa0bf976274 = daitss.xsd
42519c72a741cc30e256b99369f1d735 = mets.xsd
5e0bd6f94ec78a3a88fca2275ab05f9e = xml.xsd (2001)
712fc5a7750e69f904f61086a997713c = xml.xsd (2001/03)
d90774b02fa694f3b358b4ed828295be = simpledc20021212.xsd

Analysis:

Key:
FAILURE = the schema record being processed at the time of failure
SUCCESS = the record for the same schema from a successful request
TEST = my manual curl of the schema and md5sum of the download, to validate the SUCCESS checksum

01490ebdea13c1bc82a17e4783daeeaa
FAILURE: <schema md5="01490ebdea13c1bc82a17e4783daeeaa" last_modified="2007-08-23T15:02:01-04:00" namespace="http://www.w3.org/1999/xlink" location="http://www.loc.gov/standards/xlink/xlink.xsd" status="success"/>
SUCCESS: <schema md5="6bdc7f9459a502964f889d70a335cece" last_modified="2007-08-23T15:02:01-04:00" namespace="http://www.w3.org/1999/xlink" location="http://www.loc.gov/standards/xlink/xlink.xsd" status="success"/>
TEST:
$ curl http://www.loc.gov/standards/xlink/xlink.xsd > xlink.xsd
$ md5sum xlink.xsd
6bdc7f9459a502964f889d70a335cece xlink.xsd

1fadeaf88d4b93ab263f7c59917c26bc
FAILURE: <schema md5="1fadeaf88d4b93ab263f7c59917c26bc" last_modified="2004-03-20T07:53:09-05:00" namespace="http://www.w3.org/2001/XMLSchema" location="http://www.w3.org/2001/XMLSchema.xsd" status="success"/>
SUCCESS: <schema md5="94ed1a93ce3147d01bcb2fc1126255ed" last_modified="2004-03-20T07:53:09-05:00" namespace="http://www.w3.org/2001/XMLSchema" location="http://www.w3.org/2001/XMLSchema.xsd" status="success"/>
TEST:
$ curl http://www.w3.org/2001/XMLSchema.xsd > XMLSchema.xsd
$ md5sum XMLSchema.xsd
94ed1a93ce3147d01bcb2fc1126255ed XMLSchema.xsd

2b2f6040cfc603d5873d7fa0bf976274
FAILURE: <schema md5="2b2f6040cfc603d5873d7fa0bf976274" last_modified="2012-05-30T14:05:46-04:00" namespace="http://www.fcla.edu/dls/md/daitss/" location="http://www.fcla.edu/dls/md/daitss/daitss.xsd" status="success"/>
SUCCESS: <schema md5="a2aa0a4a13503457317d2a94a4e8b038" last_modified="2012-05-30T14:05:46-04:00" namespace="http://www.fcla.edu/dls/md/daitss/" location="http://www.fcla.edu/dls/md/daitss/daitss.xsd" status="success"/>
TEST:
$ curl http://www.fcla.edu/dls/md/daitss/daitss.xsd > daitss.xsd
$ md5sum daitss.xsd
a2aa0a4a13503457317d2a94a4e8b038 daitss.xsd

42519c72a741cc30e256b99369f1d735
FAILURE: <schema md5="42519c72a741cc30e256b99369f1d735" last_modified="2012-03-05T12:02:18-05:00" namespace="http://www.loc.gov/METS/" location="http://www.loc.gov/standards/mets/mets.xsd" status="success"/>
SUCCESS: <schema md5="b8a3efa3d4a9ae8918f4abb1f53bc08f" last_modified="2012-03-05T12:02:18-05:00" namespace="http://www.loc.gov/METS/" location="http://www.loc.gov/standards/mets/mets.xsd" status="success"/>
TEST:
$ curl http://www.loc.gov/standards/mets/mets.xsd > mets.xsd
$ md5sum mets.xsd
b8a3efa3d4a9ae8918f4abb1f53bc08f mets.xsd

5e0bd6f94ec78a3a88fca2275ab05f9e
FAILURE: <schema md5="5e0bd6f94ec78a3a88fca2275ab05f9e" last_modified="2009-01-21T17:06:40-05:00" namespace="http://www.w3.org/XML/1998/namespace" location="http://www.w3.org/2001/xml.xsd" status="success"/>
SUCCESS: <schema md5="bf97e27bdd02f7031a8a71ea4d229daf" last_modified="2009-01-21T17:06:40-05:00" namespace="http://www.w3.org/XML/1998/namespace" location="http://www.w3.org/2001/xml.xsd" status="success"/>
TEST:
$ curl http://www.w3.org/2001/xml.xsd > xml.xsd
$ md5sum xml.xsd
bf97e27bdd02f7031a8a71ea4d229daf xml.xsd

712fc5a7750e69f904f61086a997713c
FAILURE: <schema md5="712fc5a7750e69f904f61086a997713c" last_modified="2004-03-31T12:57:18-05:00" namespace="http://www.w3.org/XML/1998/namespace" location="http://www.w3.org/2001/03/xml.xsd" status="success"/>
SUCCESS: <schema md5="2e2cf9072dc058dcda41b7ee77a5cb54" last_modified="2004-03-31T12:57:18-05:00" namespace="http://www.w3.org/XML/1998/namespace" location="http://www.w3.org/2001/03/xml.xsd" status="success"/>
TEST:
$ curl http://www.w3.org/2001/03/xml.xsd > xml.xsd
$ md5sum xml.xsd
2e2cf9072dc058dcda41b7ee77a5cb54 xml.xsd

d90774b02fa694f3b358b4ed828295be
FAILURE: <schema md5="d90774b02fa694f3b358b4ed828295be" last_modified="2012-08-21T17:14:26-04:00" namespace="http://purl.org/dc/elements/1.1/" location="http://dublincore.org/schemas/xmls/simpledc20021212.xsd" status="success"/>
SUCCESS: <schema md5="afd985136a7e721cfafa062287a27f45" last_modified="2012-08-23T15:33:48-04:00" namespace="http://purl.org/dc/elements/1.1/" location="http://dublincore.org/schemas/xmls/simpledc20021212.xsd" status="success"/>
TEST:
$ curl http://dublincore.org/schemas/xmls/simpledc20021212.xsd > simpledc20021212.xsd
$ md5sum simpledc20021212.xsd
afd985136a7e721cfafa062287a27f45 simpledc20021212.xsd
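For repeatability, the TEST steps above (curl, then md5sum on the downloaded file) can also be scripted. Below is a minimal Ruby sketch using only the standard library. The function names are hypothetical. Like curl without -L, it does not follow redirects. These schemas may also have changed since 2012, so a download today need not reproduce the SUCCESS checksums above.

```ruby
require "digest/md5"
require "net/http"
require "uri"

# Fetches a schema body; raises unless the server answers 200 OK.
# Redirects are not followed (same as plain curl).
def fetch_schema(location)
  response = Net::HTTP.get_response(URI(location))
  raise "#{location}: HTTP #{response.code}" unless response.is_a?(Net::HTTPOK)
  response.body
end

# MD5 of the downloaded contents, as md5sum would print it.
def content_md5(body)
  Digest::MD5.hexdigest(body)
end

# MD5 of a file already on disk, equivalent to `md5sum FILE`.
def file_md5(path)
  Digest::MD5.file(path).hexdigest
end

# Usage (needs network access):
#   content_md5(fetch_schema("http://www.loc.gov/standards/xlink/xlink.xsd"))
```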

Conclusion:

It would appear that XMLresolution is somehow obtaining an outdated copy of the failing schemas or is corrupting them in some way. However, since the identical FAILURE md5sum appears multiple times throughout the xmlresolution.log files, I would say that XMLresolution is intermittently failing to get the most recent schema. We need to look at our cache and Squid to determine how old schemas are being pulled down.

@lydiam

lydiam commented Nov 8, 2012

The FAILURE checksums appear to be the ones computed on the schema name, and the SUCCESS checksums are the ones computed on the contents.

@childree
Author

Yes. So perhaps, as we speculated Thursday evening, two versions of the code are running and causing this issue to crop up, rather than the schemas being corrupted or incorrect. I think the next step will be to identify the exact code that computes this checksum and correct it to use the hash value of the filename and content combined, as described in #14.

@childree
Copy link
Author

Actually, I'm not quite sure this is true. Let's use simpledc20021212.xsd as an example. Here is the original information from above about this schema:

d90774b02fa694f3b358b4ed828295be
FAILURE: <schema md5="d90774b02fa694f3b358b4ed828295be" last_modified="2012-08-21T17:14:26-04:00" namespace="http://purl.org/dc/elements/1.1/" location="http://dublincore.org/schemas/xmls/simpledc20021212.xsd" status="success"/>
SUCCESS: <schema md5="afd985136a7e721cfafa062287a27f45" last_modified="2012-08-23T15:33:48-04:00" namespace="http://purl.org/dc/elements/1.1/" location="http://dublincore.org/schemas/xmls/simpledc20021212.xsd" status="success"/>
TEST:
$ curl http://dublincore.org/schemas/xmls/simpledc20021212.xsd > simpledc20021212.xsd
$ md5sum simpledc20021212.xsd
afd985136a7e721cfafa062287a27f45 simpledc20021212.xsd

The md5sum of the content is:
$ md5sum simpledc20021212.xsd
afd985136a7e721cfafa062287a27f45 simpledc20021212.xsd

The md5sum of the schema name string is:
$ echo -n simpledc20021212.xsd | md5sum
049eb3434affd13dd872b188588ec7af -

Even if we include the newline, the md5sum of the schema name string is:
$ echo simpledc20021212.xsd | md5sum
21cdc0bb3bfcc584db89e639d53411f1 -

We're still left not knowing how d90774b02fa694f3b358b4ed828295be was derived. As it turns out, this md5sum is computed on the entire location string of the schema:
$ echo -n http://dublincore.org/schemas/xmls/simpledc20021212.xsd | md5sum
d90774b02fa694f3b358b4ed828295be -

Perhaps this was known but I find it interesting that the entire location of the schema is being used.
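The three md5sum checks above can be reproduced in Ruby, which makes it explicit which string is hashed in each case. Plain echo appends a newline, so the second candidate includes "\n". Per the echo -n output above, only the full location string gives the FAILURE value d90774b02fa694f3b358b4ed828295be. The method below is an illustrative sketch, not part of xmlresolution.

```ruby
require "digest/md5"

# For a schema location, hash each string that might have produced the
# FAILURE checksum, mirroring the three `echo ... | md5sum` checks above.
def candidate_digests(location)
  name = File.basename(location) # e.g. "simpledc20021212.xsd"
  {
    "schema name (echo -n)"        => name,
    "schema name + newline (echo)" => "#{name}\n",
    "full location (echo -n)"      => location
  }.transform_values { |s| Digest::MD5.hexdigest(s) }
end

candidate_digests("http://dublincore.org/schemas/xmls/simpledc20021212.xsd").each do |label, digest|
  puts "#{digest}  #{label}"
end
```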

@childree
Author

It appears I've misunderstood what should be happening. In production, the md5sum of the schema should be calculated on the URL string and stored in the manifest.xml file within the xmlres-* directory of the AIP. Apologies for the confusion.
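Given that, a manifest can be audited by checking each <schema> element's md5 attribute against the MD5 of its location attribute. Below is a minimal Ruby sketch using REXML. The function names are hypothetical, and it assumes the <schema> elements carry no XML namespace, as in the excerpts above. A record that fails the check suggests its md5 was computed some other way, for example on the schema contents.

```ruby
require "digest/md5"
require "rexml/document"
require "rexml/xpath"

# Hypothetical audit helper: is this <schema> element's md5 attribute the MD5
# of its location URL string (the production behaviour described above)?
def url_keyed?(schema)
  md5 = schema.attributes["md5"]
  location = schema.attributes["location"]
  return false if md5.nil? || location.nil?
  md5.downcase == Digest::MD5.hexdigest(location)
end

# Returns [location, url_keyed?] pairs for every <schema> element in the
# manifest XML. Assumes un-namespaced <schema> elements.
def audit_manifest(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//schema").map do |schema|
    [schema.attributes["location"], url_keyed?(schema)]
  end
end
```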

@lydiam

lydiam commented Nov 15, 2012

Per this morning's meeting: it appears that this problem was caused by the code implemented for #17, which deletes schemas when no collection references them.

3 participants