SimpleHTTPResolver is getting 404 for every non-cached IIIF request #142
Comments
@regisrob - The extra requests are discussed in a thread here: #98. It starts with the comment titled "scande3 commented on Nov 6, 2014" (near the middle) and continues for a few replies. Per @jpstroop's response there, it is set up that way because of the difficulty of resolving the URL as defined in the IIIF specification. The best I was able to do was check the local cache first before attempting a request against the remote server. This is because that resolver call occurs in the "dissect_uri" method here: https://github.com/pulibrary/loris/blob/development/loris/webapp.py#L342, before any attempt to parse out whether there are valid URL parameters. As such, regardless of provider, it first tries to see whether the full URL is itself an identifier (which obviously isn't in the cache). However, if someone wanted to rewrite the "dissect_uri" logic, then ideally it would try the full URL only as a last resort. The only extra HTTP request would then come from a malformed request, to verify that the entire request wasn't a valid URI. The reason I didn't tackle this is that it would require some significant re-engineering of how the application handles errors within a resolver. In terms of server overload, the extra "does it exist" requests aren't causing any issues for us. You can try it out with OpenSeadragon at http://www.digitalcommonwealth.org. What is a server overload risk is a situation we had the other week: for some reason, Biblioboards decided to request the full JPG image of every single one of our 100,000+ objects that actually exist in our system (so not including objects we only have harvested metadata for). As only the JP2 exists, the image server needs to convert those in real time, which quickly ate up hard drive space and slowed things down significantly for other users. We haven't yet figured out a way to prevent this from occurring again or how to handle such a situation gracefully.
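For illustration only, the "full URL as a last resort" ordering described above could look roughly like the sketch below. This is not Loris code: `split_request`, the two regular expressions, and the `is_resolvable` callable (standing in for whatever check a resolver performs) are all hypothetical names, and the patterns are a simplified approximation of the IIIF image request syntax.

```python
import re

# Hypothetical, simplified patterns for the tail of an IIIF request.
# Image request: {identifier}/{region}/{size}/{rotation}/{quality}.{format}
IMAGE_TAIL = re.compile(
    r"^(?P<ident>.+)/(?P<region>[^/]+)/(?P<size>[^/]+)"
    r"/(?P<rotation>[^/]+)/(?P<quality>[a-z]+)\.(?P<fmt>[a-z0-9]+)$"
)
# Info request: {identifier}/info.json
INFO_TAIL = re.compile(r"^(?P<ident>.+)/info\.json$")


def split_request(path, is_resolvable):
    """Return (identifier, kind) for a request path.

    `is_resolvable` is an assumed callable that answers "does the resolver
    know this identifier?". It is only asked about the whole path when no
    IIIF-shaped tail could be parsed off (or the parsed identifier failed).
    """
    path = path.strip("/")
    for kind, pattern in (("info", INFO_TAIL), ("image", IMAGE_TAIL)):
        match = pattern.match(path)
        if match and is_resolvable(match.group("ident")):
            return match.group("ident"), kind
    # Last resort: treat the entire path as a bare identifier.
    if is_resolvable(path):
        return path, "bare"
    return None, "bad_request"
```

With this ordering, a well-formed image request triggers a single check against the parsed identifier, and the whole path is only checked when parsing fails. An identifier that itself happens to end in something shaped like an image request would still cost one extra check before the fallback.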
@scande3 See #141, re: full-size images. I forget who it was that asked about this in the past, but, FWIW, it's definitely come up before. I'd love to get to it, but my time is severely limited for about the next 6 months. Do you have any thoughts about where to implement it? Maybe in the resolver logic?
@scande3 Thanks a lot for your comprehensive answer. I think I now understand the logic of having this extra request.
@jpstroop - Unsure where to implement it just yet. In theory, the solution is the same for all of the resolvers (try the full request as a URI if it appears to be a bad request). I am not sure whether that should be part of the base Resolver class or done above the Resolver level by catching errors and trying the full URI in certain cases. I may be able to work on it in a few weeks. It isn't a significant issue, in that the performance hit of those extra requests is fairly minor, which leaves it as a low priority. But it is not optimal and does add to one's logs.
Sorry, I was talking about the problem you had of someone requesting all of your full-size images, and about somehow adding a config option that restricts sizes to an upper bound, n% of the long dimension or something like that. The reason for putting the logic in the resolver would be that a fancier resolver implementation might want to change this behavior by image or even based on user credentials.
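As a rough sketch of the "n% of the long dimension" idea (nothing like this exists in Loris as far as this thread shows; `cap_size` and `max_pct` are hypothetical names), a cap could scale an oversized request down to the allowed bound:

```python
def cap_size(req_w, req_h, full_w, full_h, max_pct=50):
    """Limit a requested size to max_pct% of the source's long dimension.

    (req_w, req_h) is the size the client asked for, (full_w, full_h) the
    full size of the source image. max_pct is an assumed config option.
    """
    limit = max(full_w, full_h) * max_pct / 100
    longest = max(req_w, req_h)
    if longest <= limit:
        return req_w, req_h  # already within the bound
    scale = limit / longest
    # Scale both sides equally so the aspect ratio is preserved.
    return max(1, int(req_w * scale)), max(1, int(req_h * scale))
```

A server could equally refuse the request with an error instead of scaling it; which is preferable depends on how clients are expected to react. Keeping the check in a function that takes the limit as a parameter leaves room for a resolver to vary `max_pct` per image or per user.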
@regisrob are you able to give this a try again? I think PRs 251 and 255 may have helped with the extra hits to the remote server. |
Thank you @bcail, I will do my best to give it a try asap |
The Loris log file shows that the SimpleHTTPResolver by @scande3 is sending requests to the remote server for every single IIIF request that has not been previously cached.
Considering the docstring in resolver.py, I assumed that the resolver made only one HTTP request to retrieve the source image, copied it into the local cache, and used that local copy for every subsequent IIIF request sharing the same identifier.
But if you look at the log file below (in this example the image identifier is "B452346101_C102/ecran/B452346101_C102_0005.jpg"), a new request returning 404 is sent every time you change a IIIF parameter (the response will always be 404, since the remote server is not IIIF-enabled, by definition).
@jpstroop @scande3: is this normal behaviour? Isn't there a risk of server overload (especially if you intend to use a viewer like OpenSeadragon, which sends dozens of requests for each image)?