-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for S3 through the REST API #118
Conversation
This is currently using the old S3 authentication (https://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html), in order to support Exoscale Storage Refs #112
This prevents doing a network request when we would return a 304 anyway
Also add specs to check for the response headers
No need to hit the storage backend, we can set the response headers from the data in Redis
S3 and Swift now run the same specs. The only difference is the before block that defines the stubbed HTTP requests and the responses from the Swift and S3 servers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this looks really good. Left some improvement suggestions.
config.yml.example.s3
Outdated
# # Redis is needed for the swift backend | ||
# redis: | ||
# host: localhost | ||
# port: 6379 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented out in the example file?
lib/remote_storage/rest_provider.rb
Outdated
end | ||
end | ||
|
||
not_found = !try_to_delete(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the exclamation mark on the method call, this reads like "don't try to delete". I would rather do found = try_to_delete(url)
and then use !found
in the condition below.
lib/remote_storage/rest_provider.rb
Outdated
raise NotImplementedError | ||
end | ||
|
||
def set_response_headers(response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better if this method would rather only get the actual hash containing the values as param instead of the whole response. The method doesn't need to know that it needs to access the values via response.headers
. Instead the caller should call it via set_response_headers(response.headers)
.
lib/remote_storage/rest_provider.rb
Outdated
|
||
def metadata_changed?(old_metadata, new_metadata) | ||
# check metadata relevant to the directory listing | ||
# ie. the timestamp (m) is not relevant, because it's not used in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the directory listing contains the last-modified date, the timestamp needs to be checked here as well.
lib/remote_storage/s3_rest.rb
Outdated
require "webrick/httputils" | ||
|
||
module RemoteStorage | ||
class S3Rest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why call this "S3Rest" instead of just "S3"?. The Swift provider isn't called "SwiftRest" either.
lib/remote_storage/s3_rest.rb
Outdated
end | ||
end | ||
|
||
# S3 does not return a Last-Modified response header on PUTs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment rather belongs to the line that does the HEAD request, to explain why there is an additional request.
lib/remote_storage/s3_rest.rb
Outdated
|
||
# This is using the S3 authorizations, not the newer AW V4 Signatures | ||
# (https://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html) | ||
def authorization_headers_for(http_verb, md5, content_type, url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the param signature should be rather http_verb, url, md5, content_type
. There are multiple (if not even the majority) calls to this method that just give an empty string for md5
and content_type
. This way they could be optional and the caller would just omit them.
to_return(status: 404) | ||
end | ||
|
||
it_behaves_like 'a REST adapter' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! 👍
spec/s3/app_spec.rb
Outdated
@@ -0,0 +1,56 @@ | |||
require_relative "../spec_helper" | |||
|
|||
describe "App" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using "S3 provider" instead of "App"?
spec/swift/app_spec.rb
Outdated
@@ -1,827 +1,56 @@ | |||
require_relative "../spec_helper" | |||
|
|||
describe "App" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, a more meaningful description like "Swift provider" would be better.
Make content_type and md5 optional (set to nil by default)
Also adds a spec for it
It wasn't a bug, just wrong metadata for my files on staging from a previous version of the code |
@galfert I pushes changes based on your comments. This is now deployed to the staging server |
Changes look good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall, but I did find a few things I wouldn't merge like this.
@@ -21,3 +19,6 @@ notifications: | |||
- http://hook-juggler.herokuapp.com/hooks/travis | |||
on_success: always | |||
on_failure: always | |||
env: | |||
- BACKEND=s3 | |||
- BACKEND=swift |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the second one overwriting the first one here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's setting a build matrix, creating one build for the s3 backend and one for the swift backend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Didn't read that far in their docs.
config.yml.example.s3
Outdated
access_key_id: "" | ||
secret_key_id: "" | ||
bucket: "test-bucket" | ||
# Redis is needed for the swift backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems wrong here. In fact, all backends need Redis anyway, so it can probably be deleted.
bucket: "test-bucket" | ||
redis: | ||
host: localhost | ||
port: 6379 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same config as the default above, so it's not necessary.
config.yml.example.swift
Outdated
# uncomment this section | ||
swift: &swift_defaults | ||
host: "https://swift.example.com" | ||
# Redis is needed for the swift backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, except Swift is correct here. But not necessary.
config.yml.example.swift
Outdated
# redis: | ||
# host: localhost | ||
# port: 6379 | ||
# uncomment this section |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The section is already uncommented? I think now that the examples are seperate, this comment is obsolete.
lib/remote_storage/s3.rb
Outdated
|
||
def do_put_request_and_return_etag_and_last_modified(url, data, content_type) | ||
res = do_put_request(url, data, content_type) | ||
# S3 does not return a Last-Modified response header on PUTs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we care? We can set it ourselves from here in Redis, as we're only using it from there, no? Otherwise we have another request for maybe a few milliseconds in difference, which the client doesn't care about.
lib/remote_storage/s3.rb
Outdated
end | ||
end | ||
|
||
def do_put_request_and_return_etag_and_last_modified(url, data, content_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why we have to include the return values in the method name. We can add a documentation comment (RDoc if you want), and/or make the return line more readable. It shouldn't apply a bunch of things right in the array assignment anyway.
lib/remote_storage/s3.rb
Outdated
date = Time.now.httpdate | ||
signed_data = generate_s3_signature(http_verb, md5, content_type, date, url) | ||
{ "Authorization" => "AWS #{credentials[:access_key_id]}:#{signed_data}", | ||
"Date" => date} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks weird to me. Generally, when line-breaking objects/hashes, we do it like this:
{
"Authorization" => "AWS #{credentials[:access_key_id]}:#{signed_data}",
"Date" => date
}
spec/s3/app_spec.rb
Outdated
stub_request(:head, "#{container_url_for("phil")}/food/steak"). | ||
to_return(status: 404) | ||
stub_request(:get, "#{container_url_for("phil")}/food/steak"). | ||
to_return(status: 404) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These stubs are messy af. How is anyone supposed to know what's being stubbed there? Also, there are quite some duplicates with the exact same return values. I think we need to clean this up before merging.
Why is it necessary to break them away from their context entirely in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Them being all defined at once is caused by the way the specs are shared between the two providers, I'm open to suggestions to improve it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can group them a little bit and document for which specs exactly they are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, at least it should be clear what each spec is for. Otherwise it's all implicit and super hard to both read and change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, at least it should be clear what each spec is for. Otherwise it's all implicit and super hard to both read and change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Them being all defined at once is caused by the way the specs are shared between the two providers, I'm open to suggestions to improve it
I don't fully understand the reason there. So it's less readable in order to avoid duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a way yes, we're setting up all the request stubs at once for all specs. A lot of the requests are linked (for example PUTs and HEADs on S3 to the same URLs). I'm going to start by adding comments and seeing how I can organize them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have managed to remove some of the stubs, and grouped them together as well as add comment. This is a step in the right direction but it's still not easy to read
to_return(status: 404) | ||
end | ||
|
||
it_behaves_like 'a REST adapter' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! 👍
Remove the stubs that are not required, making everything easier to understand
spec/s3/app_spec.rb
Outdated
stub_request(:delete, "#{container_url_for("phil")}/food/aguacate"). | ||
to_return(status: 200, headers: { etag: '"0815etag"' }) | ||
|
||
# PUT requests authorized updates the metadata object in redis when it changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it's copied from test output, but it sure isn't English language that one can understand.
This is currently using the old S3 authentication (https://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html), in order to support Exoscale Storage
Also gets the metadata from Redis instead of the backend on HEAD and GET requests
Closes #112