Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for S3 through the REST API #118

Merged
merged 21 commits into from
May 11, 2018
Merged

Conversation

gregkare
Copy link
Member

This is currently using the old S3 authentication (https://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html), in order to support Exoscale Storage

Also gets the metadata from Redis instead of the backend on HEAD and GET requests

Closes #112

This is currently using the old S3 authentication
(https://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html),
in order to support Exoscale Storage

Refs #112
This prevents doing a network request when we would return a 304 anyway
Also add specs to check for the response headers
No need to hit the storage backend, we can set the response headers from
the data in Redis
S3 and Swift now run the same specs. The only difference is the before
block that defines the stubbed HTTP requests and the responses from the
Swift and S3 servers
Copy link
Member

@galfert galfert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this looks really good. Left some improvement suggestions.

# # Redis is needed for the swift backend
# redis:
# host: localhost
# port: 6379
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out in the example file?

end
end

not_found = !try_to_delete(url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the exclamation mark on the method call, this reads like "don't try to delete". I would rather do found = try_to_delete(url) and then use !found in the condition below.

raise NotImplementedError
end

def set_response_headers(response)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if this method would rather only get the actual hash containing the values as param instead of the whole response. The method doesn't need to know that it needs to access the values via response.headers. Instead the caller should call it via set_response_headers(response.headers).


def metadata_changed?(old_metadata, new_metadata)
# check metadata relevant to the directory listing
# ie. the timestamp (m) is not relevant, because it's not used in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the directory listing contains the last-modified date, the timestamp needs to be checked here as well.

require "webrick/httputils"

module RemoteStorage
class S3Rest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call this "S3Rest" instead of just "S3"?. The Swift provider isn't called "SwiftRest" either.

end
end

# S3 does not return a Last-Modified response header on PUTs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment rather belongs to the line that does the HEAD request, to explain why there is an additional request.


# This is using the S3 authorizations, not the newer AW V4 Signatures
# (https://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html)
def authorization_headers_for(http_verb, md5, content_type, url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the param signature should be rather http_verb, url, md5, content_type. There are multiple (if not even the majority) calls to this method that just give an empty string for md5 and content_type. This way they could be optional and the caller would just omit them.

to_return(status: 404)
end

it_behaves_like 'a REST adapter'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice ❤️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! 👍

@@ -0,0 +1,56 @@
require_relative "../spec_helper"

describe "App" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using "S3 provider" instead of "App"?

@@ -1,827 +1,56 @@
require_relative "../spec_helper"

describe "App" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, a more meaningful description like "Swift provider" would be better.

@gregkare gregkare changed the title Add support for S3 through the REST API [WIP] Add support for S3 through the REST API May 2, 2018
@gregkare
Copy link
Member Author

gregkare commented May 2, 2018

I found a bug with the current code in that branch (in the directory listings), on it

It wasn't a bug, just wrong metadata for my files on staging from a previous version of the code

@gregkare gregkare changed the title [WIP] Add support for S3 through the REST API Add support for S3 through the REST API May 2, 2018
@gregkare
Copy link
Member Author

gregkare commented May 2, 2018

@galfert I pushes changes based on your comments. This is now deployed to the staging server

@galfert
Copy link
Member

galfert commented May 4, 2018

Changes look good to me.

Copy link
Member

@raucao raucao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall, but I did find a few things I wouldn't merge like this.

@@ -21,3 +19,6 @@ notifications:
- http://hook-juggler.herokuapp.com/hooks/travis
on_success: always
on_failure: always
env:
- BACKEND=s3
- BACKEND=swift
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the second one overwriting the first one here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's setting a build matrix, creating one build for the s3 backend and one for the swift backend

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Didn't read that far in their docs.

access_key_id: ""
secret_key_id: ""
bucket: "test-bucket"
# Redis is needed for the swift backend
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems wrong here. In fact, all backends need Redis anyway, so it can probably be deleted.

bucket: "test-bucket"
redis:
host: localhost
port: 6379
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same config as the default above, so it's not necessary.

# uncomment this section
swift: &swift_defaults
host: "https://swift.example.com"
# Redis is needed for the swift backend
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, except Swift is correct here. But not necessary.

# redis:
# host: localhost
# port: 6379
# uncomment this section
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The section is already uncommented? I think now that the examples are seperate, this comment is obsolete.


def do_put_request_and_return_etag_and_last_modified(url, data, content_type)
res = do_put_request(url, data, content_type)
# S3 does not return a Last-Modified response header on PUTs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we care? We can set it ourselves from here in Redis, as we're only using it from there, no? Otherwise we have another request for maybe a few milliseconds in difference, which the client doesn't care about.

end
end

def do_put_request_and_return_etag_and_last_modified(url, data, content_type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we have to include the return values in the method name. We can add a documentation comment (RDoc if you want), and/or make the return line more readable. It shouldn't apply a bunch of things right in the array assignment anyway.

date = Time.now.httpdate
signed_data = generate_s3_signature(http_verb, md5, content_type, date, url)
{ "Authorization" => "AWS #{credentials[:access_key_id]}:#{signed_data}",
"Date" => date}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird to me. Generally, when line-breaking objects/hashes, we do it like this:

{
  "Authorization" => "AWS #{credentials[:access_key_id]}:#{signed_data}",
  "Date" => date
}

stub_request(:head, "#{container_url_for("phil")}/food/steak").
to_return(status: 404)
stub_request(:get, "#{container_url_for("phil")}/food/steak").
to_return(status: 404)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These stubs are messy af. How is anyone supposed to know what's being stubbed there? Also, there are quite some duplicates with the exact same return values. I think we need to clean this up before merging.

Why is it necessary to break them away from their context entirely in the first place?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Them being all defined at once is caused by the way the specs are shared between the two providers, I'm open to suggestions to improve it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can group them a little bit and document for which specs exactly they are?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at least it should be clear what each spec is for. Otherwise it's all implicit and super hard to both read and change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at least it should be clear what each spec is for. Otherwise it's all implicit and super hard to both read and change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Them being all defined at once is caused by the way the specs are shared between the two providers, I'm open to suggestions to improve it

I don't fully understand the reason there. So it's less readable in order to avoid duplication?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a way yes, we're setting up all the request stubs at once for all specs. A lot of the requests are linked (for example PUTs and HEADs on S3 to the same URLs). I'm going to start by adding comments and seeing how I can organize them

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have managed to remove some of the stubs, and grouped them together as well as add comment. This is a step in the right direction but it's still not easy to read

to_return(status: 404)
end

it_behaves_like 'a REST adapter'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! 👍

Remove the stubs that are not required, making everything easier to
understand
stub_request(:delete, "#{container_url_for("phil")}/food/aguacate").
to_return(status: 200, headers: { etag: '"0815etag"' })

# PUT requests authorized updates the metadata object in redis when it changes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's copied from test output, but it sure isn't English language that one can understand.

@raucao raucao merged commit 1705ac7 into master May 11, 2018
@raucao raucao deleted the feature/112-s3_cleaned_up branch May 11, 2018 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants