Add support for S3 through the REST API #118

gregkare · 2018-04-26T11:20:21Z

This is currently using the old S3 authentication (https://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html), in order to support Exoscale Storage

Also gets the metadata from Redis instead of the backend on HEAD and GET requests
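As a rough illustration of the Redis shortcut described above, here is a minimal sketch of deciding on a 304 from cached metadata alone. The helper name `not_modified?` and the `"e"` key for the ETag are assumptions for illustration, and a plain Hash stands in for what Redis would return; this is not the code from the PR.

```ruby
# Hypothetical helper: decide whether a GET can be answered with a 304
# purely from metadata cached in Redis. `metadata` stands in for the hash
# that a Redis lookup would return; the "e" (ETag) key is an assumed name.
def not_modified?(metadata, if_none_match)
  return false if metadata.nil? || metadata.empty? || if_none_match.nil?

  # If-None-Match may carry several comma-separated, quoted ETags
  candidates = if_none_match.split(",").map { |tag| tag.strip.delete('"') }
  candidates.include?(metadata["e"])
end

metadata = { "e" => "0815etag", "m" => "1524741621000" }

not_modified?(metadata, '"0815etag"') # matching ETag: no backend request needed
not_modified?(metadata, '"othertag"') # different ETag: fetch from the backend
```

Only when the check fails does the request need to reach the storage backend.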

Closes #112

galfert

Overall I think this looks really good. Left some improvement suggestions.

galfert · 2018-04-26T11:39:40Z

config.yml.example.s3

+  # # Redis is needed for the swift backend
+  # redis:
+  #   host: localhost
+  #   port: 6379


Why is this commented out in the example file?

galfert · 2018-04-26T13:43:05Z

lib/remote_storage/rest_provider.rb

+        end
+      end
+
+      not_found = !try_to_delete(url)


With the exclamation mark on the method call, this reads like "don't try to delete". I would rather do found = try_to_delete(url) and then use !found in the condition below.
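A minimal sketch of the suggested naming, assuming `try_to_delete` returns true when the object existed. The `backend` lambda and the returned status codes are hypothetical stand-ins so that the snippet runs on its own.

```ruby
# Stand-in for the real method: true when the backend deleted the object,
# false when it answered 404. The `backend` lambda is hypothetical.
def try_to_delete(url, backend)
  backend.call(url) != 404
end

def delete_data(url, backend)
  found = try_to_delete(url, backend)
  # The negation now sits next to the condition it belongs to
  return 404 unless found

  200
end

existing = ->(_url) { 200 }
missing  = ->(_url) { 404 }

delete_data("https://s3.example.com/test-bucket/phil/food/steak", existing)
delete_data("https://s3.example.com/test-bucket/phil/food/steak", missing)
```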

galfert · 2018-04-26T14:00:45Z

lib/remote_storage/rest_provider.rb

+      raise NotImplementedError
+    end
+
+    def set_response_headers(response)


It would be better if this method only received the hash containing the values as its param instead of the whole response. The method doesn't need to know that the values are accessed via response.headers. Instead, the caller should call set_response_headers(response.headers).
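A sketch of what that narrower interface could look like. `FakeResponse`, the header names and the hash keys are assumptions for illustration, and the method returns a hash here instead of writing to a real server response.

```ruby
# Hypothetical stand-in for an HTTP client response object
FakeResponse = Struct.new(:headers)

# Takes only the header hash, so it does not need to know where the
# values came from (a backend response, Redis metadata, a test double).
def set_response_headers(headers)
  {
    "ETag"           => headers[:etag],
    "Content-Type"   => headers[:content_type],
    "Content-Length" => headers[:content_length],
    "Last-Modified"  => headers[:last_modified]
  }.reject { |_name, value| value.nil? }
end

response = FakeResponse.new({ etag: '"0815etag"', content_type: "text/plain" })
result = set_response_headers(response.headers)
```

The same method can then be fed a hash built from Redis metadata without wrapping it in a response-like object.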

galfert · 2018-04-26T14:17:53Z

lib/remote_storage/rest_provider.rb

+
+    def metadata_changed?(old_metadata, new_metadata)
+      # check metadata relevant to the directory listing
+      # ie. the timestamp (m) is not relevant, because it's not used in


Now that the directory listing contains the last-modified date, the timestamp needs to be checked here as well.
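A minimal sketch of a comparison that includes the timestamp. Only `m` is named in the diff above; the other keys (`e`, `s`, `t` for ETag, size and content type) are assumed names for illustration.

```ruby
# Keys compared when deciding whether a directory listing is stale.
# "m" is the timestamp from the diff; "e", "s" and "t" are assumed names.
LISTING_KEYS = %w[e s m t].freeze

def metadata_changed?(old_metadata, new_metadata)
  LISTING_KEYS.any? do |key|
    # Compare as strings, since values read back from Redis are strings
    old_metadata[key].to_s != new_metadata[key].to_s
  end
end

old_metadata = { "e" => "0815etag", "s" => "23", "m" => "1524741621000", "t" => "text/plain" }
same         = old_metadata.merge("s" => 23)
touched      = old_metadata.merge("m" => "1524741999000")

metadata_changed?(old_metadata, same)    # only the value type differs
metadata_changed?(old_metadata, touched) # the timestamp differs
```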

galfert · 2018-04-26T16:21:51Z

lib/remote_storage/s3_rest.rb

+require "webrick/httputils"
+
+module RemoteStorage
+  class S3Rest


Why call this "S3Rest" instead of just "S3"? The Swift provider isn't called "SwiftRest" either.

galfert · 2018-04-26T16:24:00Z

lib/remote_storage/s3_rest.rb

+      end
+    end
+
+    # S3 does not return a Last-Modified response header on PUTs


I think this comment belongs on the line that does the HEAD request, to explain why there is an additional request.

galfert · 2018-04-26T16:28:48Z

lib/remote_storage/s3_rest.rb

+
+    # This is using the S3 authorizations, not the newer AW V4 Signatures
+    # (https://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html)
+    def authorization_headers_for(http_verb, md5, content_type, url)


I wonder if the param signature should rather be http_verb, url, md5, content_type. Multiple calls to this method (maybe even the majority) just pass an empty string for md5 and content_type. This way those two could be optional and the caller would simply omit them.
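A sketch of the reordered signature with optional trailing params, together with one way the legacy S3 signature could be computed (Base64 of an HMAC-SHA1 over verb, MD5, content type, date and resource). The credentials, the URL and the use of the URL path as the canonical resource (path-style addressing, no x-amz headers) are assumptions for illustration, not the PR's implementation.

```ruby
require "openssl"
require "time"
require "uri"

# Hypothetical credentials, named after the keys in the example config
def credentials
  { access_key_id: "example-key", secret_key_id: "example-secret" }
end

def generate_s3_signature(http_verb, md5, content_type, date, url)
  # nil values become empty strings when joined, as the string to sign expects
  string_to_sign = [http_verb, md5, content_type, date, URI(url).path].join("\n")
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new("sha1"),
                                credentials[:secret_key_id], string_to_sign)
  [digest].pack("m0") # strict Base64 without line breaks
end

# md5 and content_type come last so that callers can omit them
def authorization_headers_for(http_verb, url, md5 = nil, content_type = nil)
  date = Time.now.httpdate
  signed_data = generate_s3_signature(http_verb, md5, content_type, date, url)
  {
    "Authorization" => "AWS #{credentials[:access_key_id]}:#{signed_data}",
    "Date" => date
  }
end

url = "https://s3.example.com/test-bucket/phil/food/steak"
get_headers = authorization_headers_for("GET", url)
put_headers = authorization_headers_for("PUT", url, nil, "text/plain")
```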

galfert · 2018-04-26T17:22:14Z

spec/s3/app_spec.rb

+      to_return(status: 404)
+  end
+
+  it_behaves_like 'a REST adapter'


Very nice ❤️

galfert · 2018-04-26T17:26:24Z

spec/s3/app_spec.rb

@@ -0,0 +1,56 @@
+require_relative "../spec_helper"
+
+describe "App" do


How about using "S3 provider" instead of "App"?

galfert · 2018-04-26T17:32:10Z

spec/swift/app_spec.rb

@@ -1,827 +1,56 @@
 require_relative "../spec_helper"

 describe "App" do


Same as above, a more meaningful description like "Swift provider" would be better.

gregkare · 2018-05-02T09:03:50Z

~~I found a bug with the current code in that branch (in the directory listings), on it~~

It wasn't a bug, just wrong metadata for my files on staging from a previous version of the code

gregkare · 2018-05-02T12:30:02Z

@galfert I pushed changes based on your comments. This is now deployed to the staging server

galfert · 2018-05-04T13:22:20Z

Changes look good to me.

raucao

Looks great overall, but I did find a few things I wouldn't merge like this.

raucao · 2018-05-09T13:02:51Z

.travis.yml

@@ -21,3 +19,6 @@ notifications:
      - http://hook-juggler.herokuapp.com/hooks/travis
    on_success: always
    on_failure: always
+env:
+  - BACKEND=s3
+  - BACKEND=swift


Isn't the second one overwriting the first one here?

It's setting a build matrix, creating one build for the s3 backend and one for the swift backend

Ah, I see. Didn't read that far in their docs.

raucao · 2018-05-09T13:03:40Z

config.yml.example.s3

+    access_key_id: ""
+    secret_key_id: ""
+    bucket: "test-bucket"
+  # Redis is needed for the swift backend


This comment seems wrong here. In fact, all backends need Redis anyway, so it can probably be deleted.

raucao · 2018-05-09T13:04:16Z

config.yml.example.s3

+    bucket: "test-bucket"
+  redis:
+    host: localhost
+    port: 6379


This is the same config as the default above, so it's not necessary.

raucao · 2018-05-09T13:04:45Z

config.yml.example.swift

+  # uncomment this section
+  swift: &swift_defaults
+    host: "https://swift.example.com"
+  # Redis is needed for the swift backend


Similar to above, except Swift is correct here. But not necessary.

raucao · 2018-05-09T13:05:11Z

config.yml.example.swift

-  # redis:
-  #   host: localhost
-  #   port: 6379
+  # uncomment this section


The section is already uncommented? I think now that the examples are separate, this comment is obsolete.

raucao · 2018-05-09T13:08:06Z

lib/remote_storage/s3.rb

+
+    def do_put_request_and_return_etag_and_last_modified(url, data, content_type)
+      res = do_put_request(url, data, content_type)
+      # S3 does not return a Last-Modified response header on PUTs


Why do we care? We can set it ourselves from here in Redis, as we're only using it from there, no? Otherwise we have another request for maybe a few milliseconds in difference, which the client doesn't care about.

raucao · 2018-05-09T13:09:30Z

lib/remote_storage/s3.rb

+      end
+    end
+
+    def do_put_request_and_return_etag_and_last_modified(url, data, content_type)


I don't see why we have to include the return values in the method name. We can add a documentation comment (RDoc if you want), and/or make the return line more readable. It shouldn't apply a bunch of things right in the array assignment anyway.
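One way this could look, as a sketch only: a shorter method name, a documentation comment describing the return value, and the array built from named local variables. `FakeResponse`, the stubbed request methods and the name `put_and_fetch_metadata` are hypothetical.

```ruby
require "time"

# Hypothetical stand-ins so that the sketch runs without a network
FakeResponse = Struct.new(:headers)

def do_put_request(_url, _data, _content_type)
  FakeResponse.new({ etag: '"0815etag"' })
end

def do_head_request(_url)
  FakeResponse.new({ last_modified: "Fri, 04 May 2018 13:22:20 GMT" })
end

# Uploads the data and collects what is needed to update the metadata.
#
# Returns an Array of two elements: the ETag (String, without quotes)
# and the modification time (Integer, milliseconds since the epoch).
def put_and_fetch_metadata(url, data, content_type)
  put_response = do_put_request(url, data, content_type)
  # S3 does not return a Last-Modified response header on PUTs
  head_response = do_head_request(url)

  etag = put_response.headers[:etag].delete('"')
  last_modified = Time.httpdate(head_response.headers[:last_modified])
  timestamp = (last_modified.to_f * 1000).to_i

  [etag, timestamp]
end

etag, timestamp = put_and_fetch_metadata("https://s3.example.com/test-bucket/phil/food/steak",
                                         "si", "text/plain")
```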

raucao · 2018-05-09T13:11:41Z

lib/remote_storage/s3.rb

+      date = Time.now.httpdate
+      signed_data = generate_s3_signature(http_verb, md5, content_type, date, url)
+      { "Authorization" => "AWS #{credentials[:access_key_id]}:#{signed_data}",
+        "Date" => date}


This looks weird to me. Generally, when line-breaking objects/hashes, we do it like this:

{
  "Authorization" => "AWS #{credentials[:access_key_id]}:#{signed_data}",
  "Date" => date
}

raucao · 2018-05-09T13:14:44Z

spec/s3/app_spec.rb

+    stub_request(:head, "#{container_url_for("phil")}/food/steak").
+      to_return(status: 404)
+    stub_request(:get, "#{container_url_for("phil")}/food/steak").
+      to_return(status: 404)


These stubs are messy af. How is anyone supposed to know what's being stubbed there? Also, there are quite some duplicates with the exact same return values. I think we need to clean this up before merging.

Why is it necessary to break them away from their context entirely in the first place?

Them being all defined at once is caused by the way the specs are shared between the two providers, I'm open to suggestions to improve it

Maybe we can group them a little bit and document for which specs exactly they are?

Yes, at least it should be clear what each spec is for. Otherwise it's all implicit and super hard to both read and change.

> Them being all defined at once is caused by the way the specs are shared between the two providers, I'm open to suggestions to improve it

I don't fully understand the reason there. So it's less readable in order to avoid duplication?

In a way yes, we're setting up all the request stubs at once for all specs. A lot of the requests are linked (for example PUTs and HEADs on S3 to the same URLs). I'm going to start by adding comments and seeing how I can organize them

I have managed to remove some of the stubs, grouped them together and added comments. This is a step in the right direction, but it's still not easy to read

raucao · 2018-05-09T13:15:14Z

spec/s3/app_spec.rb

+      to_return(status: 404)
+  end
+
+  it_behaves_like 'a REST adapter'


raucao · 2018-05-09T16:54:28Z

spec/s3/app_spec.rb

+    stub_request(:delete, "#{container_url_for("phil")}/food/aguacate").
+      to_return(status: 200, headers: { etag: '"0815etag"' })
+
+    # PUT requests authorized updates the metadata object in redis when it changes


This looks like it's copied from test output, but it sure isn't English language that one can understand.

gregkare added 5 commits April 26, 2018 13:17

Add support for S3 through the REST API

86dc45f

This is currently using the old S3 authentication (https://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html), in order to support Exoscale Storage Refs #112

Get the ETag from Redis on a GET request

f083022

This prevents doing a network request when we would return a 304 anyway

Set headers from the Redis metadata on a GET that results in a 304

7bd4554

Also add specs to check for the response headers

Get the metadata from Redis on a HEAD request

21dad2a

No need to hit the storage backend, we can set the response headers from the data in Redis

Remove all duplication the specs

21f3a9f

S3 and Swift now run the same specs. The only difference is the before block that defines the stubbed HTTP requests and the responses from the Swift and S3 servers

gregkare requested a review from galfert April 26, 2018 11:20

gregkare added the in progress label Apr 26, 2018

gregkare mentioned this pull request Apr 26, 2018

Add support for S3 through the REST API #117

Closed

galfert requested changes Apr 26, 2018

gregkare added 9 commits April 30, 2018 15:10

Reverse the not found logic in the delete_data method to make it clearer

b00fc5b

Pass the headers to the set_response_headers directly, not the response

3b72b8d

Move a comment to the relevant line

97cd5ec

Reorder the argument in authorization_headers_for

1532a23

Make content_type and md5 optional (set to nil by default)

Rename the spec root description

454f02d

Delete unused methods

5da0d0b

Rename the S3 provider to just S3

ca0127d

Consider that the metadata has changed when the Last-Modified changes

f14ef4d

Also adds a spec for it

Uncomment the development config in the examples

a922867

gregkare changed the title ~~Add support for S3 through the REST API~~ [WIP] Add support for S3 through the REST API May 2, 2018

gregkare changed the title ~~[WIP] Add support for S3 through the REST API~~ Add support for S3 through the REST API May 2, 2018

galfert approved these changes May 4, 2018

raucao requested changes May 9, 2018

gregkare added 5 commits May 9, 2018 15:21

Remove useless comments and remove duplicate section in the example configs

d0a28c7

Fix coding style for a hash

0ec76c8

Rewrite to avoid long lines

709f635

Refactor the put_request method to have a return value

639c372

Run Travis builds on the Docker infrastructure

c0d88f1

Simply the request stubs, add comments

df65190

Remove the stubs that are not required, making everything easier to understand

raucao reviewed May 9, 2018

Replace placeholder comment that I forgot to replace

be33b0e

raucao approved these changes May 11, 2018

raucao merged commit 1705ac7 into master May 11, 2018

raucao removed the in progress label May 11, 2018

raucao deleted the feature/112-s3_cleaned_up branch May 11, 2018 13:24
