Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(recordings): object storage URL may be different interally vs externally #234

Merged
merged 13 commits into from
Jan 15, 2024

Conversation

andrewazores
Copy link
Member

@andrewazores andrewazores commented Jan 10, 2024

Welcome to Cryostat3! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PR's, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your branch PR on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits using a GPG signature

To recreate commits with GPG signature git fetch upstream && git rebase --force --gpg-sign upstream/main


Based on #235
Related to #2

Description of the change:

Adds optional configuration storage-ext.url, similar to existing grafana-dashboard-ext.url.

Motivation for the change:

Cryostat uses this URL to determine the externally-facing URL where the object storage provider (S3 instance) can be reached by requesting clients, since Cryostat is likely to use an internal URL to communicate with the storage provider, but should tell clients how to download files from the object store using an externally routable URL.

How to manually test:

  1. ./mvnw clean package ; podman image prune -f
  2. ./smoketest.bash -O
  3. http -v --follow --auth=user:pass :8080/api/v3/activedownload/1 to attempt downloading an active recording - see examples below.
  4. This can also be done by opening localhost:8080 in a browser, logging in with user:pass, going to Recordings, selecting Cryostat as the target, and attempting to download the onstart recording. Open browser devtools network tab to view and verify request paths.
  5. Verify that the URL the client is redirected to looks like http://localhost:8080/storage/$somepath, where somepath is an S3 presigned URL. The localhost:8080/storage prefix is the important part, indicating that the request is being routed through the auth proxy.

Current difficulties:

1. S3 instances produce the presigned request URL using various parameters like the request URL and headers, which become encoded as part of the signature of the short-lived token for the resource. This is very similar to what we have implemented directly in 2.x. This means that using an internal URL for Cryostat to talk to S3, and an external URL for clients to talk to S3, does not meet the signature validation requirements and S3 rejects the client's request.
2. Configuring Cryostat to talk to S3 via the external route would likely resolve the point above since then the request parameters can be the same, however in practice this would mean that Cryostat would need to go through the authentication proxy as well (the same as an external client does). This gets pretty messy since it's an additional set of credentials and authorization that need to be set up just for this reason. This might not be too bad in a real Kubernetes deployment since Cryostat should be able to use its own serviceaccount token to talk through the proxy. Maybe we can use --skip-auth-regex on the storage path so that the proxy only acts a gateway in front of S3 and does not perform authz duties? This means that the S3 provider must have its own authz for anything other than presigned URLs, but since our primary deployment target scenario is to use MinIO this should be okay since that does have its own robust authz mechanism that we can configure separately.

Switching from Minio to SeaweedFS seems to have resolved the above. Seaweed tolerates the internal vs external URL, once it is configured with user basic auth to enable presigned downloads at all.

  1. Switching from Minio to SeaweedFS, it seems Seaweed is not properly respecting the Content-Disposition header set when redirecting clients to download active recordings. The new flow for this action in Cryostat 3 is that Cryostat actually pipes the JFR data out of the target JVM and uploads it to S3, rather than sending it back out to the client directly. Then Cryostat sends the client back a redirect response telling it to go get the file itself from S3 using a presigned URL, and a Content-Disposition header that should inform the client (browser) what filename to use when saving this file, overriding the filename as it is found on the server. With Minio this works as expected, but with Seaweed the saved file retains the filename it has on the server. This ends up looking the same as any other archived recording file, which is not so bad, but the bugged behaviour is that this does not match the filename of the accompanying .metadata.json file produced by the web-client. Potential fix: don't generate the .metadata.json on the client, actually supply that straight from the server and just have the client download it. The filenames may end up looking like archived recording filenames, which I think is actually OK, and at least the JFR and metadata JSON files will have matching names.

With Seaweed:

$ http -v --follow --auth=user:pass :8080/api/v3/activedownload/1
GET /api/v3/activedownload/1 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, br
Authorization: Basic dXNlcjpwYXNz
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2



HTTP/1.1 308 Permanent Redirect
Cache-Control: no-cache
Content-Disposition: attachment; filename="onstart.jfr"
Content-Length: 0
Date: Fri, 12 Jan 2024 18:37:23 GMT
Gap-Auth: user
Location: http://localhost:8080/api/v3/download/THB0RWlxN2U0cHE5eE1WRlJzZDYwSGp3OEdoaG53c24wa2xKa2ZuclFWOD0vY29tcG9zZS1jcnlvc3RhdC0xX29uc3RhcnRfMjAyNDAxMTJUMTgzNzIyWi5qZnI?f=b25zdGFydC5qZnI



GET /api/v3/download/THB0RWlxN2U0cHE5eE1WRlJzZDYwSGp3OEdoaG53c24wa2xKa2ZuclFWOD0vY29tcG9zZS1jcnlvc3RhdC0xX29uc3RhcnRfMjAyNDAxMTJUMTgzNzIyWi5qZnI?f=b25zdGFydC5qZnI HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, br
Authorization: Basic dXNlcjpwYXNz
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2



HTTP/1.1 308 Permanent Redirect
Cache-Control: no-cache
Content-Disposition: attachment; filename="onstart.jfr"
Content-Length: 0
Date: Fri, 12 Jan 2024 18:37:23 GMT
Gap-Auth: user
Location: http://localhost:8080/storage/archivedrecordings/LptEiq7e4pq9xMVFRsd60Hjw8Ghhnwsn0klJkfnrQV8=/compose-cryostat-1_onstart_20240112T183722Z.jfr?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240112T183723Z&X-Amz-SignedHeaders=host&X-Amz-Expires=60&X-Amz-Credential=access_key/20240112/us-east-1/s3/aws4_request&X-Amz-Signature=d53ab9c8a79820c79822f66a33041ebf4afd064aa489f0131a643990a8f62b3d



GET /storage/archivedrecordings/LptEiq7e4pq9xMVFRsd60Hjw8Ghhnwsn0klJkfnrQV8=/compose-cryostat-1_onstart_20240112T183722Z.jfr?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240112T183723Z&X-Amz-SignedHeaders=host&X-Amz-Expires=60&X-Amz-Credential=access_key/20240112/us-east-1/s3/aws4_request&X-Amz-Signature=d53ab9c8a79820c79822f66a33041ebf4afd064aa489f0131a643990a8f62b3d HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, br
Authorization: Basic dXNlcjpwYXNz
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2



HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Expose-Headers: Content-Disposition
Content-Disposition: inline; filename="compose-cryostat-1_onstart_20240112T183722Z.jfr"
Content-Length: 1754515
Content-Type: binary/octet-stream
Date: Fri, 12 Jan 2024 18:37:23 GMT
Etag: "e8c91341182321ce9dbd2940df43e2de"
Gap-Auth: user
Last-Modified: Fri, 12 Jan 2024 18:37:23 GMT
Server: SeaweedFS Filer 30GB 3.61
X-Amz-Tagging-Anztswq: THB0RWlxN2U0cHE5eE1WRlJzZDYwSGp3OEdoaG53c24wa2xKa2ZuclFWOD0
X-Amz-Tagging-Count: 3
X-Amz-Tagging-Y29ubmvjdfvyba: c2VydmljZTpqbXg6cm1pOi8vL2puZGkvcm1pOi8vbG9jYWxob3N0OjAvam14cm1p
X-Amz-Tagging-Zxhwaxjhdglvbg: MjAyNC0wMS0xMlQxODozODoyMi42NDM4MDEzOTZa



+-----------------------------------------+
| NOTE: binary data not shown in terminal |
+-----------------------------------------+

With Minio:

$ http -v --follow --auth=user:pass :8080/api/v3/activedownload/1
GET /api/v3/activedownload/1 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, br
Authorization: Basic dXNlcjpwYXNz
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2



HTTP/1.1 308 Permanent Redirect
Cache-Control: no-cache
Content-Disposition: attachment; filename="onstart.jfr"
Content-Length: 0
Date: Fri, 12 Jan 2024 18:38:35 GMT
Gap-Auth: user
Location: http://localhost:8080/api/v3/download/QWFMYzJTVDRjYTBrTnUwUjNZRGpQR0xyZmtXcGdXSkduQnVONXZ5RGxXST0vY29tcG9zZS1jcnlvc3RhdC0xX29uc3RhcnRfMjAyNDAxMTJUMTgzODM1Wi5qZnI?f=b25zdGFydC5qZnI



GET /api/v3/download/QWFMYzJTVDRjYTBrTnUwUjNZRGpQR0xyZmtXcGdXSkduQnVONXZ5RGxXST0vY29tcG9zZS1jcnlvc3RhdC0xX29uc3RhcnRfMjAyNDAxMTJUMTgzODM1Wi5qZnI?f=b25zdGFydC5qZnI HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, br
Authorization: Basic dXNlcjpwYXNz
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2



HTTP/1.1 308 Permanent Redirect
Cache-Control: no-cache
Content-Disposition: attachment; filename="onstart.jfr"
Content-Length: 0
Date: Fri, 12 Jan 2024 18:38:35 GMT
Gap-Auth: user
Location: http://localhost:8080/storage/archivedrecordings/AaLc2ST4ca0kNu0R3YDjPGLrfkWpgWJGnBuN5vyDlWI=/compose-cryostat-1_onstart_20240112T183835Z.jfr?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240112T183835Z&X-Amz-SignedHeaders=host&X-Amz-Expires=60&X-Amz-Credential=minioroot/20240112/us-east-1/s3/aws4_request&X-Amz-Signature=2d15450d129d3c270fe40011d142415ae41c2b2ee4289b3281ecb1336c7c1c21



GET /storage/archivedrecordings/AaLc2ST4ca0kNu0R3YDjPGLrfkWpgWJGnBuN5vyDlWI=/compose-cryostat-1_onstart_20240112T183835Z.jfr?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240112T183835Z&X-Amz-SignedHeaders=host&X-Amz-Expires=60&X-Amz-Credential=minioroot/20240112/us-east-1/s3/aws4_request&X-Amz-Signature=2d15450d129d3c270fe40011d142415ae41c2b2ee4289b3281ecb1336c7c1c21 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, br
Authorization: Basic dXNlcjpwYXNz
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2



HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 1628312
Content-Type: binary/octet-stream
Date: Fri, 12 Jan 2024 18:38:35 GMT
Etag: "320062a5bb9151ac47dd8c16d9aaf98e-1"
Expires: Fri, 12 Jan 2024 18:39:35 GMT
Gap-Auth: user
Last-Modified: Fri, 12 Jan 2024 18:38:35 GMT
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
X-Amz-Request-Id: 17A9AD83BC0440D5
X-Amz-Tagging-Count: 3
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block



+-----------------------------------------+
| NOTE: binary data not shown in terminal |
+-----------------------------------------+

Note how in the Minio case, there is an initial Content-Disposition on the first redirect, the same on the second redirect, and none in the final response. Whereas in the Seaweed case, the first two redirect responses have the same header (since these come from Cryostat), and the final response from Seaweed also contains the Content-Disposition header but with different contents. This is what causes the bug. I have patched some of the behaviour for this header as set in the two Cryostat redirect responses, as well as tried specifying that header's value in the object metadata when Cryostat uploads it to Seaweed, but to no avail. I've burned a few hours trying to work around this but keep triggering the presigned URL signature validation failure whenever I try something else, so this might just have to do for now.

@andrewazores andrewazores requested a review from a team as a code owner January 10, 2024 19:32
@github-actions github-actions bot added the needs-triage Needs thorough attention from code reviewers label Jan 10, 2024
@andrewazores andrewazores added question Further information is requested feat New feature or request and removed needs-triage Needs thorough attention from code reviewers labels Jan 10, 2024
@andrewazores andrewazores marked this pull request as draft January 10, 2024 19:49
Copy link

This PR/issue depends on:

@andrewazores andrewazores removed the question Further information is requested label Jan 12, 2024
@andrewazores andrewazores marked this pull request as ready for review January 12, 2024 20:26
@andrewazores
Copy link
Member Author

/build_test

Copy link

Workflow started at 1/12/2024, 3:26:46 PM. View Actions Run.

Copy link

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat3/actions/runs/7507110713

1 similar comment
Copy link

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat3/actions/runs/7507110713

Copy link
Contributor

@aali309 aali309 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@andrewazores andrewazores merged commit 11b397d into cryostatio:main Jan 15, 2024
17 checks passed
@andrewazores andrewazores deleted the recording-downloads branch January 15, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat New feature or request
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants