Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE: Debug gene info endpoint returning null #1153

Closed
wants to merge 10 commits into from

Conversation

tihuan
Copy link
Contributor

@tihuan tihuan commented Dec 18, 2024

⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

  • DO NOT MERGE UNTIL ADDING STAGING BUCKET PERMISSION TO DEV ROLE
  • Revert staging role iam to dev role

⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

Argus Stack Details
Name thuang-test-rdev-tests
Short Name certain-bull
Base URL https://certain-bull.dev-sc.dev.czi.team

This reverts commit 6d162bf.

@tihuan tihuan added the stack label Dec 18, 2024
@tihuan tihuan force-pushed the thuang-test-rdev-tests branch 3 times, most recently from b6f146d to 31fd26b Compare December 18, 2024 23:49
@tihuan tihuan changed the title DO NOT MERGE: Revert "feat: agent (#1148)" DO NOT MERGE: Debug gene info endpoint returning null Dec 18, 2024
@@ -22,4 +22,4 @@ stack:
- name: DATA_LOCATOR_DOMAIN
value: $(__ARGUS_STACK_INGRESS_HOST)
- name: CXG_BUCKET_PATH
value: hosted-cellxgene-dev
value: hosted-cellxgene-staging
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hspitzley-czi I'm trying to test a fix by changing Explorer BE from hitting dev to staging, but I'm getting:

Unable to discover s3 region name from s3://hosted-cellxgene-staging (type=value_error)

In https://argo.prod.platform.si.czi.technology/applications/certain-bull?resource=&node=%2FPod%2Fargus-single-cell-explorer-rdev%2Fcertain-bull-stack-explorer-7654d9cc58-8nzrg%2F0&tab=logs

Is that because rdev role permission doesn't have access to staging bucket?

The bug I'm trying to address is that rdev gene info endpoint always returns null 🤔

CURL

curl 'https://certain-bull.dev-sc.dev.czi.team/cellxgene/s3_uri/s3%253A%252F%252Fhosted-cellxgene-dev%252Fsuper-cool-spatial.cxg/api/v0.3/geneinfo?geneID=&gene=AR' \
  -H 'accept: application/json' \
  -H 'accept-language: en-US,en;q=0.9,zh-TW;q=0.8,zh;q=0.7,zh-CN;q=0.6' \
  -H 'cookie: argus_sticky_session=1734566048.418.1662.83615|e3f9a91a7a0db7bca0ae9c8322a67838' \
  -H 'dnt: 1' \
  -H 'priority: u=1, i' \
  -H 'referer: https://certain-bull.dev-sc.dev.czi.team/d/super-cool-spatial.cxg/' \
  -H 'sec-ch-ua: "Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'

whereas on staging it returns actual info:

curl 'https://api.cellxgene.staging.single-cell.czi.technology/cellxgene/s3_uri/s3%253A%252F%252Fhosted-cellxgene-staging%252Fpbmc3k.cxg/api/v0.3/geneinfo?geneID=&gene=ACD' \
  -H 'accept: application/json' \
  -H 'accept-language: en-US,en;q=0.9,zh-TW;q=0.8,zh;q=0.7,zh-CN;q=0.6' \
  -H 'cookie: argus_sticky_session=1734565349.449.38.739565|df4dba4aefd2407066eb62a1488673d2' \
  -H 'dnt: 1' \
  -H 'origin: https://cellxgene.staging.single-cell.czi.technology' \
  -H 'priority: u=1, i' \
  -H 'referer: https://cellxgene.staging.single-cell.czi.technology/' \
  -H 'sec-ch-ua: "Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that seems likely. I'm not sure what the application code is running to produce that error but looking at the role used by rdev I'm not seeing permission for the staging bucket
https://github.com/chanzuckerberg/single-cell-explorer/blob/main/.infra/common.yaml#L24

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! I think the code that produces the error is this line in the Explorer BE code:

raise ValueError(f"Unable to discover s3 region name from {path}")

My hypothesis is that the Explorer BE is returning null for gene info query now, because behind the scene Explorer GE actually queries Data Portal's dev gene info endpoint for data, and I believe dev data portal endpoint is no longer working, thus us seeing the issue

https://cellxgene.dev.single-cell.czi.technology/ <-- gets 503 service not available now

So I'm thinking the solution here is for Explorer rdev to switch from hitting dev to staging and thus needing permission to staging bucket

Should I just change the common iam from dev to staging ?

arn:aws:iam::699936264352:role/data_portal_staging_explorer

Let me try that out! Thanks for pointing me to the code!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to check if there are other differences in the permissions between the dev role and staging role. You could probably pretty easily add permission to the dev role for the staging bucket

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for the callout! Okay let me test if using staging role actually works, if so I'll just add staging bucket permission to dev role 🙆‍♂️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh yeah github org wide search didn't return anything o_o

https://github.com/search?q=org:chanzuckerberg%20jheath-argus-demo-deleteme&type=code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh it looks like the access key and secret key set with argus secrets belongs to that user. (pass -r to see the values)

» argus list secrets --app single-cell-explorer --env rdev                                                                                                                                                                                          hspitzley@MacBookPro
Listing environment variables on environment rdev
+-----------------------+----------+-------------+
|          KEY          |  VALUE   |   SOURCE    |
+-----------------------+----------+-------------+
| AWS_ACCESS_KEY_ID     | ******** | environment |
| AWS_SECRET_ACCESS_KEY | ******** | environment |
+-----------------------+----------+-------------+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was some reason we needed it to assume a user instead of a role (I'm forgetting why right now) but this gives us a path forward. You could copy those values elsewhere so you can restore them later and then overwrite them with the corresponding values from staging and I think that will make it able to access the staging bucket

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh fantastic!! Thanks so much for helping me out with this 🤩 🙏 That's really good to know 💡

Okay will do and retest 🫡

Thanks again!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay can confirm that my rdev can now hit staging endpoint thank you!

However, somehow the staging endpoint is still returning null, so I'm adding more logs to troubleshoot what's going on there. Will update this thread 👌

@tihuan tihuan force-pushed the thuang-test-rdev-tests branch 3 times, most recently from ad4cde2 to f70b093 Compare December 19, 2024 22:57
@tihuan tihuan force-pushed the thuang-test-rdev-tests branch from d0cbc71 to 366e793 Compare December 23, 2024 16:33
@tihuan tihuan force-pushed the thuang-test-rdev-tests branch from 4719a41 to 15b9998 Compare December 23, 2024 16:58
@tihuan tihuan force-pushed the thuang-test-rdev-tests branch from 67889c1 to d98718d Compare December 23, 2024 17:28
@tihuan tihuan force-pushed the thuang-test-rdev-tests branch 2 times, most recently from d203f43 to 92a1302 Compare December 23, 2024 17:55
@tihuan tihuan force-pushed the thuang-test-rdev-tests branch from 92a1302 to d81ca4c Compare December 23, 2024 17:56
@tihuan tihuan force-pushed the thuang-test-rdev-tests branch from 89e3b23 to 18feafd Compare December 23, 2024 18:47
@argocd-czi-prod
Copy link

Stale Stack Warning

The stack associated with this PR has not been updated since 2024-12-23 18:51:13 UTC (~10.2 days ago).

The threshold for deletion specified in the 'stale_rdev_cleanup_threshold_days' setting for the associated app (single-cell-explorer) is 14 days.

Unless a new commit is made to this branch before the threshold is reached, the stack will be deleted.

@tihuan
Copy link
Contributor Author

tihuan commented Jan 6, 2025

Closing this for now to focus on higher priority stuff. The latest investigation shows that the rdev hitting the staging API still returns null for some reason 🤔

#1153 (comment)

@tihuan tihuan closed this Jan 6, 2025
@tihuan tihuan deleted the thuang-test-rdev-tests branch January 6, 2025 15:50
Copy link

codecov bot commented Jan 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (f283373) to head (a822c59).
Report is 1 commits behind head on main.

Additional details and impacted files
@@     Coverage Diff      @@
##   main   #1153   +/-   ##
============================
============================

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants