Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE: Debug gene info endpoint returning null #1153

Closed
wants to merge 10 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .infra/common.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ stack:
create: true
automount: true
annotations:
"eks.amazonaws.com/role-arn": arn:aws:iam::699936264352:role/data_portal_dev_explorer
"eks.amazonaws.com/role-arn": arn:aws:iam::699936264352:role/data_portal_staging_explorer
image:
repository: 533267185808.dkr.ecr.us-west-2.amazonaws.com/core-platform/single-cell-explorer/explorer
resources:
Expand Down
4 changes: 2 additions & 2 deletions .infra/rdev/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ stack:
services:
explorer:
image:
tag: sha-64ce2a39
tag: sha-18feafd2
replicaCount: 1
env:
# env vars common to all deployment stages
Expand All @@ -22,4 +22,4 @@ stack:
- name: DATA_LOCATOR_DOMAIN
value: $(__ARGUS_STACK_INGRESS_HOST)
- name: CXG_BUCKET_PATH
value: hosted-cellxgene-dev
value: hosted-cellxgene-staging
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hspitzley-czi I'm trying to test a fix by changing Explorer BE from hitting dev to staging, but I'm getting:

Unable to discover s3 region name from s3://hosted-cellxgene-staging (type=value_error)

In https://argo.prod.platform.si.czi.technology/applications/certain-bull?resource=&node=%2FPod%2Fargus-single-cell-explorer-rdev%2Fcertain-bull-stack-explorer-7654d9cc58-8nzrg%2F0&tab=logs

Is that because rdev role permission doesn't have access to staging bucket?

The bug I'm trying to address is that rdev gene info endpoint always returns null 🤔

CURL

curl 'https://certain-bull.dev-sc.dev.czi.team/cellxgene/s3_uri/s3%253A%252F%252Fhosted-cellxgene-dev%252Fsuper-cool-spatial.cxg/api/v0.3/geneinfo?geneID=&gene=AR' \
  -H 'accept: application/json' \
  -H 'accept-language: en-US,en;q=0.9,zh-TW;q=0.8,zh;q=0.7,zh-CN;q=0.6' \
  -H 'cookie: argus_sticky_session=1734566048.418.1662.83615|e3f9a91a7a0db7bca0ae9c8322a67838' \
  -H 'dnt: 1' \
  -H 'priority: u=1, i' \
  -H 'referer: https://certain-bull.dev-sc.dev.czi.team/d/super-cool-spatial.cxg/' \
  -H 'sec-ch-ua: "Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'

whereas on staging it returns actual info:

curl 'https://api.cellxgene.staging.single-cell.czi.technology/cellxgene/s3_uri/s3%253A%252F%252Fhosted-cellxgene-staging%252Fpbmc3k.cxg/api/v0.3/geneinfo?geneID=&gene=ACD' \
  -H 'accept: application/json' \
  -H 'accept-language: en-US,en;q=0.9,zh-TW;q=0.8,zh;q=0.7,zh-CN;q=0.6' \
  -H 'cookie: argus_sticky_session=1734565349.449.38.739565|df4dba4aefd2407066eb62a1488673d2' \
  -H 'dnt: 1' \
  -H 'origin: https://cellxgene.staging.single-cell.czi.technology' \
  -H 'priority: u=1, i' \
  -H 'referer: https://cellxgene.staging.single-cell.czi.technology/' \
  -H 'sec-ch-ua: "Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that seems likely. I'm not sure what the application code is running to produce that error but looking at the role used by rdev I'm not seeing permission for the staging bucket
https://github.com/chanzuckerberg/single-cell-explorer/blob/main/.infra/common.yaml#L24

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! I think the code that produces the error is this line in the Explorer BE code:

raise ValueError(f"Unable to discover s3 region name from {path}")

My hypothesis is that the Explorer BE is returning null for gene info query now, because behind the scene Explorer GE actually queries Data Portal's dev gene info endpoint for data, and I believe dev data portal endpoint is no longer working, thus us seeing the issue

https://cellxgene.dev.single-cell.czi.technology/ <-- gets 503 service not available now

So I'm thinking the solution here is for Explorer rdev to switch from hitting dev to staging and thus needing permission to staging bucket

Should I just change the common iam from dev to staging ?

arn:aws:iam::699936264352:role/data_portal_staging_explorer

Let me try that out! Thanks for pointing me to the code!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to check if there are other differences in the permissions between the dev role and staging role. You could probably pretty easily add permission to the dev role for the staging bucket

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for the callout! Okay let me test if using staging role actually works, if so I'll just add staging bucket permission to dev role 🙆‍♂️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh yeah github org wide search didn't return anything o_o

https://github.com/search?q=org:chanzuckerberg%20jheath-argus-demo-deleteme&type=code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh it looks like the access key and secret key set with argus secrets belongs to that user. (pass -r to see the values)

» argus list secrets --app single-cell-explorer --env rdev                                                                                                                                                                                          hspitzley@MacBookPro
Listing environment variables on environment rdev
+-----------------------+----------+-------------+
|          KEY          |  VALUE   |   SOURCE    |
+-----------------------+----------+-------------+
| AWS_ACCESS_KEY_ID     | ******** | environment |
| AWS_SECRET_ACCESS_KEY | ******** | environment |
+-----------------------+----------+-------------+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was some reason we needed it to assume a user instead of a role (I'm forgetting why right now) but this gives us a path forward. You could copy those values elsewhere so you can restore them later and then overwrite them with the corresponding values from staging and I think that will make it able to access the staging bucket

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh fantastic!! Thanks so much for helping me out with this 🤩 🙏 That's really good to know 💡

Okay will do and retest 🫡

Thanks again!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay can confirm that my rdev can now hit staging endpoint thank you!

However, somehow the staging endpoint is still returning null, so I'm adding more logs to troubleshoot what's going on there. Will update this thread 👌

2 changes: 1 addition & 1 deletion client/__tests__/e2e/test_config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ server:
verbose: false

gene_info:
api_base: "https://api.cellxgene.dev.single-cell.czi.technology/gene_info/v1"
api_base: "https://api.cellxgene.staging.single-cell.czi.technology/gene_info/v1"
default_dataset:
presentation:
max_categories: 1000
Expand Down
19 changes: 16 additions & 3 deletions server/common/rest.py
Original file line number Diff line number Diff line change
Expand Up @@ -293,14 +293,27 @@ def gene_info_get(request):
"""
api_base_url = current_app.app_config.server__gene_info__api_base
headers = {"Content-Type": "application/json", "Accept": "application/json"}
url = f"{api_base_url}/gene_info?geneID={request.args['geneID']}&gene={request.args['gene']}"

current_app.logger.info(f"👀👀👀👀 Requesting gene info from {url}")

try:
response = requests.get(
url=f"{api_base_url}/gene_info?geneID={request.args['geneID']}&gene={request.args['gene']}", headers=headers
)
response = requests.get(url=url, headers=headers)
if response.status_code == 200:
# DEBUG
# DEBUG
# DEBUG
current_app.logger.info("😎😎😎😎😎😎😎😎😎😎😎😎😎😎😎😎😎😎😎 OK")
current_app.logger.info(f"Response content: {response.content}")

return make_response(response.content, HTTPStatus.OK, {"Content-Type": "application/json"})
else:
# in the event of a failed search, return empty response
# DEBUG
# DEBUG
# DEBUG
current_app.logger.error(f"Error:☠️☠️☠️☠️☠️☠️ Received status code {response.status_code} from API")
current_app.logger.error(f"Response content: {response.text}")
return None
except Exception as e:
return abort_and_log(HTTPStatus.BAD_REQUEST, str(e), include_exc_info=True)
Expand Down
47 changes: 34 additions & 13 deletions server/common/utils/data_locator.py
Original file line number Diff line number Diff line change
Expand Up @@ -141,18 +141,39 @@ def discover_s3_region_name(uri): # type: ignore
"""If this is an s3 protocol, discover and return the (aws) region name.
If a return name could not be discovered, or if the uri is not an s3 protocol, return None."""

protocol, _ = DataLocator._get_protocol_and_path(uri) # type: ignore
if protocol == "s3":
bucket = urlparse(uri).netloc
client = boto3.client("s3")
try:
res = client.head_bucket(Bucket=bucket)
except botocore.exceptions.ClientError:
return None

region = res.get("ResponseMetadata", {}).get("HTTPHeaders", {}).get("x-amz-bucket-region")
if region:
return region
try:
protocol, _ = DataLocator._get_protocol_and_path(uri) # type: ignore
if protocol == "s3":
bucket = urlparse(uri).netloc
print(f"Attempting to discover region for bucket: {bucket}")
client = boto3.client("s3")

# DEBUG
# DEBUG
# DEBUG Check identity of assumed role
sts_client = boto3.client("sts")
response = sts_client.get_caller_identity()
print("~~~~~~~~~~~~~~~~~~~~~~~~😎~~~~~~~~~~~~~~~~~~~~~~~~")
print(f"Assumed role details: {response}")
print("~~~~~~~~~~~~~~~~~~~~~~~~😎~~~~~~~~~~~~~~~~~~~~~~~~")

try:
res = client.head_bucket(Bucket=bucket)
print(f"Received response for head_bucket: {res}")
except botocore.exceptions.ClientError as e:
print(f"ClientError occurred while accessing bucket {bucket}: {e}")
return None

region = res.get("ResponseMetadata", {}).get("HTTPHeaders", {}).get("x-amz-bucket-region")
if region:
print(f"Region for bucket {bucket} discovered: {region}")
return region
else:
print(f"Region not found in response for bucket {bucket}")
return None
else:
print(f"URI does not use the S3 protocol: {uri}")
return None
return None
except Exception as e:
print(f"An unexpected error occurred while processing URI: {uri}. Error: {e}")
return None
4 changes: 2 additions & 2 deletions server/tests/unit/common/config/test_server_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -65,14 +65,14 @@ def test_handle_data_locator_can_read_from_dataroot(self, mock_discover_region_n
mock_discover_region_name.return_value = "us-west-2"
dataroots = {
"d1": {"base_url": "set1", "dataroot": "/path/to/set1_datasets/"},
"d2": {"base_url": "set2/subdir", "dataroot": "s3://hosted-cellxgene-dev"},
"d2": {"base_url": "set2/subdir", "dataroot": "s3://hosted-cellxgene-staging"},
}
file_name = self.custom_app_config(
dataroots=dataroots, config_file_name=self.config_file_name, data_locator_region_name="true"
)
config = AppConfig(file_name)
self.assertEqual(config.server__data_locator__s3_region_name, "us-west-2")
mock_discover_region_name.assert_called_once_with("s3://hosted-cellxgene-dev")
mock_discover_region_name.assert_called_once_with("s3://hosted-cellxgene-staging")

def test_handle_app__sets_web_base_url(self):
config = self.get_config(web_base_url="anything.com")
Expand Down
Loading