
SSLEOFError when connecting to web3signer #11

Open
hentai8 opened this issue Dec 16, 2024 · 22 comments

Comments

@hentai8

hentai8 commented Dec 16, 2024

I followed the walkthrough exactly, without customizing anything, but I got an SSLEOFError when I ran web3signer_status, as follows:

2024-12-05T06:33:17.808Z
LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html
2024-12-05T06:33:17.808Z
[ERROR] Exception: exception happened: HTTPSConnectionPool(host='signer.devnitrovalidator.private', port=443): Max retries exceeded with url: /upcheck (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1147)')))
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 146, in lambda_handler
    raise Exception("exception happened: {}".format(e))

I also tried running tests/e2e/e2e_setup.sh at the same time, but got the same error.
I ran cdk destroy before each deploy to make sure that no resource errors from the previous deploy would affect the new one.

I really want to use this technique, so I've tried it over and over again, but it always ends in this SSLEOFError. Please let me know if you need more info. Thanks a lot!
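An SSLEOFError ("EOF occurred in violation of protocol") is typically raised when the peer closes the connection in the middle of the TLS handshake: the TCP connection to port 443 was accepted, but nothing on the other side completed TLS. That suggests looking at whatever terminates the connection rather than at certificate validation, which fails with a different error. One way to narrow this down is to test the handshake on its own, outside the Lambda function. The sketch below uses only the Python standard library; probe_tls is a hypothetical helper, not part of the project, and it deliberately skips certificate verification because it only checks whether a handshake completes at all.

```python
import socket
import ssl
from typing import Optional


def probe_tls(host: str, port: int = 443, timeout: float = 5.0,
              server_hostname: Optional[str] = None) -> str:
    """Attempt a bare TLS handshake with host:port and describe the outcome.

    Certificate verification is switched off on purpose: this only checks
    whether the peer completes a handshake, not whether it is trusted.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=server_hostname or host) as tls:
                return f"handshake ok: {tls.version()}"
    except ssl.SSLEOFError as exc:
        # TCP was accepted, but the peer closed it before the handshake finished.
        return f"peer closed during handshake: {exc}"
    except ssl.SSLError as exc:
        # Any other TLS-level failure (protocol mismatch, bad record, ...).
        return f"TLS error: {exc}"
    except OSError as exc:
        # DNS failure, connection refused or reset, or timeout.
        return f"connection error: {exc}"
```

Run from an instance inside the same VPC, for example probe_tls("signer.devnitrovalidator.private"). A "peer closed during handshake" result reproduces the Lambda's failure independently of the Lambda code; "handshake ok" would point back at the client side instead.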

@hentai8
Author

hentai8 commented Dec 16, 2024

The log from running the e2e test is as follows:

+ ./scripts/generate_key_policy.sh nitro_validator_output.json
+ aws kms put-key-policy --policy-name default --key-id arn:aws:kms:us-east-1:xxx:key/xxx --policy file://key_policy.json
+ aws lambda invoke --function-name arn:aws:lambda:us-east-1:xxx:function:devNitroValidator-NitroInvokeLambda398BB8E0-xxx --cli-binary-format raw-in-base64-out --payload '{"operation": "set_tls_key"}' lambda-output
{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
+ ./scripts/start_signing_service.sh nitro_validator_output.json
i-xxx:
● nitro-signing-server.service - Nitro Enclaves Signing Server
   Loaded: loaded (/etc/systemd/system/nitro-signing-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2024-12-16 08:31:23 UTC; 15s ago
 Main PID: 4057 (python3)
    Tasks: 5
   Memory: 27.7M
   CGroup: /system.slice/nitro-signing-server.service
           ├─4057 python3 /home/ec2-user/app/watchdog.py
           └─4092 /bin/nitro-cli run-enclave --cpu-count 2 --memory 3806 --eif-path /home/ec2-user/app/server/signing_server.eif --enclave-cid 16 --debug-mode

Dec 16 08:31:23 ip-10-0-99-156.ec2.internal systemd[1]: Started Nitro Enclaves Signing Server.
Dec 16 08:31:23 ip-10-0-99-156.ec2.internal watchdog.py[4057]: Start allocating memory...
Dec 16 08:31:25 ip-10-0-99-156.ec2.internal watchdog.py[4057]: Started enclave with enclave-cid: 16, memory: 3806 MiB, cpu-ids: [1, 3]
i-xxx:
● nitro-signing-server.service - Nitro Enclaves Signing Server
   Loaded: loaded (/etc/systemd/system/nitro-signing-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2024-12-16 08:31:23 UTC; 15s ago
 Main PID: 4011 (python3)
    Tasks: 5
   Memory: 27.9M
   CGroup: /system.slice/nitro-signing-server.service
           ├─4011 python3 /home/ec2-user/app/watchdog.py
           └─4043 /bin/nitro-cli run-enclave --cpu-count 2 --memory 3806 --eif-path /home/ec2-user/app/server/signing_server.eif --enclave-cid 16 --debug-mode

Dec 16 08:31:23 ip-10-0-183-85.ec2.internal systemd[1]: Started Nitro Enclaves Signing Server.
Dec 16 08:31:23 ip-10-0-183-85.ec2.internal watchdog.py[4011]: Start allocating memory...
Dec 16 08:31:25 ip-10-0-183-85.ec2.internal watchdog.py[4011]: Started enclave with enclave-cid: 16, memory: 3806 MiB, cpu-ids: [1, 3]
{
    "Version": 2,
    "Tier": "Standard"
}

(16/12/2024 08:31:41) service has been started and is healthy
+ ./tests/e2e/web3signer_status.sh nitro_validator_output.json

16/12/2024 08:31:41: sending request
{
    "StatusCode": 200,
    "FunctionError": "Unhandled",
    "ExecutedVersion": "$LATEST"
}
result: {"errorMessage": "exception happened: HTTPSConnectionPool(host='signer.devnitrovalidator.private', port=443): Max retries exceeded with url: /upcheck (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1147)')))", "errorType": "Exception", "requestId": "8e74ab26-3d20-420d-b857-ded30a7a99fa", "stackTrace": ["  File \"/var/task/lambda_function.py\", line 146, in lambda_handler\n    raise Exception(\"exception happened: {}\".format(e))\n"]}

16/12/2024 08:31:57: sending request
{
    "StatusCode": 200,
    "FunctionError": "Unhandled",
    "ExecutedVersion": "$LATEST"
}
result: {"errorMessage": "exception happened: HTTPSConnectionPool(host='signer.devnitrovalidator.private', port=443): Max retries exceeded with url: /api/v1/eth2/publicKeys (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1147)')))", "errorType": "Exception", "requestId": "873b8d38-4f0f-44c5-aa3f-fa1e45dd1caf", "stackTrace": ["  File \"/var/task/lambda_function.py\", line 162, in lambda_handler\n    raise Exception(\"exception happened: {}\".format(e))\n"]}

16/12/2024 08:32:14: sending request
{
    "StatusCode": 200,
    "FunctionError": "Unhandled",
    "ExecutedVersion": "$LATEST"
}
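The status script appears to exercise two Web3Signer endpoints, /upcheck and /api/v1/eth2/publicKeys (both paths are taken from the log above). A sketch of the same two calls with the Python standard library is shown below, so they can be repeated by hand from inside the VPC. The Lambda's actual code is not shown in this thread; check_web3signer and its ca_file parameter (a CA bundle that trusts the signer's certificate) are assumptions for illustration.

```python
import json
import ssl
import urllib.request
from typing import Optional


def check_web3signer(base_url: str, ca_file: Optional[str] = None,
                     timeout: float = 5.0) -> dict:
    """Query Web3Signer's /upcheck and /api/v1/eth2/publicKeys endpoints.

    Failures surface as urllib.error.URLError (HTTPError for non-2xx
    answers); a TLS handshake failure such as SSLEOFError is wrapped in URLError.
    """
    # The signer is only reachable inside the VPC, so bypass any HTTP(S)_PROXY.
    handlers = [urllib.request.ProxyHandler({})]
    if base_url.startswith("https://"):
        # Trust only the given CA bundle if one is provided, else system CAs.
        context = ssl.create_default_context(cafile=ca_file)
        handlers.append(urllib.request.HTTPSHandler(context=context))
    opener = urllib.request.build_opener(*handlers)

    def get(path: str):
        url = base_url.rstrip("/") + path
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")

    up_status, up_body = get("/upcheck")
    _, keys_body = get("/api/v1/eth2/publicKeys")
    return {
        "upcheck": (up_status, up_body.strip()),
        "public_keys": json.loads(keys_body),  # expected: JSON list of hex keys
    }
```

Usage, assuming a CA bundle file is available on the host: check_web3signer("https://signer.devnitrovalidator.private", ca_file="ca.pem").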

@hentai8
Author

hentai8 commented Dec 16, 2024

[screenshot]

@peterpan0708

peterpan0708 commented Dec 18, 2024

Hi @dpdornseifer, could you take a look at this issue? We have been stuck here for a long time and still have no idea what is wrong. Much appreciated!

@dpdornseifer
Contributor

dpdornseifer commented Dec 19, 2024

Hey @hentai8 @peterpan0708,
the solution was still based on IMDSv1 and was missing a few general updates, such as the changed Docker security model.

I created a new PR (#12) with several changes. Can you please try this branch and let me know if you are still facing the issue: https://github.com/aws-solutions-library-samples/guidance-for-secure-blockchain-validation-using-aws-nitro-enclaves/tree/feature/IMDSv2_enablement

The PR will be merged into main soon.

Cheers
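For context on the IMDSv1 to IMDSv2 change mentioned above: IMDSv2 requires a session token, obtained with a PUT to /latest/api/token, that must accompany every metadata request. Code written for IMDSv1 issues plain GETs, and those are rejected with 401 on instances that require IMDSv2. A minimal standard-library sketch of the two-step flow follows; imds_get is a hypothetical helper, and its base parameter exists only so the flow can be exercised against a local stand-in.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"

# IMDS must be reached directly, so ignore any HTTP(S)_PROXY settings.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def imds_get(path: str, base: str = IMDS_BASE, ttl_seconds: int = 21600,
             timeout: float = 2.0) -> str:
    """Fetch latest/<path> from the instance metadata service using IMDSv2."""
    # Step 1: PUT to obtain a session token (TTL up to 21600 s = 6 h).
    token_request = urllib.request.Request(
        f"{base}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with _OPENER.open(token_request, timeout=timeout) as resp:
        token = resp.read().decode("utf-8")

    # Step 2: present the token on the actual metadata request.
    data_request = urllib.request.Request(
        f"{base}/latest/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with _OPENER.open(data_request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

On an EC2 instance, imds_get("meta-data/placement/region") returns the instance's Region and imds_get("user-data") returns the rendered user data.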

@hentai8
Author

hentai8 commented Dec 19, 2024

@dpdornseifer Thank you so much, sir. I will try it later.

@peterpan0708

Hi sir, when I use Python 3.9, I get this error:

[screenshot]

After I upgraded Python to 3.12, I get this error:

./scripts/start_signing_service.sh output.json
i-020b2e3625ec8d694:
----------ERROR------- Unit nitro-signing-server.service could not be found. failed to run commands: exit status 4
i-04b887bd468479ac6:
----------ERROR------- Unit nitro-signing-server.service could not be found. failed to run commands: exit status 4
{
"Version": 2,
"Tier": "Standard"
}

@dpdornseifer
Contributor

@peterpan0708

@peterpan0708 did you delete the previous stack and do the entire setup/walkthrough (https://github.com/aws-solutions-library-samples/guidance-for-secure-blockchain-validation-using-aws-nitro-enclaves/blob/main/docs/walkthrough.md) again?

Yes, I ran cdk destroy prodNitroValidator.

@peterpan0708

peterpan0708 commented Dec 20, 2024


On the new branch, after the deployment, I logged in to the prodNitroValidator instance, and there is also no /etc/systemd/system/nitro-signing-server.service.

It seems the bootstrap workflow failed.
On the main branch, it works.

@dpdornseifer
Contributor

Can you please remove the ECR image and run tests/e2e/e2e_setup.sh? What platform are you deploying from, ARM or x86?

@peterpan0708

I'm using x86 (t3.small).

I ran e2e_setup.sh and got this error:

[screenshot]

It says "The maximum number of VPCs has been reached."
How many VPCs do I need?

@peterpan0708

My mistake: the region in e2e_setup.sh is us-east-1 by default, and we already have 5 VPCs in that region.
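The VPC quota applies per Region, and its default value is 5 VPCs, which is consistent with the failure in us-east-1. Before running e2e_setup.sh, the remaining headroom in a Region can be checked with boto3; the sketch below separates the AWS call from the arithmetic. The number of VPCs the stacks need (the needed parameter) is a placeholder assumption to adjust, and an account's quota may have been raised above the default.

```python
from typing import Tuple

# Default "VPCs per Region" quota; it may have been raised for your account.
DEFAULT_VPC_QUOTA = 5


def list_vpcs(region: str) -> dict:
    """Collect all VPCs in a Region (requires boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the pure helper works without it

    ec2 = boto3.client("ec2", region_name=region)
    vpcs = []
    for page in ec2.get_paginator("describe_vpcs").paginate():
        vpcs.extend(page["Vpcs"])
    return {"Vpcs": vpcs}


def vpc_headroom(describe_vpcs_response: dict, needed: int = 1,
                 quota: int = DEFAULT_VPC_QUOTA) -> Tuple[int, bool]:
    """Return (free VPC slots, whether `needed` more VPCs fit under the quota)."""
    existing = len(describe_vpcs_response.get("Vpcs", []))
    free = quota - existing
    return free, free >= needed
```

With 5 existing VPCs, vpc_headroom(list_vpcs("us-east-1")) yields (0, False), i.e. the deployment has to go to another Region or a quota increase has to be requested.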

I ran e2e_setup.sh in ap-east-1 and still got this:

[screenshot]

@hentai8
Author

hentai8 commented Dec 20, 2024

The new branch you provided works fine, and I can get the public key from web3signer in the enclaves. Thanks a lot for your help!

[screenshot]


@dpdornseifer
Contributor

@peterpan0708 @hentai8 I realized that there were still some potential issues with multi-region environments, and I also fixed a compatibility issue in the e2e test script that caused an error when running on Linux. I pushed a minor update to the branch.

@peterpan0708

peterpan0708 commented Dec 23, 2024


@dpdornseifer, it still doesn't run correctly in ap-east-1.

The error is still:
starting signing service
i-035f795d75d994b4b:
----------ERROR------- Unit nitro-signing-server.service could not be found. failed to run commands: exit status 4
i-0f3e55ea6dabe4caa:
----------ERROR------- Unit nitro-signing-server.service could not be found. failed to run commands: exit status 4
{
"Version": 2,
"Tier": "Standard"
}

(23/12/2024 05:57:02) service has been started and is healthy

@dpdornseifer
Contributor

dpdornseifer commented Dec 30, 2024

Hello @peterpan0708,

I just ran an e2e test from scratch (including cdk bootstrap) in ap-southeast-1 without any issues (both dev and prod stacks). Can you provide some more context about the issue above? Is the user-data script on the EC2 instance unable to download the Docker images from the associated ECR repository?
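Because ECR registries are Regional, the image reference an instance pulls carries the Region in its hostname, so a user-data script that resolves the wrong Region (or none) will authenticate against, or pull from, the wrong registry. The shape of a private ECR image URI is sketched below; ecr_image_uri and the repository name in the usage line are hypothetical, and only the standard aws partition is handled. Docker authentication against the same registry is normally done with aws ecr get-login-password --region <region> piped into docker login --username AWS --password-stdin <registry>.

```python
import re

# Rough shape of a commercial-partition Region name, e.g. ap-east-1, us-gov-west-1.
_REGION = re.compile(r"^[a-z]{2}(?:-[a-z]+)+-\d+$")


def ecr_image_uri(account_id: str, region: str, repository: str,
                  tag: str = "latest") -> str:
    """Build a private ECR image URI (standard aws partition only)."""
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"account ID must be 12 digits: {account_id!r}")
    if not _REGION.match(region):
        raise ValueError(f"unexpected Region name: {region!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
```

For example, ecr_image_uri("123456789012", "ap-east-1", "signing-server", "v1") gives 123456789012.dkr.ecr.ap-east-1.amazonaws.com/signing-server:v1; comparing that hostname with what the user-data script actually uses is a quick way to spot a Region mismatch.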

@peterpan0708

I also tried deploying everything from scratch.
I deleted the ECR repo and the S3 bucket and started again from cdk bootstrap, but it still doesn't work correctly.

@peterpan0708

I logged in to devNitroValidator, and it seems that ${WATCHDOG_SYSTEMD_S3_URL} and ${WATCHDOG_S3_URL} are not resolved correctly:
[screenshot]

@dpdornseifer
Contributor

@peterpan0708,

These placeholders should be substituted with the right S3 URLs during deployment. Was the snippet you shared taken directly from the EC2 instance's user data?
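One way to answer that on the instance is to fetch the rendered user data (the instance metadata path latest/user-data) and check whether the two placeholders survived. The sketch below only does the text check; unresolved_placeholders is a hypothetical helper. Because user_data.sh may legitimately use other ${...} shell expansions, the check can be restricted to the names that are supposed to be substituted at deploy time.

```python
import re
from typing import Iterable, List, Optional

# Matches ${NAME} where NAME is a shell-style identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def unresolved_placeholders(user_data: str,
                            expected: Optional[Iterable[str]] = None) -> List[str]:
    """Return the sorted ${NAME} placeholders still present in user_data.

    If `expected` is given, only those names are reported.
    """
    found = set(_PLACEHOLDER.findall(user_data))
    if expected is not None:
        found &= set(expected)
    return sorted(found)
```

After a successful deployment, unresolved_placeholders(user_data, {"WATCHDOG_SYSTEMD_S3_URL", "WATCHDOG_S3_URL"}) should return an empty list; any name it returns was never substituted.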

@peterpan0708

peterpan0708 commented Dec 31, 2024


I printed ${WATCHDOG_SYSTEMD_S3_URL} and ${WATCHDOG_S3_URL}, and these values are correct during the deployment.

@peterpan0708

peterpan0708 commented Dec 31, 2024


I logged in to the devNitroValidator instance.

Running aws configure list shows the region as None.
[screenshot]

Running aws s3 ls cdk-hnb659fds-assets-727392871343-ap-east-1 returns this error:
An error occurred (IllegalLocationConstraintException) when calling the ListObjectsV2 operation: The ap-east-1 location constraint is incompatible for the region specific endpoint this request was sent to

After I configure the region myself, the command runs successfully.
How can this be solved?
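The error message says the request reached an S3 endpoint that does not match the bucket's ap-east-1 location, which is consistent with the CLI running without a Region (aws configure list shows None). ap-east-1 is an opt-in Region, and its buckets need to be addressed through their own Regional endpoint, so the quick fix is to state the Region explicitly, for example aws s3 ls s3://<bucket> --region ap-east-1. Since the default CDK bootstrap bucket name ends with the account and Region, the Region can also be read off the name. The helpers below are a hypothetical sketch of that, assuming the default cdk-<qualifier>-assets-<account>-<region> naming.

```python
import re

# Default CDK bootstrap asset bucket: cdk-<qualifier>-assets-<account>-<region>
_CDK_ASSET_BUCKET = re.compile(
    r"^cdk-(?P<qualifier>[a-z0-9]+)-assets-(?P<account>\d{12})"
    r"-(?P<region>[a-z]{2}(?:-[a-z]+)+-\d+)$"
)


def region_from_cdk_bucket(bucket: str) -> str:
    """Extract the Region suffix from a default-named CDK asset bucket."""
    match = _CDK_ASSET_BUCKET.match(bucket)
    if match is None:
        raise ValueError(f"not a default CDK asset bucket name: {bucket!r}")
    return match.group("region")


def s3_regional_endpoint(region: str) -> str:
    """Regional S3 endpoint URL (standard aws partition)."""
    return f"https://s3.{region}.amazonaws.com"
```

For the bucket above, region_from_cdk_bucket returns ap-east-1, and s3_regional_endpoint of that gives https://s3.ap-east-1.amazonaws.com.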

@peterpan0708

Hi @dpdornseifer,

After some struggle, I managed to get the entire code working in ap-east-1. However, I had to modify the project's source code. I feel the issue still lies with permissions and the region settings.

In the user_data.sh file, I added the AWS_DEFAULT_REGION and AWS_REGION environment variables. Additionally, I prefixed the AWS SDK commands with sudo.

[screenshot] [screenshot]

Is my approach correct?
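The user_data.sh change described above makes the Region explicit instead of relying on the empty CLI configuration. The same idea as a small, hypothetical Python helper: prefer an explicit environment variable, otherwise ask a caller-supplied lookup (for example, the instance metadata path meta-data/placement/region), and produce export lines for both variables. The AWS CLI v2 documentation describes AWS_REGION as overriding AWS_DEFAULT_REGION, and the helper follows that order; resolve_region and region_exports are illustrative names, not part of the project.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_region(env: Optional[Mapping[str, str]] = None,
                   lookup: Optional[Callable[[], str]] = None) -> str:
    """Pick a Region: AWS_REGION, then AWS_DEFAULT_REGION, then lookup()."""
    env = os.environ if env is None else env
    for var in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        value = env.get(var, "").strip()
        if value:
            return value
    if lookup is not None:
        return lookup()  # e.g. a call to IMDS meta-data/placement/region
    raise RuntimeError("no AWS Region configured and no lookup provided")


def region_exports(region: str) -> str:
    """Shell lines that set both variables for the rest of a script."""
    return f"export AWS_REGION={region}\nexport AWS_DEFAULT_REGION={region}\n"
```

Resolving the Region once and exporting it keeps every later aws command in the script pointed at the same Region as the stack.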

Additionally, I would like to ask about the upgrade process. Currently, the Web3Signer version in this code is 22.10-jdk11, which no longer works properly. Assuming our production environment is running 22.10-jdk11 and we want to upgrade to 24.12, do we need to modify the application/eth2/enclave/Dockerfile file and then run cdk deploy again? (We definitely cannot run cdk destroy because there is already data in DynamoDB.)


3 participants