-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add example of AKS attestation and secret provisioning #38
base: master
Are you sure you want to change the base?
Add example of AKS attestation and secret provisioning #38
Conversation
Signed-off-by: Veena Saini <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 13 of 13 files at r1, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @veenasai2)
-- commits, line 3 at r1:
I'd like to add more explanations in this commit message. Taking from PR, smth like this:
This commit provides a reference implementation to show how Gramine DCAP attestation examples work in an Azure Kubernetes Service (AKS) cluster. The added example creates two Docker images: one for (non-SGX) `ra-tls-secret-prov` server and one for SGX `ra-tls-secret-prov` client. Both images are deployed as part of the AKS Confidential Compute cluster.
We'll do it during final rebase.
Examples/aks-attestation/README.md, line 18 at r1 (raw file):
server applications, where by default server is running on localhost:4433. In the example, the client sends its SGX quote to the server for verification. After successful quote verification, the server sends a secret to the client. To run these client and server applications inside the AKS
Please start To run these client ...
sentence as a new paragraph.
Examples/aks-attestation/README.md, line 19 at r1 (raw file):
client sends its SGX quote to the server for verification. After successful quote verification, the server sends a secret to the client. To run these client and server applications inside the AKS cluster, user needs to prepare two docker images, one for the client and one for the server. In our
docker
-> Docker
Examples/aks-attestation/README.md, line 28 at r1 (raw file):
`localhost` to `<AKS-DNS-NAME.*.cloudapp.azure.com>`. In order to create base client and server images for the AKS environment, user can execute the
images
-> Docker images
Examples/aks-attestation/README.md, line 61 at r1 (raw file):
2. Create the GSC client image: Note: We tested this example with DCAP driver 1.11 specified in the GSC configuration file.
This looks a bit ugly. Let's re-format like this:
2. Create the GSC client image (note that we tested this example with DCAP driver 1.11 specified in the GSC configuration file):
Please re-wrap my proposed text to 80-char limit.
Examples/aks-attestation/README.md, line 66 at r1 (raw file):
$ cd gsc $ cp config.yaml.template config.yaml $ openssl genrsa -3 -out enclave-key.pem 3072
Actually, let's remove these two commands (cp and openssl genrsa). The GSC user is supposed to have done this already. No need to replicate it in this example.
Examples/aks-attestation/README.md, line 95 at r1 (raw file):
service for the container node. The service will internally connect with az-dcap-client to fetch the platform collateral required for quote generation. In this demo, the `aks-secret-prov-client-deployment.yaml` uses aesmd service exposed by AKS with the help of the
`aks-secret-prov-client-deployment.yaml` uses
->
`aks-secret-prov-client-deployment.yaml` file uses
Examples/aks-attestation/README.md, line 96 at r1 (raw file):
platform collateral required for quote generation. In this demo, the `aks-secret-prov-client-deployment.yaml` uses aesmd service exposed by AKS with the help of the sgxquotehelper plugin.
We have AESMD
in the very first paragraph of this README. Let's be consistent and also replace aesmd
-> AESMD
everywhere in this paragraph.
Signed-off-by: Veena Saini <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 12 of 13 files reviewed, 8 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I'd like to add more explanations in this commit message. Taking from PR, smth like this:
This commit provides a reference implementation to show how Gramine DCAP attestation examples work in an Azure Kubernetes Service (AKS) cluster. The added example creates two Docker images: one for (non-SGX) `ra-tls-secret-prov` server and one for SGX `ra-tls-secret-prov` client. Both images are deployed as part of the AKS Confidential Compute cluster.
We'll do it during final rebase.
ok, thanks.
Examples/aks-attestation/README.md, line 18 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Please start
To run these client ...
sentence as a new paragraph.
done, thanks
Examples/aks-attestation/README.md, line 19 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
docker
->Docker
done, thanks
Examples/aks-attestation/README.md, line 28 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
images
->Docker images
done, thanks
Examples/aks-attestation/README.md, line 61 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
This looks a bit ugly. Let's re-format like this:
2. Create the GSC client image (note that we tested this example with DCAP driver 1.11 specified in the GSC configuration file):
Please re-wrap my proposed text to 80-char limit.
80 or 100 char limit?
Examples/aks-attestation/README.md, line 66 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Actually, let's remove these two commands (cp and openssl genrsa). The GSC user is supposed to have done this already. No need to replicate it in this example.
ok, thanks.
Examples/aks-attestation/README.md, line 95 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
`aks-secret-prov-client-deployment.yaml` uses
->
`aks-secret-prov-client-deployment.yaml` file uses
done, thanks
Examples/aks-attestation/README.md, line 96 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
We have
AESMD
in the very first paragraph of this README. Let's be consistent and also replaceaesmd
->AESMD
everywhere in this paragraph.
done, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @veenasai2)
a discussion (no related file):
Bar the commit message change, I'm fine with this PR. Approving.
Examples/aks-attestation/README.md, line 61 at r1 (raw file):
Previously, veenasai2 wrote…
80 or 100 char limit?
Sorry, you have 100 char limit in this document, so you should have aligned to 100 char limit. Which you did. Anyway, resolved.
(In our RST documentation, we use 80 char limit. However, for MD (Markdown) documentation of our examples, we don't impose a strict limit, so using 100 char limit is fine.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Bar the commit message change, I'm fine with this PR. Approving.
ok, thanks.
Examples/aks-attestation/README.md, line 61 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Sorry, you have 100 char limit in this document, so you should have aligned to 100 char limit. Which you did. Anyway, resolved.
(In our RST documentation, we use 80 char limit. However, for MD (Markdown) documentation of our examples, we don't impose a strict limit, so using 100 char limit is fine.)
ok, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 13 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 19 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @veenasai2)
Examples/aks-attestation/aks-secret-prov-client-deployment.yaml, line 13 at r2 (raw file):
app: gsc-ra-tls-secret-prov-client spec: volumes:
spec:
volumes:
metadata:
labels:
labels:
app: gsc-ra-tls-secret-prov-client
Please decide on indentation and unify it (here and in the other YAMLs).
Examples/aks-attestation/aks-secret-prov-client-deployment.yaml, line 22 at r2 (raw file):
ra-tls-server-aks-dns.eastus.cloudapp.azure.com:4433
What's this domain?
Examples/aks-attestation/aks-secret-prov-server.dockerfile, line 16 at r2 (raw file):
# version used for quote generation. User can replace the below package with the # latest package. RUN wget https://packages.microsoft.com/ubuntu/18.04/prod/pool/main/a/az-dcap-client/az-dcap-client_1.10_amd64.deb \
Any reason for installing this particular deb manually, but others via apt
?
Examples/aks-attestation/aks-secret-prov-server-deployment.yaml, line 25 at r2 (raw file):
# TODO: Requesting an SGX machine is not needed, but Intel DCAP libraries have a bug of trying to # open the SGX driver (see https://github.com/intel/linux-sgx/issues/756)
double space
Examples/aks-attestation/aks-secret-prov-server-deployment.yaml, line 25 at r2 (raw file):
# TODO: Requesting an SGX machine is not needed, but Intel DCAP libraries have a bug of trying to # open the SGX driver (see https://github.com/intel/linux-sgx/issues/756)
Seems this bug is already fixed?
Examples/aks-attestation/base-image-generation-script.sh, line 6 at r2 (raw file):
echo "***** gramine directory exists, proceeding to image generation *****" else bash ./gramine_build.sh
Hmm, how does this work? Do we build Gramine twice and separately, once here and once in the GSC code? Why? What if both Gramine versions differ?
Examples/aks-attestation/gramine_build.sh, line 2 at r2 (raw file):
dead link
Examples/aks-attestation/gramine_build.sh, line 4 at r2 (raw file):
# Please refer to https://gramine.readthedocs.io/en/latest/building.html#id2 for more details. # install Gramine dependencies
Other comments here are capitalized, please unify
Examples/aks-attestation/gramine_build.sh, line 27 at r2 (raw file):
# Download Gramine git clone https://github.com/gramineproject/gramine.git
Please add --depth=1
.
Examples/aks-attestation/gramine_build.sh, line 31 at r2 (raw file):
mkdir -p meson_build_output # Generate Signing Key
Signing Key
-> signing key
Examples/aks-attestation/README.md, line 1 at r2 (raw file):
Gramine Attestation Inside AKS cluster
Here and in all other places: You seem to capitalize words quite randomly ;) Please change to capitalizing only the first word and proper names.
Examples/aks-attestation/README.md, line 28 at r2 (raw file):
<AKS-DNS-NAME.*.cloudapp.azure.com>
I'm a bit uneasy about this wildcard. Shouldn't we bind to a specific domain, or a list of them instead? Is it possible to create an instance on Azure which has a colliding , but a different wildcarded part?
Examples/aks-attestation/README.md, line 32 at r2 (raw file):
Since both client and server applications will run inside containers in the AKS cluster
What's the point of using SGX then? 🤔
If you want to remove the need for trusting the cloud provider then you can't run the verifier on the very same infrastructure?
Examples/aks-attestation/README.md, line 54 at r2 (raw file):
- Reference deployment file: `aks-secret-prov-server-deployment.yaml`
Is this a code snippet for some AKS configuration file? If so, it should be wrapped in triple backticks. I'm not really sure how to interpret it in the current form, which is a single-entry bullet list.
Examples/aks-attestation/README.md, line 81 at r2 (raw file):
6. Deploy `gsc-aks-secret-prov-client-img:latest` in AKS confidential compute cluster: - Reference deployment file: `aks-secret-prov-client-deployment.yaml`
ditto (formatting)
Examples/aks-attestation/README.md, line 124 at r2 (raw file):
the secret has been provisioned to the client
Wait, who's the client and who's the server here? Usually it's the server which runs inside SGX providing some service and the clients can connect to it and attest it. Here the clients are SGX workers which receive jobs from a trusted server after verification? Or is it some other model?
Examples/aks-attestation/certs/README, line 8 at r2 (raw file):
Loaded in server (verifier), so it will send it to the client during TLS handshake. The "Common Name" field is set to `ra-tls-server-aks-dns.eastus.cloudapp.azure.com`.
Why include such a certificate in the repo? It's unusable by anyone except the author of this PR, which controls ra-tls-server-aks-dns.eastus.cloudapp.azure.com
(as I assume). Or maybe I'm missing something?
Examples/aks-attestation/certs/README, line 9 at r2 (raw file):
handshake. The "Common Name" field is set to `ra-tls-server-aks-dns.eastus.cloudapp.azure.com`. - `server2.key` -- RSA private key in PEM format. Loaded in server (verifier).
This should be generated during build / provided as argument instead of including in the repo, otherwise people may accidentally deploy it somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @mkow for reviewing this PR. For now I am just answering your queries. I will soon push all the suggested changes. Thanks.
Reviewable status: all files reviewed, 19 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
Examples/aks-attestation/aks-secret-prov-client-deployment.yaml, line 13 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
spec: volumes:
metadata: labels:
labels: app: gsc-ra-tls-secret-prov-client
Please decide on indentation and unify it (here and in the other YAMLs).
sure, thanks.
Examples/aks-attestation/aks-secret-prov-client-deployment.yaml, line 22 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ra-tls-server-aks-dns.eastus.cloudapp.azure.com:4433
What's this domain?
this domain name is assigned to the container where ra-tls-secret-prov server is running. We declare the domain name in aks-secret-prov-server-deployment.yaml file as service.beta.kubernetes.io/azure-dns-label-name: ra-tls-server-aks-dns
Examples/aks-attestation/aks-secret-prov-server.dockerfile, line 16 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Any reason for installing this particular deb manually, but others via
apt
?
In the past we noticed some regression with latest az-dcap-client versions. That's why we kept the one that is working for us. Thanks.
Examples/aks-attestation/aks-secret-prov-server-deployment.yaml, line 25 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
double space
sure, I will remove that.
Examples/aks-attestation/aks-secret-prov-server-deployment.yaml, line 25 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Seems this bug is already fixed?
Need to test this one again, thanks
Examples/aks-attestation/base-image-generation-script.sh, line 6 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Hmm, how does this work? Do we build Gramine twice and separately, once here and once in the GSC code? Why? What if both Gramine versions differ?
First step is to create ra-tls-secret-prov libraries which are later used for creating base client and server images. Second step is where we graminize client image as part of gsc. Hence, being of different version will not impact each other. Thanks.
Examples/aks-attestation/gramine_build.sh, line 2 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
dead link
Thanks I will replace with working link.
Examples/aks-attestation/gramine_build.sh, line 4 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Other comments here are capitalized, please unify
sure, thanks.
Examples/aks-attestation/gramine_build.sh, line 27 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Please add
--depth=1
.
sure, thanks
Examples/aks-attestation/gramine_build.sh, line 31 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Signing Key
->signing key
sure, thanks
Examples/aks-attestation/README.md, line 1 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Gramine Attestation Inside AKS cluster
Here and in all other places: You seem to capitalize words quite randomly ;) Please change to capitalizing only the first word and proper names.
ok, thanks.
Examples/aks-attestation/README.md, line 28 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
<AKS-DNS-NAME.*.cloudapp.azure.com>
I'm a bit uneasy about this wildcard. Shouldn't we bind to a specific domain, or a list of them instead? Is it possible to create an instance on Azure which has a colliding , but a different wildcarded part?
This domain name is user configured domain name for his deployment. Depending upon the Azure subscription this domain could be anything (e.g. eastus, westus) , that's why we kept it open for the user to fill. User just need to provid the AKS-DNS-NAME in server deployment file and the complete dns name can be fetched from the azure portal post deployment.
Examples/aks-attestation/README.md, line 32 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Since both client and server applications will run inside containers in the AKS cluster
What's the point of using SGX then? 🤔
If you want to remove the need for trusting the cloud provider then you can't run the verifier on the very same infrastructure?
Although these are part of the same cluster but they may be located at geographically different machines. Hence the need for SGX attestation for a remote party.
Examples/aks-attestation/README.md, line 54 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
- Reference deployment file: `aks-secret-prov-server-deployment.yaml`
Is this a code snippet for some AKS configuration file? If so, it should be wrapped in triple backticks. I'm not really sure how to interpret it in the current form, which is a single-entry bullet list.
This is not a code snippet. This is just a reference deployment file that the user can customise.
Examples/aks-attestation/README.md, line 81 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto (formatting)
This is not a code snippet. This is just a reference deployment file that user can customise.
Examples/aks-attestation/README.md, line 124 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
the secret has been provisioned to the client
Wait, who's the client and who's the server here? Usually it's the server which runs inside SGX providing some service and the clients can connect to it and attest it. Here the clients are SGX workers which receive jobs from a trusted server after verification? Or is it some other model?
Step 1: Server (not necessarily running on SGX enabled machine): Waiting for connections. It has a domain name:port as ra-tls-server-aks-dns.eastus.cloudapp.azure.com:4433
Step 2: Client (Running on SGX enabled machine): Sends its SGX quote to the above server.
After verifying the quote, the server will provision a dummy secret to the client.
Examples/aks-attestation/certs/README, line 8 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Why include such a certificate in the repo? It's unusable by anyone except the author of this PR, which controls
ra-tls-server-aks-dns.eastus.cloudapp.azure.com
(as I assume). Or maybe I'm missing something?
This is just a sample certificate to get the demo working. We are explicitely calling out the fields the user would have to modify before generating their actual production certificates.
Examples/aks-attestation/certs/README, line 9 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This should be generated during build / provided as argument instead of including in the repo, otherwise people may accidentally deploy it somewhere.
Actually here, I am following the same certficates directory format that we have for https://github.com/gramineproject/gramine/tree/master/CI-Examples/ra-tls-secret-prov/certs . Your point is valid. Here, we are providing the key just to reduce an extra step to the sample working. I can call out seperately in README that the key should not be used in production.
Signed-off-by: Veena Saini <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 13 files reviewed, 19 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
Examples/aks-attestation/aks-secret-prov-server-deployment.yaml, line 25 at r2 (raw file):
Previously, veenasai2 wrote…
sure, I will remove that.
done, thanks.
Examples/aks-attestation/aks-secret-prov-server-deployment.yaml, line 25 at r2 (raw file):
Previously, veenasai2 wrote…
Need to test this one again, thanks
I am still able to see error logs for the server, if I don't use epc memory for the server container. As I can see from the issue (intel/linux-sgx#756) comments that the fix will be merged in some upcoming releases and so far the latest release is on 18 Nov 2021. So , I am still keeping the TODO comment. Thanks
Examples/aks-attestation/gramine_build.sh, line 2 at r2 (raw file):
Previously, veenasai2 wrote…
Thanks I will replace with working link.
done, thanks.
Examples/aks-attestation/gramine_build.sh, line 4 at r2 (raw file):
Previously, veenasai2 wrote…
sure, thanks.
done, thanks.
Examples/aks-attestation/gramine_build.sh, line 27 at r2 (raw file):
Previously, veenasai2 wrote…
sure, thanks
done, thanks.
Examples/aks-attestation/gramine_build.sh, line 31 at r2 (raw file):
Previously, veenasai2 wrote…
sure, thanks
done, thanks
Examples/aks-attestation/README.md, line 1 at r2 (raw file):
Previously, veenasai2 wrote…
ok, thanks.
done, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 13 files reviewed, 19 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
Examples/aks-attestation/README.md, line 32 at r2 (raw file):
So, so the server attest the client (isn't this flipped? normally we call client the party that does the attestation)? Then who attests the server?
Although these are part of the same cluster but they may be located at geographically different machines. Hence the need for SGX attestation for a remote party.
But you can just use TLS to have machine to machine verification, no need for SGX there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 13 files reviewed, 19 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
Examples/aks-attestation/README.md, line 32 at r2 (raw file):
Previously, boryspoplawski (Borys Popławski) wrote…
So, so the server attest the client (isn't this flipped? normally we call client the party that does the attestation)? Then who attests the server?
Although these are part of the same cluster but they may be located at geographically different machines. Hence the need for SGX attestation for a remote party.
But you can just use TLS to have machine to machine verification, no need for SGX there.
@boryspoplawski Here, the client wants to prove to the server that it is indeed running on an SGX enabled machine. During the mbedTLS handshake the client will first verify server's certificates (available at Examples/aks-attestation/certs/), after that the server will receive the client's X.509 certificate. This X.509 certificate has SGX quote embedded into it. This SGX quote is verified by the server using DCAP quote verification libraries. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 13 files reviewed, 19 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
Examples/aks-attestation/README.md, line 32 at r2 (raw file):
Previously, veenasai2 wrote…
@boryspoplawski Here, the client wants to prove to the server that it is indeed running on an SGX enabled machine. During the mbedTLS handshake the client will first verify server's certificates (available at Examples/aks-attestation/certs/), after that the server will receive the client's X.509 certificate. This X.509 certificate has SGX quote embedded into it. This SGX quote is verified by the server using DCAP quote verification libraries. Thanks.
We can modify this example to have server run in a separate cluster itself (to imply that both are in separate machines/infra), making SGX attestation relevant. Is this OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 13 files reviewed, 19 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
Examples/aks-attestation/README.md, line 32 at r2 (raw file):
We can modify this example to have server run in a separate cluster
It doesn't change anything. Either you trust the cloud provider and do not need SGX, or do not and then you need to verify the process from a trusted machine. A cluster cannot be trusted and untrusted both at the same time, it wouldn't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 13 files reviewed, 19 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
Examples/aks-attestation/README.md, line 32 at r2 (raw file):
Previously, boryspoplawski (Borys Popławski) wrote…
We can modify this example to have server run in a separate cluster
It doesn't change anything. Either you trust the cloud provider and do not need SGX, or do not and then you need to verify the process from a trusted machine. A cluster cannot be trusted and untrusted both at the same time, it wouldn't make sense.
What we are saying is that the verifier can run in a separate cluster hosted in one of our local systems - hence making it a trusted cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 13 files reviewed, 19 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
Examples/aks-attestation/README.md, line 32 at r2 (raw file):
Previously, aneessahib (Anees Sahib) wrote…
What we are saying is that the verifier can run in a separate cluster hosted in one of our local systems - hence making it a trusted cluster.
All right, then that verifier doesn't have to run in SGX - there is no need for that since the cluster is trusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @aneessahib, @boryspoplawski, @dimakuv, @mkow, and @veenasai2)
Examples/aks-attestation/aks-secret-prov-client-deployment.yaml, line 22 at r2 (raw file):
Previously, veenasai2 wrote…
this domain name is assigned to the container where ra-tls-secret-prov server is running. We declare the domain name in aks-secret-prov-server-deployment.yaml file as
service.beta.kubernetes.io/azure-dns-label-name: ra-tls-server-aks-dns
But here you hardcoded some domain, which I guess you own? We don't know where the server will be running for the user who tries this example.
Examples/aks-attestation/aks-secret-prov-server.dockerfile, line 16 at r2 (raw file):
Previously, veenasai2 wrote…
In the past we noticed some regression with latest az-dcap-client versions. That's why we kept the one that is working for us. Thanks.
Could you check if that's still the case? We really shouldn't hardcode some old versions of software (new ones may include security fixes).
Examples/aks-attestation/base-image-generation-script.sh, line 6 at r2 (raw file):
Previously, veenasai2 wrote…
First step is to create ra-tls-secret-prov libraries which are later used for creating base client and server images. Second step is where we graminize client image as part of gsc. Hence, being of different version will not impact each other. Thanks.
@dimakuv: Is this ok? I don't think we'll keep ra-tls compatible between versions? (i.e. that you can use some old client + new server)
Why can't we take the libraries from the image which has Gramine built inside? AFAIR there was a step where GSC built Gramine in a separate image and only afterwards added the user workload, creating a separate image.
Examples/aks-attestation/gramine_build.sh, line 2 at r2 (raw file):
Previously, veenasai2 wrote…
done, thanks.
But now it points to the part of the docs after discussing the kernel driver and its installation, but in the sentence above you mention the driver (the context suggests that the link will explain in more detail the driver requirements).
Examples/aks-attestation/README.md, line 28 at r2 (raw file):
Previously, veenasai2 wrote…
This domain name is user configured domain name for his deployment. Depending upon the Azure subscription this domain could be anything (e.g. eastus, westus) , that's why we kept it open for the user to fill. User just need to provid the AKS-DNS-NAME in server deployment file and the complete dns name can be fetched from the azure portal post deployment.
Yup, but that's not what I'm asking about, the question is whether someone else can get AKS-DNS-NAME.SOMETHING_DIFFERENT_HERE.cloudapp.azure.com
.
Examples/aks-attestation/README.md, line 32 at r2 (raw file):
Previously, boryspoplawski (Borys Popławski) wrote…
All right, then that verifier doesn't have to run in SGX - there is no need for that since the cluster is trusted.
+1 to what Borys says, what's the point of running the server under Gramine if we don't even attest its enclave?
Examples/aks-attestation/README.md, line 54 at r2 (raw file):
Previously, veenasai2 wrote…
This is not a code snippet. This is just a reference deployment file that the user can customise.
But why is this a list of points, with just one point? It reads quite weirdly for me.
Examples/aks-attestation/README.md, line 124 at r2 (raw file):
Previously, veenasai2 wrote…
Step 1: Server (not necessarily running on SGX enabled machine): Waiting for connections. It has a domain name:port as ra-tls-server-aks-dns.eastus.cloudapp.azure.com:4433
Step 2: Client (Running on SGX enabled machine): Sends its SGX quote to the above server.
After verifying the quote, the server will provision a dummy secret to the client.
Could you make it more clear in the readme that this is the model presented here? I think it's the opposite of the standard one (where the enclaves are the servers providing some services).
Or maybe just reverse it and use the more common model?
Examples/aks-attestation/certs/README, line 8 at r1 (raw file):
The "Common Name" field is set to `ra-tls-server-aks-dns.eastus.cloudapp.azure.com`.
I don't understand this. How can this work for anyone except you? This is a domain which you (as I assume) have ownership of. How can someone following this tutorial host anything there?
Examples/aks-attestation/certs/README, line 8 at r2 (raw file):
Previously, veenasai2 wrote…
This is just a sample certificate to get the demo working. We are explicitely calling out the fields the user would have to modify before generating their actual production certificates.
Could we instead generate the certificate during build? This way we won't risk someone deploying this dummy certificate anywhere.
Examples/aks-attestation/certs/README, line 9 at r2 (raw file):
Previously, veenasai2 wrote…
Actually here, I am following the same certficates directory format that we have for https://github.com/gramineproject/gramine/tree/master/CI-Examples/ra-tls-secret-prov/certs . Your point is valid. Here, we are providing the key just to reduce an extra step to the sample working. I can call out seperately in README that the key should not be used in production.
Why not generate the key during build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @aneessahib, @boryspoplawski, @dimakuv, @mkow, and @veenasai2)
a discussion (no related file):
From the discussions, I see that Michal and Borys would prefer a "turn-key" example that would collect all the relevant information (domain names, etc.) and auto-generate the required files (private keys, certificates).
The approach of this PR is more of a tutorial: look how we did it, and adjust our scripts to your parameters (like domain name) and your needs (running the Secret Provisioning server on your local laptop instead of the AKS cluster).
I treated this PR as the latter (the tutorial). I still think that it's a reasonable approach -- we have more or less the same approach with our Examples (they have insecure__
manifest options, which users must tweak for their particular real-world scenarios).
One solution to this I can think of: what about moving this AKS tutorial to https://github.com/gramineproject/contrib?
Examples/aks-attestation/base-image-generation-script.sh, line 6 at r2 (raw file):
Why can't we take the libraries from the image which has Gramine built inside? AFAIR there was a step where GSC built Gramine in a separate image and only afterwards added the user workload, creating a separate image.
Hm, yes, but we don't have a way to create a DCAP-enabled Gramine image via gsc build
currently: https://github.com/gramineproject/gsc/blob/master/templates/Dockerfile.ubuntu.compile.template#L45-L54
IIRC, this PR was started long before we finalized the meson setup -Ddcap=enabled
compilation option. If I would start from scratch, I would first create a PR to the GSC core that adds a new GSC build option like gsc build --enable-dcap
. This would change the GSC build process of Gramine to enable all the DCAP stuff, including RA-TLS and Secret Provisioning libraries. Then I believe we would not need these bash scripts in this PR.
I don't think we'll keep ra-tls compatible between versions? (i.e. that you can use some old client + new server)
Ideally, we should (and we currently do). These are shared libraries -- they are supposed to have backwards-compatible APIs.
Examples/aks-attestation/README.md, line 124 at r2 (raw file):
Or maybe just reverse it and use the more common model?
No, this would be super-confusing. Actually, I tried to look at Kubernetes (secret provisioning) and MS Azure (KeyVault) documents but didn't find any good easy-to-follow terminology.
Could you make it more clear in the readme that this is the model presented here? I think it's the opposite of the standard one (where the enclaves are the servers providing some services).
We have a Secret Provisioning scenario here: a centralized service (Secret Provisioning Server) runs forever and waits for remote systems to connect to it (clients of Secret Provisioning). This service has the secrets loaded into it (passwords, certificates, etc.). Upon client requests, this service validates the client attestation evidence (the SGX quote) and if validation succeeds, this service releases the secret to the client. The clients in this scenario are SGX enclaves.
So the simplest thing to change in this README is to make all mentions of "servers" and "clients" more specific:
server
->Secret Provisioning service
client
->SGX application (client of the Secret Provisioning service)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @aneessahib, @dimakuv, @mkow, and @veenasai2)
a discussion (no related file):
see that Michal and Borys would prefer a "turn-key" example that would collect all the relevant information (domain names, etc.) and auto-generate the required files (private keys, certificates).
I've not reviewed the whole PR, just some parts - I've not seen the part with domain names and certificates, so I cannot comment on that (including specifying what I "prefer").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @aneessahib, @dimakuv, @mkow, and @veenasai2)
a discussion (no related file):
From the discussions, I see that Michal and Borys would prefer a "turn-key" example that would collect all the relevant information (domain names, etc.) and auto-generate the required files (private keys, certificates).
That would be better, but even without this, the problem is that there are some magic hardcoded DNS names which has to be replaced by the user without being marked as such. Let's at least put something like "YOUR_CLUSTER_NAME" there.
The approach of this PR is more of a tutorial: look how we did it, and adjust our scripts to your parameters (like domain name) and your needs (running the Secret Provisioning server on your local laptop instead of the AKS cluster).
This point from my side is about suggesting a deployment which makes no sense. I'm fine with a simplistic example, but it should emulate a real case. Here we're e.g. putting both sides into enclaves, but attest only one. Put yourself into shoes of someone who's just learning everything here (that's why they read this example) and they see this. I'd be totally confused.
One solution to this I can think of: what about moving this AKS tutorial to https://github.com/gramineproject/contrib?
For contrib I think the current state would be ok, but definitely not for the official examples.
Examples/aks-attestation/base-image-generation-script.sh, line 6 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why can't we take the libraries from the image which has Gramine built inside? AFAIR there was a step where GSC built Gramine in a separate image and only afterwards added the user workload, creating a separate image.
Hm, yes, but we don't have a way to create a DCAP-enabled Gramine image via
gsc build
currently: https://github.com/gramineproject/gsc/blob/master/templates/Dockerfile.ubuntu.compile.template#L45-L54IIRC, this PR was started long before we finalized the
meson setup -Ddcap=enabled
compilation option. If I would start from scratch, I would first create a PR to the GSC core that adds a new GSC build option likegsc build --enable-dcap
. This would change the GSC build process of Gramine to enable all the DCAP stuff, including RA-TLS and Secret Provisioning libraries. Then I believe we would not need these bash scripts in this PR.I don't think we'll keep ra-tls compatible between versions? (i.e. that you can use some old client + new server)
Ideally, we should (and we currently do). These are shared libraries -- they are supposed to have backwards-compatible APIs.
Hmm, then I don't like that we're working on a tutorial which does things in an already-deprecated way ;)
How much work would be to add --enable-dcap
to GSC? Seems pretty straightforward and probably also needed by other usecases.
Examples/aks-attestation/README.md, line 124 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Or maybe just reverse it and use the more common model?
No, this would be super-confusing. Actually, I tried to look at Kubernetes (secret provisioning) and MS Azure (KeyVault) documents but didn't find any good easy-to-follow terminology.
Could you make it more clear in the readme that this is the model presented here? I think it's the opposite of the standard one (where the enclaves are the servers providing some services).
We have a Secret Provisioning scenario here: a centralized service (Secret Provisioning Server) runs forever and waits for remote systems to connect to it (clients of Secret Provisioning). This service has the secrets loaded into it (passwords, certificates, etc.). Upon client requests, this service validates the client attestation evidence (the SGX quote) and if validation succeeds, this service releases the secret to the client. The clients in this scenario are SGX enclaves.
So the simplest thing to change in this README is to make all mentions of "servers" and "clients" more specific:
server
->Secret Provisioning service
client
->SGX application (client of the Secret Provisioning service)
But what's the real-world case for such a thing? I'd rather imagine that the server wants to outsource part of the work to the cloud, so it spawns workers in enclaves in the cloud and connect to them. Not that the workers are spawned (by who, actually?) and they connect to the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @aneessahib, @dimakuv, @mkow, and @veenasai2)
Examples/aks-attestation/base-image-generation-script.sh, line 6 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Hmm, then I don't like that we're working on a tutorial which does things in an already-deprecated way ;)
How much work would be to add--enable-dcap
to GSC? Seems pretty straightforward and probably also needed by other usecases.
I agree that adding --enable-dcap
to GSC is the way to go. We definitely should implement this feature, independent of this PR. And then this PR can build on top of that DCAP-enabling one.
Examples/aks-attestation/README.md, line 124 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
But what's the real-world case for such a thing? I'd rather imagine that the server wants to outsource part of the work to the cloud, so it spawns workers in enclaves in the cloud and connect to them. Not that the workers are spawned (by who, actually?) and they connect to the server.
Actually, many cloud deployments will be like this. A good example is Marblerun: https://gramine.readthedocs.io/en/latest/attestation.html#edgeless-marblerun. There is one central entity (the Coordinator), which serves as both the enclave spawner and the secret provisioning service. In case of Marblerun, the Coordinator == the server, and Marbles (SGX enclaves) == clients of this server.
This is not the only example. From what I understand, Microsoft wants to do the same with its MAA (Microsoft Azure Attestation) infrastructure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @aneessahib, @dimakuv, @mkow, and @veenasai2)
Examples/aks-attestation/README.md, line 124 at r2 (raw file):
which serves as both the enclave spawner and the secret provisioning service. In case of Marblerun, the Coordinator == the server, and Marbles (SGX enclaves) == clients of this server.
But it's actually the Coordinator connecting and sending data to the workers. Maybe it's just that client-server naming doesn't work well here and we should use coordinator-workers instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @aneessahib, @dimakuv, @mkow, and @veenasai2)
Examples/aks-attestation/README.md, line 124 at r2 (raw file):
But it's actually the Coordinator connecting and sending data to the workers.
Not exactly. In Marblerun (in case of Gramine), the Coordinator generates envvars like env.MARBLERUN_COORDINATOR_IP_ADDRESS = "1.1.1.1"
, puts them in the manifest files, transfers this binaries+manifest bundle to the other machine and kicks Gramine to start on that other machine. Then Gramine starts the pre-main logic of the SGX enclave that looks at envvars and connects to the Coordinator. So I would still call it a classic server-client architecture, where SGX enclaves are clients of Coordinator.
Maybe it's just that client-server naming doesn't work well here and we should use coordinator-workers instead?
I would prefer to have a renaming like this then:
- server -> Secret Provisioning service
- client -> SGX application (or enclavized application, or graminized application)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @aneessahib, @dimakuv, @mkow, and @veenasai2)
Examples/aks-attestation/README.md, line 124 at r2 (raw file):
I would prefer to have a renaming like this then: [...]
sgtm
Signed-off-by: Veena Saini <[email protected]>
Signed-off-by: Veena Saini <[email protected]>
Signed-off-by: Veena Saini [email protected]
Description of the changes
This PR provides a reference implementation to show how gramine attestation (DCAP) samples works inside AKS cluster. We have created two docker images for ra-tls-secret-prov server and ra-tls-secret-prov client. Both images are deployed as part of AKS confidential compute cluster and both quote generation and quote verification are successful inside AKS cluster.
For client deployment inside AKS cluster, we have gsc/Examples/aks-attestation/aks-secret-prov-client-deployment.yaml and for server deployment gsc/Examples/aks-attestation/aks-secret-prov-server-deployment.yaml file.
For more details, we have created a readme file.
This PR is an updated version of #11.
How to test this PR?
Please follow gsc/Examples/aks-attestation/README.md
This change is