Help - Adding non-Kubernetes workloads to your mesh #12349
Replies: 11 comments 9 replies
-
Hi there. Thank you for the detailed description of the issue. We can help with that. Here are a couple of thoughts:
The panics that you see in the non-k8s proxy
Reaching in-cluster workloads from the non-k8s workload
Logs in the k8s proxy
Let us know when you have checked these things and we can take it further.
-
@zaharidichev - Thanks for the prompt reply. I was able to resolve the issue; the details are below.
There was a typo in the LINKERD2_PROXY_IDENTITY_SPIRE_SOCKET
from spire-agent logs
Corrected
I've corrected that and restarted the proxy. Not seeing any panic errors now.
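A typo like this can be caught before restarting the proxy with a quick check that the configured value actually points at a Unix domain socket. The following is only a sketch: the helper name `spire_socket_is_usable` is hypothetical, and it assumes the value is either a plain filesystem path or a path with a `unix://` prefix.

```python
import os
import stat


def spire_socket_is_usable(value: str) -> bool:
    """Return True if `value` points at an existing Unix domain socket.

    Hypothetical helper for illustration; not part of Linkerd or SPIRE.
    """
    # Assumption: the value may carry a "unix://" scheme prefix; strip it if present.
    prefix = "unix://"
    path = value[len(prefix):] if value.startswith(prefix) else value
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing file, empty path, or permission problem: treat as unusable.
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    value = os.environ.get("LINKERD2_PROXY_IDENTITY_SPIRE_SOCKET", "")
    status = "ok" if spire_socket_is_usable(value) else "not a socket"
    print(f"{value!r} -> {status}")
```

Running it in the same environment as the proxy (same user, same variables) shows whether the path resolves to the socket that the spire-agent created.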
I was able to connect to the external-workload from
The only thing I noticed is that the linkerd-proxy logs on the external-workload are flooded with these
Am I missing something in the config?
-
Hi there, I am glad things have been resolved. Let me answer your questions one by one:
Configuring it as root was just for the purpose of the demo. There is no restriction as to which user the proxy runs as. Configuring iptables requires a privileged user, but this can be done out of band.
This is correct.
This is interesting. What happens there is that the proxy tries to contact the policy controller and provides it with a string that describes the workload. What is the value of
{
  "ns": "mixed-env",
  "external_workload": "external-workload"
}
In the docs, there are a number of escape characters in order to make it work. Do these get processed correctly in your environment? Is the policy controller logging any errors saying that it cannot find the workload?
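One way to check whether the escaping survived is to take the string exactly as the proxy's environment sees it and try to parse it. This is a minimal sketch, assuming the descriptor is a JSON object with the two keys shown above; `check_workload_descriptor` is a hypothetical helper, not part of Linkerd.

```python
import json

# Keys taken from the example descriptor above.
REQUIRED_KEYS = ("ns", "external_workload")


def check_workload_descriptor(raw: str) -> list[str]:
    """Return a list of problems found in the descriptor; empty means it looks fine."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Leftover backslashes or stripped quotes typically end up here.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(parsed, dict):
        return ["expected a JSON object"]
    problems = []
    for key in REQUIRED_KEYS:
        value = parsed.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty key: {key}")
    return problems


if __name__ == "__main__":
    good = '{"ns": "mixed-env", "external_workload": "external-workload"}'
    print(check_workload_descriptor(good))
```

If the value that reaches the proxy still contains literal backslashes, the parse fails and the policy controller would not be able to match the workload either.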
-
The environment vars look OK to me, apart from the spacing; I'm not sure if that matters.
Also, the traffic is not balanced between the external-workload and the in-cluster workload.
As per the docs, I have created a service that selects over both the machine and an in-cluster workload
-
I scaled down the in-cluster workload; however, I noticed that the legacy-app svc doesn't have any endpoints. When scaled up, both the
After scaling down
And requests fail due to endpoints not being available.
Is it because these services use a common label
Shouldn't the external-workloads service
I believe the problem was with the External Workload CRD version, as the policy controller pods were complaining about meshTLS not being found (see error snippet below), while the external workload is created using
The
-
I was able to resolve this problem by running the external-workload as a non-root user (since the proxy is configured to run as the root user, all traffic from root is ignored). Thanks a lot @zaharidichev for your prompt response 🥂.
-
I would like a response on the service endpoints for external-workloads.
-
Yes, I do see the endpointslices now
-
However, requests from the EC2 instances are still failing with the
From proxy logs on the EC2 instance
-
That was a typo on my end. I have used the correct endpoint now and it still doesn't work
-
I am just experimenting here. I saw this in the proxy logs as well: it was trying to route the request to the service and back to the VM, which is why it is failing. I understand this now. Thanks for the explanation. Also for a This describes creating a Server resource to deny traffic; however, our default policy is set to
-
We are working on a POC to mesh external-workloads, and referenced this https://linkerd.io/2.15/tasks/adding-non-kubernetes-workloads/# guide. However, it's not working as intended and we would like some help.
We are using a spire server/agent running on the same EC2 instance with aws_iid node attestation and AWS PCA upstream.
Additionally, we have connectivity between the EC2 instance and K8s worker nodes.
Versions
Spire agent is attested
external-workload is registered
However, we are seeing admin panic errors in the linkerd-proxy logs on EC2. Note that we used the same linkerd-proxy version that is installed in the K8s cluster.
If we use the service endpoint IPs to make a curl request from the EC2 external-workload instance, it goes through.
If we make a curl request to the EC2 external-workload instance IP from the client pod, it doesn't work and times out.
Seeing the admin panicked errors in the EC2 linkerd-proxy logs where external-workload is running.
Followed by a stream of unknown service errors in the EC2 linkerd-proxy logs
The K8s linkerd-proxy sidecar logs from the client pod show a similar error to the one seen in the linkerd-proxy logs on the EC2 instance.