[nri-bundle] Error: couldn't find key cluster-id in Secret newrelic/pl-cluster-secrets #661
Comments
In case this helps, this is the Terraform code we're using to deploy the chart:

resource "helm_release" "newrelic" {
  count            = var.enable_newrelic ? 1 : 0
  chart            = "nri-bundle"
  name             = "newrelic-bundle"
  repository       = "https://helm-charts.newrelic.com"
  version          = "3.3.0"
  create_namespace = true
  namespace        = "newrelic"
  timeout          = 900
  max_history      = 10
  values = [
    templatefile("${path.module}/templates/values-newrelic.yaml", {
      clusterName        = var.cluster_name
      newRelicLicenseKey = local.shared-licences["newrelicLicenseKey"]
      pixieApiKey        = local.shared-licences["newrelicPixieApiKey"]
      pixieChartKey      = local.shared-licences["newrelicPixieChartKey"]
    }),
  ]
  depends_on = [
    kubectl_manifest.pixie-viziers,
    kubectl_manifest.pixie-crd,
  ]
}

resource "kubectl_manifest" "pixie-viziers" {
  count     = var.enable_newrelic ? length(data.kubectl_file_documents.pixie-viziers[0].documents) : 0
  yaml_body = element(data.kubectl_file_documents.pixie-viziers[0].documents, count.index)
  depends_on = [
    null_resource.cluster_up
  ]
}

resource "kubectl_manifest" "pixie-crd" {
  count     = var.enable_newrelic ? length(data.kubectl_file_documents.pixie-crd[0].documents) : 0
  yaml_body = element(data.kubectl_file_documents.pixie-crd[0].documents, count.index)
  depends_on = [
    null_resource.cluster_up
  ]
}

data "kubectl_file_documents" "pixie-viziers" {
  count   = var.enable_newrelic ? 1 : 0
  content = data.http.pixie-viziers[0].body
}

data "kubectl_file_documents" "pixie-crd" {
  count   = var.enable_newrelic ? 1 : 0
  content = data.http.pixie-crd[0].body
}

data "http" "pixie-viziers" {
  count = var.enable_newrelic ? 1 : 0
  url   = "https://raw.githubusercontent.com/pixie-labs/pixie/release/cloud/prod/1642205277/k8s/operator/crd/base/px.dev_viziers.yaml"
}

data "http" "pixie-crd" {
  count = var.enable_newrelic ? 1 : 0
  url   = "https://raw.githubusercontent.com/pixie-labs/pixie/release/cloud/prod/1642205277/k8s/operator/helm/crds/olm_crd.yaml"
}

data "aws_ssm_parameter" "licences" {
  count = var.enable_newrelic ? 1 : 0
  name  = "/shared/licences"
}

Where the values file template is:

global:
  cluster: ${clusterName}
  licenseKey: ${newRelicLicenseKey}
  lowDataMode: true
kubeEvents:
  enabled: true
webhook:
  enabled: true
prometheus:
  enabled: true
logging:
  enabled: true
ksm:
  enabled: false
newrelic-infrastructure:
  privileged: true
newrelic-pixie:
  apiKey: ${pixieApiKey}
  enabled: true
pixie-chart:
  clusterName: ${clusterName}
  deployKey: ${pixieChartKey}
  enabled: true

We're deploying to an EKS cluster, and from what I've observed, the […]
Our workaround for now is to set […]
I am having the same issue (“pl-cluster-secrets can not find”) when I try to install the Helm chart.
It's still not working. I tried deleting the pods, but that did not help.
After updating to the latest chart version […]
I think I found the problem. I am having the same issue with one of the vizier-metadata pods:
I think this is preventing Vizier from updating that secret.
The thing here is that […] is not on […]. It is a circular dependency that, for now, we cannot solve. Raise the amount of […]
Isn't Pixie part of New Relic? Where can we raise an issue to get this solved in the Pixie operator?
We are all at New Relic, but on different teams. I have asked in the internal Slack for somebody to take a look; they need to prioritize issues on their own boards, so it may take some time before they answer.
Hello! I am from the Pixie team at New Relic. This isn't an issue with the Pixie operator, but with how the current NR/Pixie integration works. It should be fixed by an upcoming update to the whole NR/Pixie integration mechanism. This should be out by the end of the month, and we will update here when it is ready.
Hey, […]
Yeah, I also got the error […]
@aimichelle, can you please take a look and suggest next steps?
Bug description
I'm trying to install the nri-bundle-3.3.0 chart using Terraform, and sometimes (not always) the installation fails because one of the pods fails to start within the wait time set for the Helm release.
I'm setting a Helm timeout of 900 seconds, and sometimes even that is not enough.
When I inspect the Pod that is failing to start, I see the following error in its events:
If I wait long enough, it eventually works. One way to speed it up is to delete the failed Pod until it succeeds, but I don't think this is viable: we're using Terraform to provision our clusters, and we end up wasting time on this when the process should be able to run unattended.
Version of Helm and Kubernetes
Which chart?
The chart is nri-bundle-3.3.0.
What happened?
The Helm release fails while waiting for all resources to become ready within 900 seconds.
What you expected to happen?
I would expect the deployment to succeed. This seems to be a race condition in which a secret (pl-cluster-secrets) is created/updated after the pod that needs it starts, so I'd expect that secret to be ready before the deployment is created. I would also expect 900 seconds to be enough time for any Helm release.
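The ordering described here (the `cluster-id` key exists in `pl-cluster-secrets` before anything that depends on it proceeds) can be sketched as a simple poll with a deadline. This is a minimal, hypothetical Python sketch, not how the chart or the Pixie operator actually sequences things: `wait_for_secret_key` and the `get_secret_data` callable are illustrative names only, with `get_secret_data` standing in for whatever Kubernetes client call would read `newrelic/pl-cluster-secrets` (returning `None` while the secret does not exist yet). The 900-second default simply mirrors the Helm timeout mentioned in this report.

```python
import time
from typing import Callable, Mapping, Optional


def wait_for_secret_key(
    get_secret_data: Callable[[], Optional[Mapping[str, str]]],
    key: str = "cluster-id",
    timeout_s: float = 900.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the secret data returned by get_secret_data contains `key`.

    Hypothetical helper: get_secret_data is an assumed stand-in for a real
    Kubernetes client lookup. Returns True as soon as the key is present,
    or False once timeout_s has elapsed without seeing it.
    """
    deadline = clock() + timeout_s
    while True:
        data = get_secret_data()  # None models "secret not created yet"
        if data is not None and key in data:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)  # back off before checking again
```

Injecting `clock` and `sleep` keeps the loop testable without a cluster; in real use the defaults (`time.monotonic` and `time.sleep`) would apply.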
How to reproduce it?
Just a normal Helm install as described in the README. These are the values I'm using:
This seems to be similar to #539.