connection refused while scraping for kube scheduler metrics #35959

Open
shrutichy91 opened this issue Oct 23, 2024 · 4 comments
Assignees
Labels
question Further information is requested receiver/prometheus Prometheus receiver Stale

Comments

@shrutichy91

Component(s)

receiver/prometheus

Describe the issue you're reporting

I have a 3-node k8s cluster.
I am running the OpenTelemetry Collector as a DaemonSet with the following config:

extensions:
  # The health_check extension is mandatory for this chart.
  # Without the health_check extension the collector will fail the readiness and liveliness probes.
  # The health_check extension can be modified, but should never be removed.
  health_check: {}
  memory_ballast: {}
  bearertokenauth:
    token: "XXXXXX"

processors:
  batch:
    timeout: 1s
    send_batch_size: 1000
    send_batch_max_size: 2000
  # If set to null, will be overridden with values based on k8s resource limits

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kube-scheduler-nodeport
          honor_labels: true
          kubernetes_sd_configs:
            - role: pod
              namespaces:
                names:
                  - kube-system
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          scheme: https
          tls_config:
            insecure_skip_verify: true
          relabel_configs:
            # Keep pods with the specified labels
            - source_labels:
                [
                  __meta_kubernetes_pod_label_component,
                  __meta_kubernetes_pod_label_tier,
                ]
              action: keep
              regex: kube-scheduler;control-plane
            - source_labels: [__meta_kubernetes_pod_ip]
              action: replace
              target_label: __address__
              regex: (.*)
              replacement: $$1:10259
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        endpoint: ${env:MY_POD_IP}:4318

exporters:
  logging: {}
  prometheusremotewrite:
    endpoint: "xxxxxxx"
    resource_to_telemetry_conversion:
      enabled: true
    tls:
      insecure: true
    auth:
      authenticator: bearertokenauth

service:
  telemetry:
    metrics:
      address: ${env:MY_POD_IP}:8888
    logs:
      level: debug
  extensions:
    - health_check
    - bearertokenauth
  pipelines:
    metrics:
      exporters:
        - logging
        - prometheusremotewrite
      processors:
        - batch
      receivers:
        - prometheus

I get the following error:

2024-10-23T12:45:56.402Z debug scrape/scrape.go:1331 Scrape failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "kube-scheduler", "target": "https://100.xx.xx.xx:10259/metrics", "error": "Get \"https://100.xx.xx.xx:10259/metrics\": dial tcp 100.xx.xx.xx:10259: connect: connection refused"}

I have the kube controller running as three pods, one on each node of the 3-node cluster, in the kube-system namespace.
Do I need a k8s Service of type NodePort to get this to work?

I logged in to the node and ran curl -kvv https://100.xx.xx.xx:10259/metrics, which fails with connection refused. The same request against localhost works:
curl -kvv https://localhost:10259/metrics
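
The two curl results above suggest that the scheduler's secure port is listening on the loopback interface only, in which case no scrape through the pod IP can succeed. One possible workaround, sketched here under assumptions not confirmed in this report, is to run the collector pods with hostNetwork: true on the control-plane nodes and scrape loopback directly. The job name kube-scheduler-local is hypothetical; the remaining settings are copied from the config above.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        # Hypothetical job name, used for illustration only.
        - job_name: kube-scheduler-local
          scheme: https
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          tls_config:
            insecure_skip_verify: true
          static_configs:
            # Assumption: the collector pod shares the node's network namespace
            # (hostNetwork: true) and runs on a control-plane node, so loopback
            # reaches the scheduler's secure port.
            - targets: ["127.0.0.1:10259"]
```

On nodes that do not run a scheduler this target would simply fail, so such a job would normally be limited to control-plane nodes. The collector's service account also still needs permission to read /metrics.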

@shrutichy91 shrutichy91 added the needs triage New item requiring triage label Oct 23, 2024
@github-actions github-actions bot added the receiver/prometheus Prometheus receiver label Oct 23, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dashpole
Contributor

Is the metrics port exposed on the scheduler pod? You shouldn't need a service if the scheduler is running in-cluster.
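
A minimal sketch of where to check this, assuming a kubeadm-managed cluster (not stated in the report): kubeadm runs the scheduler as a static pod whose manifest is typically at /etc/kubernetes/manifests/kube-scheduler.yaml, and by default it passes --bind-address=127.0.0.1. The excerpt below is illustrative and not taken from the reporter's cluster.

```yaml
# Illustrative excerpt of a kubeadm-style static pod manifest (assumed layout).
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    component: kube-scheduler
    tier: control-plane
spec:
  hostNetwork: true
  containers:
    - name: kube-scheduler
      command:
        - kube-scheduler
        # With this value the secure port (10259) accepts loopback connections only.
        # Setting it to 0.0.0.0 would make the port reachable on the node IP as well.
        - --bind-address=127.0.0.1
```

If the flag is set to 127.0.0.1, a scrape against the pod IP is refused even though curl against localhost on the node succeeds, which matches the behaviour described in the report.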

@dashpole dashpole added question Further information is requested and removed needs triage New item requiring triage labels Oct 23, 2024
@dashpole dashpole self-assigned this Oct 23, 2024
@Juliaj
Contributor

Juliaj commented Oct 31, 2024

In our environment, this is reproducible with build 0.111.0 and not reproducible with 0.110.0.
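
One way to test this observation is to pin the collector image to the older build and see whether the scrape error persists. A minimal sketch as Helm values, assuming the collector is deployed with the opentelemetry-collector Helm chart and the contrib image (neither is confirmed in this thread):

```yaml
# Assumed Helm values for the opentelemetry-collector chart.
image:
  # Assumption: the contrib distribution, which ships the prometheus receiver.
  repository: otel/opentelemetry-collector-contrib
  # Build reported above as not reproducing the problem.
  tag: "0.110.0"
```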

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Dec 31, 2024

3 participants