Prometheus Plugin update from 773.v3b_62d8178eec to 778.ve1c932a_ff24f prevents Jenkins from starting up
#683
Comments
@Dohbedoh any ideas? |
I don't get why it always happens somewhere else. I have multiple Docker-based Jenkins instances without ANY issues, and I've also tested it on a manual installation on Linux, no issues there either. What is different here?! |
I don't think that this is supposed to be here. Could the Jenkins test harness dependency be loaded in this instance? |
Starting an instance with this list, I can't reproduce and can't figure out what includes that class.
This should hopefully tell us what includes that class. Note: you will need to have Jenkins started, so temporarily downgrade the prometheus plugin. |
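A minimal sketch of the kind of lookup such a script could perform (this is not the actual script referenced above; the class `FindClassOrigin` and method `originOf` are illustrative names):

```java
// Ask the Jenkins plugin manager's uber classloader where a class comes from,
// i.e. which jar or plugin contributed it to the runtime classpath.
import jenkins.model.Jenkins;

import java.security.CodeSource;

public class FindClassOrigin {
    public static String originOf(String className) throws ClassNotFoundException {
        ClassLoader uberLoader = Jenkins.get().getPluginManager().uberClassLoader;
        Class<?> clazz = Class.forName(className, false, uberLoader);
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        return source != null ? source.getLocation().toString() : "unknown (bootstrap/JDK class)";
    }
}
```

For example, calling `originOf("org.jvnet.hudson.test.JenkinsRule")` would report which jar contributes the test-harness class, if it is somehow on the classpath.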
I also have the same issue since upgrading to
|
@tt-kkaiser any chance you could get a thread dump, or at least the stack trace of the initialization thread stuck on startup? |
@Dohbedoh At the moment that's not possible, because this issue only happens on my production Jenkins and not on my test Jenkins instance, even though both have the same data (Docker image, plugins, configurations). |
A stack trace of the deadlocked threads would be the quickest way to narrow this down. |
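When the UI is unreachable because startup hangs, `jstack <pid>` against the Jenkins Java process is the usual way to capture this. A rough programmatic equivalent is sketched below; `ThreadDumper` is an illustrative name, not something from the plugin:

```java
// Print the name, state and stack of every live thread, which is enough to
// spot an initialization thread blocked on a lock or waiting indefinitely.
import java.util.Map;

public class ThreadDumper {
    public static void dumpAllThreads() {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            System.out.println(thread.getName() + " (" + thread.getState() + ")");
            for (StackTraceElement frame : entry.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```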
We encountered the same issue as in issue 647. After removing the Prometheus plugin and enhanced metrics (which depends on Prometheus), Jenkins started successfully. I ran @Dohbedoh's script while Jenkins was running without the mentioned plugins, and nothing was printed. |
On my side I'm using the test harness in a Maven project to validate plugin loading and DSL scripts. I'm aware it's a bit different from using a real Jenkins instance, but it still highlights in a reproducible way that there is something wrong with the plugin lifecycle. If I'm able to reproduce it in my test environment, it should be possible to reproduce it in the plugin tests, or at least in the Jenkins BOM that tests against other plugin dependencies. To be honest I'm also a bit lost on how the Initializer works, and I also saw some issues with the . Perhaps there should be some check to avoid reinitialization? I can see something like https://github.com/jenkinsci/opentelemetry-plugin/blob/42e64cb74a0e728cbf020333d77d4f5ecbbcc752/src/main/java/io/jenkins/plugins/opentelemetry/JenkinsOpenTelemetryPluginConfiguration.java#L231 in the opentelemetry plugin? Not sure if it helps. |
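A minimal sketch of such a reinitialization guard, loosely modeled on the opentelemetry check linked above; `CollectorBootstrap` and its members are hypothetical names, not the plugin's actual API:

```java
// First caller wins; subsequent initializations become no-ops, so collectors
// are never registered twice against the same registry.
import java.util.concurrent.atomic.AtomicBoolean;

import io.prometheus.client.CollectorRegistry;

public class CollectorBootstrap {
    private static final AtomicBoolean INITIALIZED = new AtomicBoolean(false);

    public static void registerOnce(CollectorRegistry registry) {
        if (!INITIALIZED.compareAndSet(false, true)) {
            return; // already initialized; avoid duplicate collector registration
        }
        // registry.register(new JobCollector()); // actual collectors would be registered here
    }
}
```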
@Dohbedoh here is a list of all the plugins in the Jenkins instance:
|
@Dohbedoh @jonesbusy Do you think it's sufficient to call |
As I cannot reproduce the issue, could any of the affected users test PR #684? The HPI will appear here: https://ci.jenkins.io/job/Plugins/job/prometheus-plugin/job/PR-684/1/ |
It does not fix the issue for me; startup hangs at the same point as it did with v778
|
Ok thank you |
I've created a second version using the lifecycle hooks. If anyone could test this again, that would be great: https://ci.jenkins.io/job/Plugins/job/prometheus-plugin/job/PR-684/lastSuccessfulBuild/artifact/org/jenkins-ci/plugins/prometheus/780.ve7641cfcb_594/prometheus-780.ve7641cfcb_594.hpi |
On it! |
@Waschndolos It's still not booting up |
@tt-kkaiser Is it still the same error message? I've rebooted like 20 times on my instance without any error. I don't get it... |
@Waschndolos I never got any error messages. The Jenkins instance just stops booting, and after some time the Kubernetes startup probe kills the container, but even without the startup probe Jenkins just keeps booting indefinitely. |
Guess I'll need to set up a local cluster. Let me check.. |
I don't think you need a local cluster. The plugin worked for me when all my build jobs had no build data (the stuff stored in caches/, workspace/ and jobs/**/builds/), but now that I have copied everything over from my live instance it stopped booting up, so it has something to do with that. |
We don't run in containers either. I believe you need recently finished jobs or something like that. |
Ok, I'll check that, although it's strange: I operate a Jenkins on Docker with > 3k jobs without any problems. I'll see if I can reproduce the issue. |
I got the stack trace while Jenkins was booting up:
|
Testing it |
Adding that to the disabled metrics does not seem to help |
Couldn't the JobCollector be initialized asynchronously? |
The |
The difference that I see now is that the data for the /prometheus endpoint is loaded at startup, which causes a longer startup time. Before the #682 change it was loaded when the endpoint was first requested, which did not cause any issues: while the collectors were being initialized, the endpoint just returned an empty 200 OK response, and as soon as all collectors were finished it responded with the correct data. I think this behavior should be kept. The change so that only one instance exists for the PrometheusMetrics class is okay, but the collectors must be registered asynchronously; that can be done by adding a field called |
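A minimal sketch of the asynchronous registration idea described above; the class and method names (`AsyncPrometheusMetrics`, `registerCollectorsAsync`, `scrape`) are illustrative, not the plugin's actual API:

```java
// Run the expensive collector setup off the initialization thread so it
// cannot block Jenkins startup; serve an empty body until it finishes.
import java.util.concurrent.CompletableFuture;

import io.prometheus.client.CollectorRegistry;

public class AsyncPrometheusMetrics {
    private final CollectorRegistry registry = new CollectorRegistry();
    private volatile boolean ready = false;

    public void registerCollectorsAsync() {
        CompletableFuture.runAsync(() -> {
            // registry.register(new JobCollector()); // walks every job/build, can be slow
            ready = true;
        });
    }

    public String scrape() {
        // While collectors are still warming up, return an empty body so the
        // /prometheus endpoint keeps answering 200 OK, as it did before #682.
        return ready ? formatMetrics(registry) : "";
    }

    private String formatMetrics(CollectorRegistry registry) {
        return ""; // placeholder: the real plugin renders the Prometheus text exposition format here
    }
}
```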
Ah, I didn't see your responses yet. I've added synchronized to the |
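For comparison, a sketch of what a `synchronized` lazy initialization could look like; `DefaultPrometheusMetrics` is used here purely as an illustration and may not match the plugin's real implementation:

```java
// Double-checked locking: only one thread constructs the instance (and thus
// registers collectors); concurrent callers wait instead of re-registering.
public class DefaultPrometheusMetrics {
    private static volatile DefaultPrometheusMetrics instance;

    public static DefaultPrometheusMetrics get() {
        if (instance == null) {
            synchronized (DefaultPrometheusMetrics.class) {
                if (instance == null) {
                    instance = new DefaultPrometheusMetrics();
                }
            }
        }
        return instance;
    }

    private DefaultPrometheusMetrics() {
        // collector registration happens exactly once, here
    }
}
```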
Actually that's not always the case. The Guice injector in some environments might initialize earlier than this, as shown in the issue description: #635. I have quickly proposed something: #685, and tested the HPI locally. |
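One plausible way to decouple registration from Guice injector timing, not necessarily what #685 actually does, is to hook a Jenkins init milestone that runs after jobs are loaded; the class and method names below are illustrative:

```java
// Defer collector registration to a well-defined point in Jenkins startup
// instead of relying on when the Guice injector happens to be created.
import hudson.init.InitMilestone;
import hudson.init.Initializer;

public class PrometheusStartup {
    @Initializer(after = InitMilestone.JOB_CONFIG_ADAPTED)
    public static void registerCollectors() {
        // trigger the (idempotent) collector registration here, once per startup
    }
}
```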
Testing it! |
Its working! Thanks for the effort and the quick implementation of the fix 👍 |
Well, thanks for your patience! And sorry for the fuss. Trying to fix one scenario was breaking others... Hopefully this one solves the overall problem. @jonesbusy could you test this one too? |
@Dohbedoh Doing it now. Just before, I was testing https://github.com/jenkinsci/prometheus-plugin/pull/684/checks?check_run_id=27949816152 (incrementals 781.v3dcc3856e4b_6) and didn't face the duplicate collector registration. I will now test https://github.com/jenkinsci/prometheus-plugin/pull/685/checks?check_run_id=27950494840 (incrementals 779.vc06615c39172). I will let you know my findings in a few minutes. |
I confirm it also works on my side with 779.vc06615c39172 |
alrighty then. I'll merge it and perform a release. Thank you all for your support |
-> https://github.com/jenkinsci/prometheus-plugin/releases/tag/779.vb_59179a_27643 - Afk now enjoying my day off today :) Thank you all again! |
Jenkins and plugins versions report
2.452.3 and all plugins up to date
What Operating System are you using (both controller, and any agents involved in the problem)?
All. Official Docker image deployed on K8s using the Jenkins Helm chart
Reproduction steps
Restart Jenkins
Expected Results
Everything works as expected, like it did before on 773.v3b_62d8178eec
Actual Results
Anything else?
This is similar to #647, which was fixed in 2.5.3
Are you interested in contributing a fix?
No response