
Resolve port conflict when starting multiple PMM agents #2573

Closed
wants to merge 12 commits

Conversation

GuanqunYang193
Contributor

In our production setting, we often deploy multiple instances of the PMM client on the same machine. However, we occasionally run into port conflicts that are not resolved automatically until we restart the PMM client. The cause of the problem is as follows:

  • When the PMM client starts agents, it checks whether a port is unoccupied and then uses the selected port to launch the agent. However, when we start multiple PMM clients simultaneously (in our experiments, reproducing this problem consistently requires 5 or more concurrent instances), there is a race condition on the ports: different clients can simultaneously consider the same port available (see the sketch after this list).

  • When the PMM client fails to start the agent process, it keeps retrying with the same parameters, so the process never starts successfully.
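
Roughly, the race looks like this. The snippet below is a simplified sketch, not the actual PMM code; the port number and function names are made up for illustration:

```go
// Simplified sketch of the check-then-use race: a client "checks" a port by
// listening and immediately closing, then launches an exporter that binds the
// same port later. Between the close and the exporter's bind, another client
// running the same check can grab the port.
package main

import (
	"fmt"
	"net"
)

// portIsFree reports whether the port can currently be bound.
// The result is only valid at the instant of the check.
func portIsFree(port int) bool {
	l, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
	if err != nil {
		return false
	}
	_ = l.Close() // the port is released here, before the agent actually uses it
	return true
}

func main() {
	const port = 42000 // hypothetical port inside the configured range

	if portIsFree(port) {
		// Window of opportunity: a second client running the same check can
		// also see the port as free and start its exporter first. The exporter
		// started here then fails with "address already in use".
		fmt.Println("port looks free, starting exporter on", port)
	}
}
```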

Therefore, I submitted this PR to address the issue:

  • Implemented a limit on the number of retries for the process. When the retries exceed this limit, the process requests new parameters from the supervisor (a rough sketch follows this list).
  • When the supervisor detects that the process needs new parameters, it selects a new port and passes the updated parameters to the process.
  • To test this mechanism, I also submitted a new test that uses "nc" to simulate port conflicts.
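
To make the first two points concrete, here is a rough sketch of the retry limit. The names (`process`, `start`, `maxStartRetries`, the `newParams` channel) are hypothetical and not the actual code in this PR:

```go
// Sketch of the retry limit: after maxStartRetries failed starts, the process
// stops retrying with the same parameters and waits for the supervisor to
// hand it a new port.
package main

import (
	"errors"
	"fmt"
)

const maxStartRetries = 3 // hypothetical limit on starts with the same params

type process struct {
	port      int
	newParams chan int // the supervisor sends a replacement port here
}

// start pretends to launch an exporter; here it always fails as if the port were busy.
func (p *process) start() error {
	return errors.New("bind: address already in use")
}

// run retries start() and, once the limit is hit, asks for new parameters.
func (p *process) run() {
	retries := 0
	for {
		if err := p.start(); err == nil {
			return
		}
		retries++
		if retries >= maxStartRetries {
			fmt.Println("giving up on port", p.port, "- requesting new params")
			p.port = <-p.newParams // supervisor picks a free port and replies
			return                 // shortened for the sketch; real code keeps going
		}
	}
}

func main() {
	p := &process{port: 42000, newParams: make(chan int, 1)}
	p.newParams <- 42001 // stand-in for the supervisor choosing a new port
	p.run()
	fmt.Println("would retry with port", p.port)
}
```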

@GuanqunYang193 GuanqunYang193 requested a review from a team as a code owner October 24, 2023 23:09
@GuanqunYang193 GuanqunYang193 requested review from idoqo and artemgavrilov and removed request for a team October 24, 2023 23:09
@it-percona-cla

it-percona-cla commented Oct 24, 2023

CLA assistant check
All committers have signed the CLA.

@BupycHuk
Member

Hi @GuanqunYang193, could you tell us why you run multiple PMM clients on the same machine? You can configure each PMM client to use a different port range so they don't conflict with each other.

@GuanqunYang193
Contributor Author

> Hi @GuanqunYang193, could you tell us why you run multiple PMM clients on the same machine? You can configure each PMM client to use a different port range so they don't conflict with each other.

Thanks for the quick reply! We have multiple database instances deployed with Docker on a single machine, and each database has its own PMM agent, because:

  1. We don't want to tightly couple the client to the PMM server it reports to, which would make our instances and PMM clients hard to migrate.
  2. If a single PMM client on the host were responsible for all instances, an issue like a crash or outage would result in the loss of metrics for every instance.
  3. We can easily limit the resources of each PMM client, which allows for greater flexibility and scalability when adding or modifying database instances.

@GuanqunYang193
Contributor Author

> Hi @GuanqunYang193, could you tell us why you run multiple PMM clients on the same machine? You can configure each PMM client to use a different port range so they don't conflict with each other.

Using different port ranges would also be relatively complex in this shared environment. The number of available ports on shared machines is limited, and we regularly deploy new instances and remove old ones, which means we would have to manage the ports much like memory management!
We believe the PMM client should be able to dynamically select and retry new ports within its port range, rather than getting stuck indefinitely when the initial process startup fails.

@BupycHuk
Member

Hi @GuanqunYang193,

> Using different port ranges would also be relatively complex in this shared environment. The number of available ports on shared machines is limited, and we regularly deploy new instances and remove old ones, which means we would have to manage the ports much like memory management!

Even with this feature implemented, you would still have to manage the ports on which pmm-agent provides its API.

> We believe the PMM client should be able to dynamically select and retry new ports within its port range, rather than getting stuck indefinitely when the initial process startup fails.

Although I agree with this statement, I believe it should be implemented another way:
When the supervisor starts a process and the process fails because of a busy port, instead of trying to pass new params into the existing object, the process should completely stop and be destroyed, and then the supervisor should create a new Process object and start it. We have status changes, so we can add one more status for the complete-failure case and use it as a trigger on the supervisor side.
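
Something along these lines (just a rough sketch with invented names, not the real supervisor code):

```go
// Sketch of the suggested flow: the process reports a terminal "failed"
// status instead of mutating its own params, and the supervisor reacts by
// destroying it and building a brand-new process on a different port.
package main

import "fmt"

type status int

const (
	statusStarting status = iota
	statusRunning
	statusFailed // new terminal status for the busy-port case
)

type proc struct {
	port    int
	changes chan status
}

func newProc(port int) *proc {
	return &proc{port: port, changes: make(chan status, 1)}
}

// start simulates a bind failure and reports the terminal status.
func (p *proc) start() {
	p.changes <- statusFailed
}

// supervise watches status changes and recreates the process on failure.
func supervise(p *proc, nextFreePort func() int) *proc {
	p.start()
	if st := <-p.changes; st == statusFailed {
		// Destroy the old process and start a new one with a new port.
		replacement := newProc(nextFreePort())
		fmt.Println("recreating process on port", replacement.port)
		return replacement
	}
	return p
}

func main() {
	next := func() int { return 42001 } // stand-in for the supervisor's port registry
	p := supervise(newProc(42000), next)
	fmt.Println("now supervising process on port", p.port)
}
```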

@GuanqunYang193
Contributor Author

GuanqunYang193 commented Jan 19, 2024

> When the supervisor starts a process and the process fails because of a busy port, instead of trying to pass new params into the existing object, the process should completely stop and be destroyed, and then the supervisor should create a new Process object and start it. We have status changes, so we can add one more status for the complete-failure case and use it as a trigger on the supervisor side.

Thanks for suggesting another option!
The idea of adding a failure status is reasonable, and I will change how this feature is implemented.
