add timeout for linodego http client #219

Merged
merged 2 commits into from
Jun 20, 2024


rahulait
Contributor

@rahulait rahulait commented Jun 18, 2024

Why this PR is needed

We see cloud-controller-manager getting stuck for 10-15 minutes during vpcless installs with firewalling enabled via cluster-api-provider-linode. Troubleshooting showed some API calls hanging for more than 10 minutes; the request below took over 15 minutes to return:

2024/06/17 13:23:11.262239 DEBUG RESTY
==============================================================================
~~~ REQUEST ~~~
GET  /v4/linode/instances  HTTP/1.1
HOST   : api.linode.com
HEADERS:
	Accept: application/json
	Authorization: Bearer XXXX
	Content-Type: application/json
	User-Agent: linode-cloud-controller-manager linodego/v1.33.0 https://github.com/linode/linodego
BODY   :
***** NO CONTENT *****
------------------------------------------------------------------------------
~~~ RESPONSE ~~~
STATUS       : 200 OK
PROTO        : HTTP/1.1
RECEIVED AT  : 2024-06-17T13:23:11.262101656Z
TIME DURATION: 15m47.632783808s
HEADERS      :
	Access-Control-Allow-Credentials: true

This PR adds a timeout to the HTTP requests made by the linodego client, so that each request completes or fails within the specified time (default 2 minutes, configurable via the LINODE_REQUEST_TIMEOUT environment variable).

Steps to reproduce the issue using CAPL:

  1. Provision a new CAPL cluster:
CONTROL_PLANE_MACHINE_COUNT=3 WORKER_MACHINE_COUNT=3 FW_AUDIT_ONLY=false clusterctl generate cluster vpcless --infrastructure local-linode:v0.0.0 --flavor kubeadm-vpcless | k apply -f -
  2. Check how long the nodes take to come up and join the cluster. The third control plane node takes more than 12-15 minutes to join after the second control plane node has joined the cluster.

How to test the fix

Provision a CAPL cluster with the vpcless flavor, with the linode-ccm image set to an image built from this patch. I have a custom image with this patch at docker.io/rahulait/ccm:timeout. Once the cluster is provisioned, verify that the nodes have joined without large gaps between their provisioning times.

General:

  • Have you removed all sensitive information, including but not limited to access keys and passwords?
  • Have you checked to ensure there aren't other open or closed Pull Requests for the same bug/feature/question?

Pull Request Guidelines:

  1. Does your submission pass tests?
  2. Have you added tests?
  3. Are you addressing a single feature in this PR?
  4. Are your commits atomic, addressing one change per commit?
  5. Are you following the conventions of the language?
  6. Have you saved your large formatting changes for a different PR, so we can focus on your work?
  7. Have you explained your rationale for why this feature is needed?
  8. Have you linked your PR to an open issue?

Member

@AshleyDumaine AshleyDumaine left a comment

LGTM

@AshleyDumaine AshleyDumaine added the bugfix for any bug fixes in the changelog. label Jun 18, 2024
Review comment on cloud/linode/client/client.go (outdated, resolved)
Review comment on cloud/linode/cloud.go (outdated, resolved)
@rahulait rahulait merged commit 6a4e482 into main Jun 20, 2024
3 checks passed
3 participants