Make use of IaaS functionality on IBM Cloud (s390x) #483

Closed
BbolroC opened this issue Jun 14, 2022 · 3 comments
Labels
enhancement Improvement to an existing feature needs-review Needs to be assessed by the team.

Comments

@BbolroC
Member

BbolroC commented Jun 14, 2022

Which feature do you think can be improved?

At the moment, a node called s390x_node_base_ubuntu2004 runs 24/7 and serves the s390x tests on the Marist community cloud. However, because the cloud is multi-tenant, the node sometimes becomes unstable (failures in the CPU check for the k8s test) and slow (running times longer than 2 hours). Since late May I have been testing a machine of an equivalent t-shirt size on IBM Cloud, and it has been quite good in terms of both stability and performance. It is also supported by the Ansible Galaxy collection for IBM Cloud (https://github.com/IBM-Cloud/ansible-collection-ibm), and I have confirmed that instances can be created and destroyed with it (https://github.com/IBM-Cloud/ansible-collection-ibm/tree/master/examples/simple-vm-ssh).
I would like to add and remove a Jenkins slave node for s390x on demand for each PR, as is already done for the x86 slaves labelled ubuntu_20.04. This would resolve #470.

How can it be improved?

Thanks to the IaaS functionality, we could scale out Jenkins slaves and handle tests in parallel as requests grow. A clean-slate environment on a new instance would also rule out test failures caused by resources that were not torn down properly. Roughly, the collection can be installed with:

$ ansible-galaxy collection install ibm.cloudcollection

The instance can then be created and destroyed with:

$ ansible-playbook -e vsi_name=$VSI_NAME -e subnet_name=$SUBNET_NAME -e zone=$ZONE_NAME -e fip_name=$FLOATING_IP_NAME create.yml
$ ansible-playbook -e vsi_name=$VSI_NAME -e subnet_name=$SUBNET_NAME -e zone=$ZONE_NAME destroy.yml

The arguments above vary with the Jenkins node index (e.g. 1, 2, 3, ...). Other variables are configured in a file or through environment variables.
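As a rough illustration of deriving the per-node arguments from the index, here is a POSIX sh sketch. The naming scheme (`s390x-ci-vsi-N`, `s390x-ci-fip-N`), the helper name `playbook_cmd` and the fallback `SUBNET_NAME`/`ZONE_NAME` values are hypothetical placeholders; the function only prints the `ansible-playbook` command instead of running it.

```shell
#!/bin/sh
# Sketch: build per-node playbook commands from a Jenkins node index.
# The vsi/fip naming scheme and the fallback subnet/zone values are
# hypothetical placeholders; real values would come from a config file
# or environment variables.
set -eu

SUBNET_NAME="${SUBNET_NAME:-my-subnet}"
ZONE_NAME="${ZONE_NAME:-my-zone}"

# Print (not run) the playbook command for "create" or "destroy" and an index.
playbook_cmd() {
  action=$1
  idx=$2
  cmd="ansible-playbook -e vsi_name=s390x-ci-vsi-${idx}"
  cmd="$cmd -e subnet_name=${SUBNET_NAME} -e zone=${ZONE_NAME}"
  # Only the create playbook takes a floating IP name.
  if [ "$action" = create ]; then
    cmd="$cmd -e fip_name=s390x-ci-fip-${idx}"
  fi
  printf '%s %s.yml\n' "$cmd" "$action"
}

# Example: the commands for node 2.
playbook_cmd create 2
playbook_cmd destroy 2
```

Printing the command rather than executing it keeps the sketch safe to try and makes the generated arguments easy to check in a job log; a real job would run the command instead.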

Additional Information

I do not know exactly how the CI works, but I think this is viable if some of the maintainers help me out, along the lines of https://github.com/kata-containers/ci/blob/main/deployment/packet/README.md.

Before raising this enhancement request

Have you looked at the limitations document?

BbolroC added the enhancement and needs-review labels Jun 14, 2022
BbolroC changed the title Make use of IAAS functionality on IBM Cloud (s390x) to Make use of IaaS functionality on IBM Cloud (s390x) Jun 14, 2022
@GabyCT
Contributor

GabyCT commented Jun 15, 2022

LGTM. So you will first add the s390x Jenkins slave and then try the same with x86?

@BbolroC
Member Author

BbolroC commented Jun 29, 2022

Hi Gaby, I have just gotten back from vacation. Without knowing how the x86 setup works, the workflow I have in mind is basically as follows:

  1. Configure a pool of slaves in advance and let them wait for a connection to an instance (e.g. 4 slaves, each with a predetermined IP address).
  2. Poll the build queue for jobs whose names include the term s390x.
  3. When such a job appears, look for an available slave in the pool.
  4. If a slave is free, create an instance and connect it to that pre-registered slave.
  5. If no slave is available, let the job wait until one of the occupied slaves is released.
  6. Once the job finishes, destroy the instance that handled it; the connected slave goes back to a disconnected state.
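The pool-based workflow above can be sketched as a small POSIX sh state machine. This is only an illustration under assumptions: the slave names `s390x-1` to `s390x-4`, the helper names (`schedule`, `acquire`, `finish`) and the log messages are hypothetical, and `echo` stands in for polling the Jenkins build queue and for the create/destroy playbooks.

```shell
#!/bin/sh
# Sketch of the pool-based workflow (steps 1-6). The slave names, helper
# names and log lines are hypothetical; echo stands in for the Jenkins queue
# polling and the create/destroy playbooks.
set -eu

POOL="s390x-1 s390x-2 s390x-3 s390x-4"  # step 1: pre-registered slaves
BUSY=""     # slaves currently connected to an instance
WAITING=""  # jobs waiting for a free slave (step 5)

# Step 2: only jobs whose name contains "s390x" are handled here.
wants_s390x() {
  case "$1" in *s390x*) return 0 ;; *) return 1 ;; esac
}

is_busy() {
  case " $BUSY " in *" $1 "*) return 0 ;; *) return 1 ;; esac
}

# Step 3: store the first free slave in ACQUIRED; fail if none is free.
acquire() {
  for a in $POOL; do
    if ! is_busy "$a"; then
      BUSY="$BUSY $a"
      ACQUIRED=$a
      return 0
    fi
  done
  return 1
}

# Steps 4-5: connect the job to a free slave, or queue it.
schedule() {
  wants_s390x "$1" || return 0
  if acquire; then
    echo "create instance for $1 and connect it to $ACQUIRED"
  else
    WAITING="$WAITING $1"
    echo "$1 waits for a free slave"
  fi
}

# Step 6: destroy the instance, free the slave, then serve the oldest waiting job.
finish() {
  slave=$1
  echo "destroy instance behind $slave"
  rest=""
  for a in $BUSY; do
    [ "$a" = "$slave" ] || rest="$rest $a"
  done
  BUSY=$rest
  set -- $WAITING
  if [ $# -gt 0 ]; then
    next=$1
    shift
    WAITING="$*"
    schedule "$next"
  fi
}
```

With four slaves, a fifth s390x job is queued by `schedule` and only gets a slave once `finish` releases one, which is the waiting behaviour described in step 5.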

Yes, this introduces extra logic to the CI. If implementing that logic turns out not to be trivial, we could make the procedure dynamic, with randomly named slaves, like this:

  1. No slaves are pre-configured.
  2. Poll the build queue for jobs whose names include the term s390x.
  3. When such a job appears, create an instance and configure a slave with a random name based on information from the instance creation and the job.
  4. Once the job finishes, remove the instance and delete the associated slave as well.
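The dynamic variant can be sketched the same way. Note one swap: instead of a truly random name, this sketch derives a unique slave name from the job name, the build number and the instance ID, which is an assumption rather than part of the proposal. The helper names, the naming scheme and `instance_id` (a stand-in for the ID reported by the create playbook) are hypothetical, and `echo` replaces the real Jenkins and IBM Cloud calls.

```shell
#!/bin/sh
# Sketch of the dynamic variant: no pre-registered slaves, one throw-away
# slave per s390x job. The naming scheme is hypothetical, instance_id stands
# in for the ID reported by the create playbook, and echo replaces the real
# Jenkins and IBM Cloud calls.
set -eu

# Step 3: derive a unique slave name from the job and the created instance,
# replacing every character other than letters, digits and "-" with "-".
slave_name() {
  printf 's390x-%s-%s-%s\n' "$1" "$2" "$3" | sed 's/[^A-Za-z0-9-]/-/g'
}

# Steps 2-4: handle one s390x job from instance creation to cleanup.
run_job() {
  job=$1 build=$2 instance_id=$3
  case "$job" in *s390x*) ;; *) return 0 ;; esac
  slave=$(slave_name "$job" "$build" "$instance_id")
  echo "created instance $instance_id"
  echo "configured slave $slave and ran $job #$build"
  echo "removed instance $instance_id and deleted slave $slave"
}
```

Because both the instance and the slave are discarded after every job, no pool bookkeeping is needed; the trade-off is one instance and floating IP lifecycle per job.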

This looks simpler and cleaner, but one concern is that every creation and release of an IP (a floating IP on IBM Cloud) costs a fixed amount of money no matter how long the IP is used. For example, if one create/release cycle costs $1, then 10 triggered jobs cost $10. This would offset one of the (financial) goals I would like to achieve with this change.

I look forward to your feedback. Thanks. 😀

@BbolroC
Member Author

BbolroC commented Jul 15, 2022

Done with the following setup:

BbolroC closed this as completed Jul 15, 2022