Deploy Kubernetes on Jetstream with Kubespray 2.21.0 #46

zonca · 2023-03-02T01:46:44Z

Still finishing testing

jlf599 · 2023-03-02T02:05:05Z

You'll want to change

openstack recordset list tg-xxxxxxxxx.projects.jetstream-cloud.org.

to

openstack recordset list xxxxxxxxx.projects.jetstream-cloud.org.
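The change above drops the `tg-` prefix from the zone name. A minimal sketch of that normalization, assuming the old and new names differ only by a leading `tg-`/`TG-` prefix; `zone_for_project` is a hypothetical helper, not something from the tutorial:

```shell
# Hypothetical helper: build the Designate zone name for a Jetstream2 project.
# Assumption: old-style names differ only by a leading "tg-"/"TG-" prefix.
zone_for_project() {
  # Lowercase the project name, then strip the old prefix if present.
  project=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  project=${project#tg-}
  printf '%s.projects.jetstream-cloud.org.\n' "$project"
}

# Usage (needs the OpenStack CLI with the DNS plugin installed):
#   openstack recordset list "$(zone_for_project tg-xxxxxxxxx)"
```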

zonca · 2023-03-02T02:07:51Z

@jlf599 sorry, the link up there was to the old version, I removed it, better if you check the diff in the pull request

jlf599 · 2023-03-02T02:11:40Z

I inadvertently marked ready for review -- sorry. Is it ready or not?

zonca · 2023-03-02T02:37:50Z

No, I'm still testing

zonca · 2023-03-02T22:53:25Z

ok, @jlf599, I completed testing. Everything seems to work fine.

do you have someone that could test the tutorial and provide feedback?

I have already asked @julienchastang

zonca · 2023-03-08T22:37:43Z

@robertej09 @julienchastang would you have time to test the recipe in the next couple of weeks?
Otherwise we can merge it immediately and you can provide feedback later on.

julienchastang · 2023-03-09T16:14:01Z

OK thank you for doing this. Yes, I will try to make some time to evaluate this soon. Thanks again!

zonca · 2023-03-16T04:19:46Z

ok, solved the issue with Designate by disassociating and reassociating the floating IP.
Also, I now use the existing "auto allocated network".
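A minimal sketch of the disassociate/reassociate workaround, assuming the standard `openstack server remove floating ip` and `openstack server add floating ip` commands; the tutorial's exact steps may differ, and `run` and `reassociate_floating_ip` are hypothetical names:

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# reassociate_floating_ip SERVER IP
# Detach the floating IP from the server, then attach it again.
reassociate_floating_ip() {
  run openstack server remove floating ip "$1" "$2" &&
    run openstack server add floating ip "$1" "$2"
}

# Usage:
#   reassociate_floating_ip <server-name> "$IP"
```

Setting `DRY_RUN=1` prints the two commands instead of running them, which is a cheap way to check the arguments first.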

I am debugging some networking issues for which I have an open ticket; however, I do not think that they are related to this tutorial.
So it is ready for review and merging.

julienchastang · 2023-03-16T19:43:17Z

OK will review soon. I need to clear a couple of things off of my plate and then will get to this.

jlf599 · 2023-03-16T19:47:39Z

I do think at some point, converting the instance creation methodology is going to be a good idea. Basically, if you create NO networking infrastructure and just create an instance, it should use the Openstack "Just give me a network" protocol and do all of the right things.

If you need to get the public IP, you can use one of these methods to have the instance phone home with it:

wget http://169.254.169.254/latest/meta-data/public-ipv4 -qO -
wget http://ipinfo.io/ip -qO -
curl http://169.254.169.254/latest/meta-data/public-ipv4
curl http://ipinfo.io/ip
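A minimal sketch combining the methods above: ask the instance metadata service first and fall back to the external lookup. The URLs are the ones listed above; the `fetch` and `public_ip` names are hypothetical:

```shell
# fetch URL: print the body of URL, failing on HTTP errors or after 5 seconds.
fetch() {
  curl -fsS --max-time 5 "$1"
}

# public_ip: print this instance's public IPv4 address.
# Tries the metadata service first, then the external lookup.
public_ip() {
  fetch http://169.254.169.254/latest/meta-data/public-ipv4 ||
    fetch http://ipinfo.io/ip
}

# Usage, from inside the instance:
#   IP=$(public_ip) && echo "public IP: $IP"
```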

Aaron Wells on the JS2 team has code examples for Terraform for setting up. I believe some are linked from the docs site but you can also consult with him directly via Slack if you'd like.

zonca · 2023-03-16T20:00:46Z

> I do think at some point, converting the instance creation methodology is going to be a good idea. Basically, if you create NO networking infrastructure and just create an instance, it should use the Openstack "Just give me a network" protocol and do all of the right things.

the problem is that the Kubespray Terraform recipe is quite complex; I am trying to modify it as little as possible to prevent other issues from popping up.

> If you need to get the public IP, you can use one of these methods to have the instance phone home with it:
>
> wget http://169.254.169.254/latest/meta-data/public-ipv4 -qO -
> wget http://ipinfo.io/ip -qO -
> curl http://169.254.169.254/latest/meta-data/public-ipv4
> curl http://ipinfo.io/ip
>
> Aaron Wells on the JS2 team has code examples for Terraform for setting up. I believe some are linked from the docs site but you can also consult with him directly via Slack if you'd like.

I think the current workaround is suitable, see in the tutorial where I release and add back the $IP

julienchastang · 2023-03-23T19:08:32Z

I went through the new workflow:

- Terraform prompts me about installing some experimental feature to complete the workflow (no biggie)
- The VM is now on the auto_allocated_network
- The usual workflow ran to completion (terraform / ansible / helm, etc.)
- The Hub appears to be up and running judging by kubectl get pods -A output.
- DNS is not working, therefore the Hub URL is unreachable. I did see these instructions. Still trying to figure out what is going on here.

(ping @robertej09)

julienchastang · 2023-03-23T20:34:05Z

OK, I have DNS working now. Just had to:

# Create new DNS record
openstack recordset create \
  --record <floating-ip-of-instance> \
  --type A \
  <project-ID>.projects.jetstream-cloud.org. \
  <your-desired-hostname>.<project-ID>.projects.jetstream-cloud.org.

Thanks Ana (@robertej09) for reminding me that this was in our docs all along.

julienchastang · 2023-03-23T20:39:20Z

Also, I noticed that install_jhub.sh now has a bunch of diagnostic niceties. For Letsencrypt I did not have to run deploymentPatch.sh. I am not sure if that is intentional or not.

zonca · 2023-04-03T17:12:52Z

one issue with this deployment is that Terraform creates a dedicated subnet.
Once the subnet is available, people on the same allocation that create a VM without specifying networking might get assigned to the subnet created by Terraform.
So when we run terraform destroy, the subnet cannot be deleted because those other VMs are still attached to it.

I think this is not a big issue, I have been affected because I am in the Jetstream support allocation and there is a lot of creation/deletion of VMs.

However, I would like to try once more to see if I can modify the Terraform recipes of Kubespray to not create any networking, as suggested by @jlf599, because this should also fix our other issue with Designate.

For reference I will use these Terraform recipes: https://github.com/wellsaar/terraform-js2/blob/main/ubuntu_nginx_mariadb/Ubuntu22.tf

zonca · 2023-05-03T18:45:07Z

I tried hard to make the Terraform recipe use the auto_allocated_network, but I couldn't get it to work, see zonca/jetstream_kubespray#23.

Moreover, even if I get it to work, there are just too many changes that will be difficult to re-apply to every update of Kubespray.

So I think the recipe should continue to create a k8s-internal-network. However, I will add some explanation of this in the tutorial.

I also plan to rerun the tutorial a couple of times and make sure all the steps are fine.
I'll notify again once this is ready to merge.

jlf599 · 2023-05-03T18:53:40Z

It might require a rework, but IIRC, Aaron Wells (JS2 staff) has terraform that utilizes auto_allocated_network

The key difference is when you go to set things up, if you don't create any networking at all and just create instances with the auto_allocated_network and then create a floating ip and attach it, that's all you do. No need to create router, net, subnet, or port.

jlf599 · 2023-05-03T18:54:32Z

the other issue here is that if people on that allocation are using the auto_allocated_ subnet, you might break things for them. I'd highly recommend engaging Mike and Aaron via Slack on this.

zonca · 2023-05-03T23:35:48Z

@jlf599 I think the issue is marginal, I guess not many people happen to deploy kubernetes using kubespray and also launch instances on "auto_allocated_network" in the same allocation. And as long as we make people aware of the problem in the documentation, they can go around it.

@julienchastang @robertej09 do you have a preference between these two options? (1) Switching to auto_allocated_network, so that we create fewer networking resources and do not risk breaking other people in the same allocation who use auto_allocated_network themselves, at the cost of a larger maintenance burden due to large changes to the Terraform recipe. (2) Leaving the recipe as it is now, creating a dedicated k8s-internal-network.

julienchastang · 2023-05-04T21:08:10Z

Thank you all for your hard work on this matter. While I don't have a strong preference, I do emphasize the importance of clean resource management for the numerous JupyterHub clusters we run. Ensuring that resources are properly torn down, without any dangling or orphaned resources left behind, is important to avoid tedious manual cleanup.

ana-v-espinoza · 2023-05-05T00:20:58Z

Hey all,

I'll admit that I had to read through this a few times to ensure I was understanding what I was looking at. I'm sure attempting to multi-task while doing so didn't help!

As Julien said, thanks all for your hard work. I personally don't have a strong preference either way, but I can see the merit in both options. While keeping the Terraform workflow minimally modified reduces the maintenance workload and is proven to work, it creates/uses more network resources, which not every allocation may have access to. There's also this:

> one issue with this deployment is that Terraform creates a dedicated subnet.
> Once the subnet is available, people on the same allocation that create a VM without specifying networking might get assigned to the subnet created by Terraform.

and this:

> the other issue here is that if people on that allocation are using the auto_allocated_ subnet, you might break things for them.

@zonca , does this happen exclusively when creating VMs through Terraform, or also via the openstack cli or the web portals? Depending on whether or not other Jetstream2 users even use Terraform for their infrastructure creation (outside of this Kubespray workflow), as Andrea says, this may not be a large problem, especially if it's a well documented emergent "feature."

Just my 2 cents,

Ana

jlf599 · 2023-05-15T19:31:55Z

> @zonca , does this happen exclusively when creating VMs through Terraform, or also via the openstack cli or the web portals? Depending on whether or not other Jetstream2 users even use Terraform for their infrastructure creation (outside of this Kubespray workflow), as Andrea says, this may not be a large problem, especially if it's a well documented emergent "feature."

This can affect users that are working in ANY of the JS2 interfaces. It's hard to spot if you're using Exosphere exclusively, though. It's easier to track down via Horizon or the CLI.

I think ultimately we should move to the OpenStack "Just give me a network" style, which basically means you don't create or specify any network bits and OpenStack puts you on the auto_allocated_network and auto_allocated_subnet.

Basically, there's no security gain in isolated subnets or networks. It's not like physical networking where you are physically attaching to devices and being isolated that way. All of this is really handled via iptables and routing rules. So if the desire for doing this is security, it's really not gaining anything. If the desire for this is just to avoid making larger changes, I understand that, though in the long run refactoring makes the whole process simpler for users to troubleshoot.

I would say in the long run, we should work together to see if we can make it work the new way...though that day may not be today (or tomorrow).

julienchastang · 2023-05-21T18:30:31Z

Another complication is the recordset entry needs to be manually deleted upon cluster tear down. Otherwise next time you try to attach a new recordset to the auto_allocated_network, you'll be stymied. It would be nice if the recordset / publicly accessible URL entries were automatically handled as before (i.e., it just worked, for the most part).
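A minimal sketch of that manual cleanup step, assuming the record was created with the naming used in the `openstack recordset create` example earlier in this thread and that `openstack recordset delete` takes the zone followed by the recordset; `run` and `delete_hub_record` are hypothetical names:

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# delete_hub_record HOSTNAME PROJECT_ID
# Remove the A record that points at the cluster's floating IP.
delete_hub_record() {
  zone="$2.projects.jetstream-cloud.org."
  run openstack recordset delete "$zone" "$1.$zone"
}

# Usage, as part of cluster tear down:
#   delete_hub_record <your-desired-hostname> <project-ID>
```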

jlf599 · 2023-05-23T17:24:47Z

I'm not sure one needs to add a recordset to auto_allocated_network at all. That should just work. We created those zones when we made the transition from TG-xxxxxxx style allocation names to xxxxxxxx style. If something is breaking there, it might have to do with how the networking is all being created (i.e. it's doing it the hard way).

zonca · 2023-05-23T22:13:44Z

given the feedback I'll try again to make kubespray work with the auto allocated network. It is going to be a big upfront amount of work, hopefully successful, but it should simplify the infrastructure a lot.

jlf599 · 2023-05-23T22:29:00Z

Basically, if you remove all networking setup and just tell the VM launch to use the auto_allocated_network, it will do everything for you. It should hopefully be fairly simple...and definitely would make things simpler for the future. Don't create any ports or nets, subnets, or routers -- just specify on VM creation (it's --nic net-id=auto_allocated_network if you were using the CLI, probably similar for terraform yaml). You'll create and add the floating ip as needed -- on create, you should be able to get the value so you know what it is. I haven't looked at the yaml, but if you think it would help, I would be happy to.
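A minimal CLI sketch of the approach described above, with no router, network, subnet, or port created. The `--nic net-id=auto_allocated_network` option is quoted from the comment; the external network name `public`, the `EXTERNAL_NETWORK` variable, and all function names are assumptions for illustration:

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# launch_on_auto_network NAME IMAGE FLAVOR KEYPAIR
# Create an instance on the auto allocated network.
launch_on_auto_network() {
  run openstack server create \
    --image "$2" --flavor "$3" --key-name "$4" \
    --nic net-id=auto_allocated_network \
    --wait "$1"
}

# allocate_floating_ip: allocate a floating IP and print only its address.
# Assumption: the external network is named "public" unless overridden.
allocate_floating_ip() {
  run openstack floating ip create "${EXTERNAL_NETWORK:-public}" \
    -f value -c floating_ip_address
}

# attach_floating_ip NAME IP: attach an allocated floating IP to the instance.
attach_floating_ip() {
  run openstack server add floating ip "$1" "$2"
}

# Usage:
#   launch_on_auto_network k8s-node-1 <image> <flavor> <keypair>
#   IP=$(allocate_floating_ip)
#   attach_floating_ip k8s-node-1 "$IP"
```

Capturing the address at allocation time is what lets later steps, such as the DNS record, know the IP without querying the instance.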

zonca · 2023-05-23T22:30:47Z

I tried already, the instances have a network, but they cannot connect to each other.

zonca · 2023-05-23T22:31:00Z

anyway I'll try again then ask for help

zonca · 2023-05-30T05:31:43Z

I have some good progress going on! I'll update this soon.

zonca · 2023-07-09T14:49:27Z

thanks @jlf599 @zacharygraber, the tutorial is now working,
there are now a lot more changes to the Terraform recipe, see zonca/jetstream_kubespray#21, however I am mostly removing resources.

I think the PR can be merged.

Then @julienchastang @robertej09 should do more extensive testing. I deployed a simple JupyterHub and everything seemed to be working, but there might still be something more subtle which is broken.

julienchastang · 2023-07-17T15:37:46Z

OK, sounds good. I've been on vacation, but will try to make time for this soon.

zonca · 2023-07-24T14:00:55Z

@jlf599 @zacharygraber I think this PR can be merged

julienchastang · 2023-07-31T23:05:59Z

@robertej09 and I have been working on this over the last few days and I believe things look good. Just have to remember to replace router_id as described in cluster.tfvars. Easy enough, but I wasn't sure if this was mentioned in the blog/docs. Anyway, it looks good from here. Thanks for doing this!

zonca added 2 commits March 1, 2023 17:42

Deploy Kubernetes with Kubespray 2.21.0

45fb8fa

remove link to old tutorial, link to new

e2cd5bc

zonca mentioned this pull request Mar 2, 2023

Branch v2.21.0 existing network zonca/jetstream_kubespray#22

Merged

jlf599 marked this pull request as ready for review March 2, 2023 02:10

jlf599 self-assigned this Mar 2, 2023

zonca marked this pull request as draft March 2, 2023 03:02

zonca marked this pull request as ready for review March 2, 2023 22:53

zonca added 2 commits June 15, 2023 19:58

2.21.0 is probably not the latest version anymore

3ee16f9

updated instructions on modifying cluster.tfvars

191586d

zonca mentioned this pull request Jun 16, 2023

Branch v2.21.0 auto alloc zonca zonca/jetstream_kubespray#24

Closed

zonca added 3 commits July 9, 2023 07:40

remove mention of magnum

d2a28cc

explain new Terraform recipe

d6a9305

Merge branch 'main' into kubespray_2.21.0

ccc85c5

jlf599 merged commit 8bb95c0 into jetstream-cloud:main Jul 26, 2023
