Deploy Kubernetes on Jetstream with Kubespray 2.21.0 #46
You'll want to change `openstack recordset list tg-xxxxxxxxx.projects.jetstream-cloud.org.` to `openstack recordset list xxxxxxxxx.projects.jetstream-cloud.org.`
@jlf599 sorry, the link up there was to the old version. I removed it; better if you check the diff in the pull request.
I inadvertently marked this ready for review -- sorry. Is it ready or not?
No, I'm still testing
OK @jlf599, I completed testing. Everything seems to work fine. Do you have someone who could test the tutorial and provide feedback? I have already asked @julienchastang.
@robertej09 @julienchastang would you have time to test the recipe in the next couple of weeks?
OK, thank you for doing this. Yes, I will try to make some time to evaluate this soon. Thanks again!
OK, solved the issue with Designate by disassociating and reassociating the floating IP. I am debugging some networking issues for which I have an open ticket; however, I do not think that they are related to this tutorial.
OK, will review soon. I need to clear a couple of things off of my plate and then will get to this.
I do think at some point, converting the instance creation methodology is going to be a good idea. Basically, if you create NO networking infrastructure and just create an instance, it should use the OpenStack "Just give me a network" protocol and do all of the right things. If you need to get the public IP, you can have the instance phone home with it: `wget http://169.254.169.254/latest/meta-data/public-ipv4 -qO -`. Aaron Wells on the JS2 team has Terraform code examples for setting this up. I believe some are linked from the docs site, but you can also consult with him directly via Slack if you'd like.
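The approach Jeremy describes might look roughly like this from the CLI (a sketch only; the flavor, image, key pair, and server names are placeholders, not values from the tutorial):

```shell
# Sketch of the "Just give me a network" approach: create NO networks,
# subnets, routers, or ports up front. All names below are hypothetical.
openstack server create \
  --flavor m3.small \
  --image "Featured-Ubuntu22" \
  --key-name my-keypair \
  --nic net-id=auto_allocated_network \
  my-k8s-node

# From inside the instance, retrieve its public IP via the metadata service:
wget http://169.254.169.254/latest/meta-data/public-ipv4 -qO -
```

Since no networking resources are created explicitly, teardown is also just a matter of deleting the instance (and releasing its floating IP, if one was attached).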
the problem is that the Kubespray Terraform recipe is quite complex; I am trying to modify it as little as possible to prevent other issues from popping up.
I think the current workaround is suitable; see in the tutorial where I release and add back the floating IP.
I went through the new workflow:
(ping @robertej09)
OK, I have DNS working now. Just had to:
Thanks Ana (@robertej09) for reminding me that this was in our docs all along.
Also, I noticed that
one issue with this deployment is that Terraform creates a dedicated subnet. I think this is not a big issue; I have been affected because I am in the Jetstream support allocation and there is a lot of creation/deletion of VMs. However, I would like to try once more to see if I can modify the Terraform recipes of Kubespray to not create any networking, as suggested by @jlf599, because this should also fix our other issue with Designate. For reference I will use these Terraform recipes: https://github.com/wellsaar/terraform-js2/blob/main/ubuntu_nginx_mariadb/Ubuntu22.tf
I tried hard to make the Terraform recipe use the auto_allocated_network, but I couldn't get it to work, see zonca/jetstream_kubespray#23. Moreover, even if I get it to work, there are just too many changes that will be difficult to re-apply to every update of Kubespray. So I think the recipe should continue to create a dedicated network. I also plan to rerun the tutorial a couple of times and make sure all the steps are fine.
It might require a rework, but IIRC, Aaron Wells (JS2 staff) has Terraform that utilizes auto_allocated_network. The key difference is when you go to set things up: if you don't create any networking at all and just create instances with the auto_allocated_network and then create a floating IP and attach it, that's all you do. No need to create a router, net, subnet, or port.
the other issue here is that if people on that allocation are using the auto_allocated_subnet, you might break things for them. I'd highly recommend engaging Mike and Aaron via Slack on this.
@jlf599 I think the issue is marginal; I guess not many people happen to deploy Kubernetes using Kubespray and also launch instances on the auto_allocated_network in the same allocation. And as long as we make people aware of the problem in the documentation, they can work around it. @julienchastang @robertej09 do you have a preference between switching to use the auto_allocated_network or keeping the dedicated subnet?
Thank you all for your hard work on this matter. While I don't have a strong preference, I do emphasize the importance of clean resource management for the numerous JupyterHub clusters we run. Ensuring that resources are properly torn down without any dangling or orphaned resources is just as important, to avoid tedious manual cleanup.
Hey all, I'll admit that I had to read through this a few times to ensure I was understanding what I was looking at. I'm sure attempting to multi-task while doing so didn't help! As Julien said, thanks all for your hard work. I personally don't have a strong preference either way, but I can see the merit in both options. While keeping the Terraform workflow minimally modified reduces the maintenance workload and is proven to work, it creates/uses more network resources, which not every allocation may have access to. There's also this:
and this:
@zonca, does this happen exclusively when creating VMs through Terraform, or also via the openstack CLI or the web portals? Depending on whether or not other Jetstream2 users even use Terraform for their infrastructure creation (outside of this Kubespray workflow), as Andrea says, this may not be a large problem, especially if it's a well-documented emergent "feature." Just my 2 cents, Ana
This can affect users that are working in ANY of the JS2 interfaces. It's hard to spot if you're using Exosphere exclusively, though. It's easier to track down via Horizon or the CLI.

I think ultimately, moving to the OpenStack "Just give me a network" style is the way to go -- it basically says you don't create or specify any network bits, and OpenStack puts you on the auto_allocated_network and auto_allocated_subnet.

Basically, there's no security gain in isolated subnets or networks. It's not like physical networking, where you are physically attaching to devices and being isolated that way. All of this is really handled via iptables and routing rules. So if the desire for doing this is security, it's really not gaining anything. If the desire for this is just to avoid making larger changes, I understand that, though in the long run refactoring makes the whole process simpler for users to troubleshoot.

I would say in the long run, we should work together to see if we can make it work the new way...though that day may not be today (or tomorrow).
Another complication is the recordset entry needs to be manually deleted upon cluster tear down. Otherwise, next time you try to attach a new recordset to the auto_allocated_network, you'll be stymied. It would be nice if the recordset / publicly accessible URL entries were automatically handled as before (i.e., it just worked, for the most part).
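For reference, the manual cleanup described above can be done with the Designate CLI along these lines (a sketch; the record name `myhub.…` is a hypothetical placeholder, and the zone follows the `xxxxxxxxx.projects.jetstream-cloud.org.` pattern quoted earlier in the thread):

```shell
# List the recordsets in the allocation's DNS zone to find the stale entry:
openstack recordset list xxxxxxxxx.projects.jetstream-cloud.org.

# Delete the stale recordset during teardown, so a new cluster can
# re-register the same name later. The record name here is hypothetical:
openstack recordset delete xxxxxxxxx.projects.jetstream-cloud.org. \
  myhub.xxxxxxxxx.projects.jetstream-cloud.org.
```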
I'm not sure one needs to add a recordset to auto_allocated_network at all. That should just work. We created those zones when we made the transition from TG-xxxxxxx style allocation names to xxxxxxxx style.
If something is breaking there, it might have to do with how the networking is all being created (i.e., it's doing it the hard way).
given the feedback I'll try again to make Kubespray work with the auto allocated network. It is going to be a big amount of upfront work, hopefully successful, but it should simplify the infrastructure a lot.
Basically, if you remove all networking setup and just tell the VM launch to use the auto_allocated_network, it will do everything for you. It should hopefully be fairly simple...and definitely would make things simpler for the future.
Don't create any ports, nets, subnets, or routers -- just specify it on VM creation (it's `--nic net-id=auto_allocated_network` if you were using the CLI, probably similar for the Terraform yaml).
You'll create and add the floating IP as needed -- on create, you should be able to get the value so you know what it is.
I haven't looked at the yaml, but if you think it would help, I would be happy to.
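The floating-IP step Jeremy mentions might look like this on the CLI (a sketch; `public` is assumed to be the external network the floating IP is allocated from, and `my-k8s-node` is a hypothetical server name):

```shell
# Allocate a floating IP and capture its address, so you know the public
# address at create time. "public" is an assumed external network name.
FIP=$(openstack floating ip create public -f value -c floating_ip_address)

# Attach it to the instance created on auto_allocated_network.
# The server name is a placeholder:
openstack server add floating ip my-k8s-node "$FIP"

echo "$FIP"
```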
I tried already; the instances have a network, but they cannot connect to each other.
anyway, I'll try again, then ask for help
I have some good progress going on! I'll update this soon.
thanks @jlf599 @zacharygraber, the tutorial is now working; I think the PR can be merged. Then @julienchastang @robertej09 should do more extensive testing. I deployed a simple JupyterHub and everything seemed to be working, but there might still be something more subtle that is broken.
OK, sounds good. I've been on vacation, but will try to make time for this soon.
@jlf599 @zacharygraber I think this PR can be merged
@robertej09 and I have been working on this over the last few days and I believe things look good. Just have to remember to replace
Still finishing testing