Add support to Ubuntu 20.04 #813
I agree and just checked current usage:
Servers to upgrade:
That's a bit of work because we don't have a zero-downtime transition playbook at the moment. I doubt that we'll get to it in the next few months, but I'll put it on the list to plan for. I guess 2026 is a hard deadline.
Thank you! This is great! Your long commit messages are excellent and help me a lot to understand what you tried and why you came to these solutions. 🏅
Just one idea below of what to try.
playbooks/provision.yml
```yaml
- name: Fix Ruby # noqa 301
  command: bash -lc "rbenv uninstall -f {{ ruby_version }} && rbenv install {{ ruby_version }}"
  become: yes
  become_user: "{{ app_user }}"
  when: ansible_distribution_major_version >= '20'
```
Hm, this would happen each time we run the provisioning and it takes a while. Did you try to increase the memory of the virtual machine?
Or maybe we can skip the task if assets have compiled successfully? That would be a weird hack here, too...
Thanks for the suggestion!
I did try increasing memory and CPU, just in case. Unfortunately the error persisted. My best guess is still that the cause is something within the Galaxy module that installs Ruby.
I think checking whether asset precompilation works before running this would be hard and confusing, especially because the precompile runs in a different playbook (deploy.yml).
I understand that this should run every time a new Ruby gets installed. What do you think about creating a hidden file containing {{ ruby_version }} and only running the fix task if the task that writes that file reported a change? This would prevent the fix from running when it's not necessary.
> What do you think about setting a hidden file

Yes, it's also a hack, but we are hacking anyway. 😉 It's a good idea. If you can easily put it in the Ruby installation path, then it gets removed automatically when that Ruby version is removed. It's like marking an installation as "patched". But if that's too difficult, a simple file in the home directory would do, too.
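A minimal sketch of the marker-file idea, assuming the fix stays a `command` task: Ansible's `command` module skips the task when the path given in its `creates` argument already exists. The marker name `.ofn-reinstalled` and the `rbenv_root` variable (taken here to be the rbenv directory that contains `versions/`) are assumptions for illustration, not necessarily what the final commit does. Because the marker sits inside that Ruby's own directory, `rbenv uninstall` deletes it together with the version, which matches the "patched installation" idea.

```yaml
# Hypothetical sketch: reinstall each Ruby version once on Ubuntu 20+,
# then leave a marker file inside that version's directory.
# ASSUMPTION: rbenv_root is an absolute path to the rbenv root (holds versions/).
- name: Fix Ruby by reinstalling it once per version
  command: >-
    bash -lc "rbenv uninstall -f {{ ruby_version }}
    && rbenv install {{ ruby_version }}
    && touch {{ rbenv_root }}/versions/{{ ruby_version }}/.ofn-reinstalled"
  args:
    # Skip the whole task if this Ruby version already carries the marker.
    creates: "{{ rbenv_root }}/versions/{{ ruby_version }}/.ofn-reinstalled"
  become: yes
  become_user: "{{ app_user }}"
  when: ansible_distribution_major_version >= '20'
```

The marker is only written after a successful reinstall, so a failed run is retried on the next provisioning.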
Done in commit 415c8ff
Starting with version 20, Python 2 packages, especially python-psycopg2 (required by geerlingguy.postgresql), are no longer part of the distribution, and we have to use their Python 3 versions instead. My first attempt was to set `ansible_python_interpreter` to something like:

```
ansible_python_interpreter: "/usr/bin/python{% if ansible_distribution_major_version < '20' %}2.7{% else %}3{% endif %}"
```

But it didn't work because, at the time `ansible_python_interpreter` needs to be evaluated, `ansible_distribution_major_version` is still undefined. This is why I'm going with this more creative solution. In the future, when the project maintainers decide to end support for Ubuntu 16, this workaround should no longer be necessary and we can simply use Python 3 on Ubuntu 18 and 20.
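Since `ansible_python_interpreter` is resolved before facts like `ansible_distribution_major_version` exist, it has to be a plain value rather than one computed from facts. As a sketch of the simpler setup described for when Ubuntu 16 support ends, a single static value could live in the inventory; the `group_vars/all.yml` location is an assumption about the repository layout.

```yaml
# Hypothetical group_vars/all.yml, once only Ubuntu 18.04 and 20.04 remain:
# both releases ship /usr/bin/python3, so no fact is needed to choose it.
ansible_python_interpreter: /usr/bin/python3
```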
This package no longer exists and was causing the following error:

```
TASK [common : install base packages] *********************************************************************************************************************************************
fatal: [local_vagrant]: FAILED! => {"changed": false, "msg": "No package matching 'python-psycopg2' is available"}
Tuesday 14 June 2022 14:31:23 -0300 (0:00:00.361) 0:00:12.428 **********
```
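One way to express the package choice inside a task, sketched here as an assumption rather than the PR's actual change: pick the psycopg2 package name from the distribution version. It uses Ansible's `version` test instead of a plain string comparison, because strings compare character by character (`'9' >= '20'` is true). The task name echoes `common : install base packages` from the error, but its contents are hypothetical.

```yaml
# Hypothetical task: install the psycopg2 binding that the distro provides.
# Ubuntu 20.04 only ships python3-psycopg2; 16.04 and 18.04 still have python-psycopg2.
- name: install psycopg2 bindings
  apt:
    name: "{{ 'python3-psycopg2' if ansible_distribution_major_version is version('20', '>=') else 'python-psycopg2' }}"
    state: present
  become: yes
```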
This version adds support for Ubuntu 20 while still supporting 16 and 18. It fixes the following error that occurred when running on Ubuntu 20:

```
TASK [geerlingguy.postgresql : Include OS-specific variables (Debian).] ***********************************************************************************************************
fatal: [local_vagrant]: FAILED! => {"ansible_facts": {}, "ansible_included_var_files": [], "changed": false, "message": "Could not find or access 'Ubuntu-20.yml'\nSearched in:\n\t/home/manzo/workspace/CamarinhaManzo/ofn-install/community/geerlingguy.postgresql/vars/Ubuntu-20.yml\n\t/home/manzo/workspace/CamarinhaManzo/ofn-install/community/geerlingguy.postgresql/Ubuntu-20.yml\n\t/home/manzo/workspace/CamarinhaManzo/ofn-install/roles/dbserver/vars/Ubuntu-20.yml\n\t/home/manzo/workspace/CamarinhaManzo/ofn-install/roles/dbserver/Ubuntu-20.yml\n\t/home/manzo/workspace/CamarinhaManzo/ofn-install/community/geerlingguy.postgresql/tasks/vars/Ubuntu-20.yml\n\t/home/manzo/workspace/CamarinhaManzo/ofn-install/community/geerlingguy.postgresql/tasks/Ubuntu-20.yml\n\t/home/manzo/workspace/CamarinhaManzo/ofn-install/playbooks/vars/Ubuntu-20.yml\n\t/home/manzo/workspace/CamarinhaManzo/ofn-install/playbooks/Ubuntu-20.yml on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
Tuesday 14 June 2022 14:51:16 -0300 (0:00:00.015) 0:08:08.065 **********
```
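The fix here is a newer release of the role rather than a local vars file. If Galaxy roles are pinned in an `ansible-galaxy` requirements file (installed into `community/`, judging by the paths in the log), the upgrade is a one-line version bump. The file name and the placeholder below are hypothetical, since the exact release used is not stated here.

```yaml
# Hypothetical requirements.yml entry; replace the placeholder with the
# geerlingguy.postgresql release that ships vars/Ubuntu-20.yml.
- src: geerlingguy.postgresql
  version: "REPLACE_WITH_NEWER_RELEASE"
```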
On Ubuntu 20, asset precompilation failed with this error:

```
TASK [deploy : precompile assets] *************************************************************************************************************************************************
fatal: [local_vagrant]: FAILED! => {"changed": true, "cmd": ["bash", "-lc", "bundle exec rake assets:precompile RAILS_ENV=development"], "delta": "0:00:05.815405", "end": "2022-06-15 11:15:39.701298", "msg": "non-zero return code", "rc": -6, "start": "2022-06-15 11:15:33.885893", "stderr": "I, [2022-06-15T11:15:39.534899 #62413] INFO -- : Writing /home/openfoodnetwork/apps/openfoodnetwork/releases-old/2022-06-15-111445/public/assets/iehack-725caee6a64e094fd6f6faeb3b7d8456ba36bfca88b2b2d46fd02661da15f6d7.js\nI, [2022-06-15T11:15:39.535309 #62413] INFO -- : Writing /home/openfoodnetwork/apps/openfoodnetwork/releases-old/2022-06-15-111445/public/assets/iehack-725caee6a64e094fd6f6faeb3b7d8456ba36bfca88b2b2d46fd02661da15f6d7.js.gz\nfree(): invalid pointer", "stderr_lines": ["I, [2022-06-15T11:15:39.534899 #62413] INFO -- : Writing /home/openfoodnetwork/apps/openfoodnetwork/releases-old/2022-06-15-111445/public/assets/iehack-725caee6a64e094fd6f6faeb3b7d8456ba36bfca88b2b2d46fd02661da15f6d7.js", "I, [2022-06-15T11:15:39.535309 #62413] INFO -- : Writing /home/openfoodnetwork/apps/openfoodnetwork/releases-old/2022-06-15-111445/public/assets/iehack-725caee6a64e094fd6f6faeb3b7d8456ba36bfca88b2b2d46fd02661da15f6d7.js.gz", "free(): invalid pointer"], "stdout": "yarn install v1.22.4\n[1/4] Resolving packages...\nsuccess Already up-to-date.\nDone in 0.66s.", "stdout_lines": ["yarn install v1.22.4", "[1/4] Resolving packages...", "success Already up-to-date.", "Done in 0.66s."]}
Wednesday 15 June 2022 08:15:39 -0300 (0:00:05.946) 0:00:55.217 ********
```

The important part is `free(): invalid pointer`. This is a fairly generic memory error, and I could not find its actual cause, not even with valgrind.
This led me to trial and error. First I checked whether removing some gems with native extensions, such as `pg` and `json`, from the Gemfile had any effect, but the error persisted in all my attempts. After that I removed and reinstalled all gems, and the error still persisted. Then I tried removing the Ruby in use and reinstalling it, which fixed the problem! But I must admit this is an ugly solution.

Knowing now that the error comes from the Ruby installation, I tried upgrading the `zzet.rbenv` Ansible Galaxy module, without success. I also tried removing jemalloc from the installation, and the error persisted. Finally, I moved the uninstall/reinstall to be the very next thing after calling `zzet.rbenv` and, to my surprise, that also fixed the problem! From this I can only conclude that there is some unknown issue in this module that is beyond the scope of OFN.

With this long description of what I tried and what failed, I hope you understand that I did my best to find a better solution, but my search was unsuccessful. Uninstalling and reinstalling the same Ruby version was the only solution that worked. Why this is a problem only on Ubuntu 20 and not on 18 or 16 is another mystery.

In the future, if someone takes on the quest of removing this ugly fix, the first thing I suggest checking is whether there is a version of `zzet.rbenv` newer than 3.6.0 and, if there is, whether it fixes the error described here.
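To make "the very next thing after calling `zzet.rbenv`" explicit, the role can be imported inside `tasks`, so the fix runs immediately after it and before any later role or task. This is a hypothetical layout: the play name, the `ofn_servers` host group, and the omitted role variables are assumptions, not the actual `provision.yml`, and the once-per-version guard from the review is left out for brevity.

```yaml
# Hypothetical excerpt of a provisioning play; zzet.rbenv variables omitted.
- name: provision
  hosts: ofn_servers
  become: yes
  tasks:
    - name: Install rbenv and Ruby
      import_role:
        name: zzet.rbenv

    # Placed directly after the role, before anything else uses this Ruby.
    - name: Fix Ruby # noqa 301
      command: bash -lc "rbenv uninstall -f {{ ruby_version }} && rbenv install {{ ruby_version }}"
      become_user: "{{ app_user }}"
      when: ansible_distribution_major_version >= '20'
```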
The fix only needs to run once per Ruby version. Running it every time the provision playbook runs did not cause any errors, but it took a while and delayed deployments.
Great!
I actually have to provision a new server this week. I'll try this.
Which Ansible version do you use? I'm running into some trouble with Ansible 2.9.6 and Ubuntu 20 because the systemd output changed. I found the pull request with the fix, but I can't find a release that includes it:
Ah, don't worry. I updated to Ansible 2.9.16 and it works. 👍
Nice, glad it wasn't something harder to fix :) I did a test install on Digital Ocean today. There was an issue that I had not handled here because I was running my tests on a Vagrant box. It seems the recommended way to install certbot on Ubuntu 20.04 has changed, but upgrading the Galaxy module to the latest version should make it work. You may want to cherry-pick this: f2fd353
Ah, thanks! You may want to review #815.
I did my best to create atomic commits with detailed messages.
This should close issue #811.
I developed this using Vagrant, changing the box image to focal64. Then I ran:
```sh
vagrant destroy -f && vagrant up && \
  ansible-playbook --limit vagrant playbooks/setup.yml && \
  ansible-playbook --limit vagrant playbooks/provision.yml && \
  ansible-playbook --limit vagrant playbooks/deploy.yml
```
Once all these commands ran without errors, I checked in the browser that http://localhost:8080 loaded the OFN homepage without errors.
With everything working on 20.04, I switched the Vagrantfile back to xenial and then bionic, repeating the test above to check that my changes had no unwanted side effects on those releases. Everything looked good in my tests.
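The manual browser check could also be scripted as a last step after `deploy.yml`. A minimal sketch using Ansible's `uri` module from the control machine, assuming (as in the test above) that Vagrant forwards the app to `http://localhost:8080`; the play itself is hypothetical and not part of this PR.

```yaml
# Hypothetical smoke test, run on the control machine after deploy.yml.
- name: Smoke-test the Vagrant deployment
  hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    - name: Request the OFN homepage through the forwarded port
      uri:
        url: http://localhost:8080
        status_code: 200
```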
Final notes: