nixos/azure: move image-specific configs from azure-common to azure-image, fix console output #359365
base: master
9e744ad to 3ede656
c6f4b45 to b5ef96b
Generally LGTM, but the release notes are a blocker.
Note that I have not tested it, only read the code.
cd60b01 to 0657a3d
Thanks @nh2 for the insights. I have made improvements according to the comments and updated the release notes.
I've been trying to test it, but I'm running into an issue on Microsoft's server side (Microsoft has asked me to open a support ticket so their team can investigate further): azcopy is failing with a 500 error when I try to copy the VHD to a managed disk. @codgician, if you have a recommended way of uploading the VHD, feel free to let me know; I'll try without azcopy.
I also use azcopy for copying; the detailed command can be found here (you may need to change the account name, container name, and subscription ID accordingly). Uploading can also be done in the Azure Portal: on the storage account page, go to Storage browser -> Blob containers, select your container, and click "Upload" to upload the blob. Just make sure the blob type is set to "Page Blob".
0657a3d to 040cef4
Rebased onto the latest master branch to resolve merge conflicts.
@codgician I cannot see that in your latest push yet, it still shows
The device definition has to be preserved because it is directly referenced if
A line of
040cef4 to ec68a18
Rebased with latest master branch (where |
I've been able to do a quick test (systemd-boot, x86), but I'm running into a few issues. I used both an old VHD (built before your last comment) and a freshly built one, and noted the following:
… while calling the 'findFile' builtin
at /nix/store/pc1d5jpff9zqnkks7nkkk3snms7h57m6-source/nixos/default.nix:1:60:
1| { configuration ? import ./lib/from-env.nix "NIXOS_CONFIG" <nixos-config>
| ^
2| , system ? builtins.currentSystem
error: file 'nixos-config' was not found in the Nix search path (add it using $NIX_PATH or -I)
@codgician did you export specific I don't see where the original If I switch the
INFO Daemon Daemon Clean protocol and wireserver endpoint
INFO Daemon Daemon Wire server endpoint:168.63.129.16
INFO Daemon Daemon Fabric preferred wire protocol version:2015-04-05
INFO Daemon Daemon Wire protocol version:2012-11-30
INFO Daemon Daemon Server preferred version:2015-04-05
ERROR Daemon Daemon Event: name=WALinuxAgent, op=UnhandledError, message=[Errno 2] No such file or directory: '/usr/bin/openssl'
Traceback (most recent call last):
File "/nix/store/89p16phbqbxqlj54km8knsa2fkhhhjr1-waagent-2.12.0.2/lib/python3.12/site-packages/azurelinuxagent/daemon/main.py", line 83, in run
self.daemon(child_args)
File "/nix/store/89p16phbqbxqlj54km8knsa2fkhhhjr1-waagent-2.12.0.2/lib/python3.12/site-packages/azurelinuxagent/daemon/main.py", line 144, in daemon
self.provision_handler.run()
File "/nix/store/89p16phbqbxqlj54km8knsa2fkhhhjr1-waagent-2.12.0.2/lib/python3.12/site-packages/azurelinuxagent/pa/provision/cloudinit.py", line 53, in run
self.protocol_util.get_protocol() # Trigger protocol detection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/89p16phbqbxqlj54km8knsa2fkhhhjr1-waagent-2.12.0.2/lib/python3.12/site-packages/azurelinuxagent/common/protocol/util.py", line 299, in get_protocol
protocol = self._detect_protocol(save_to_history=save_to_history, init_goal_state=init_goal_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/89p16phbqbxqlj54km8knsa2fkhhhjr1-waagent-2.12.0.2/lib/python3.12/site-packages/azurelinuxagent/common/protocol/util.py", line 220, in _detect_protocol
protocol.detect(init_goal_state=init_goal_state, save_to_history=save_to_history)
File "/nix/store/89p16phbqbxqlj54km8knsa2fkhhhjr1-waagent-2.12.0.2/lib/python3.12/site-packages/azurelinuxagent/common/protocol/wire.py", line 84, in detect
cryptutil.gen_transport_cert(trans_prv_file, trans_cert_file)
File "/nix/store/89p16phbqbxqlj54km8knsa2fkhhhjr1-waagent-2.12.0.2/lib/python3.12/site-packages/azurelinuxagent/common/utils/cryptutil.py", line 47, in gen_transport_cert
shellutil.run_command(cmd)
File "/nix/store/89p16phbqbxqlj54km8knsa2fkhhhjr1-waagent-2.12.0.2/lib/python3.12/site-packages/azurelinuxagent/common/utils/shellutil.py", line 288, in run_command
return __run_command(command_action=command_action, command=command, log_error=log_error, encode_output=encode_output)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/89p16phbqbxqlj54km8knsa2fkhhhjr1-waagent-2.12.0.2/lib/python3.12/site-packages/azurelinuxagent/common/utils/shellutil.py", line 190, in __run_command
return_code, stdout, stderr = command_action()
^^^^^^^^^^^^^^^^
File "/nix/store/89p16phbqbxqlj54km8knsa2fkhhhjr1-waagent-2.12.0.2/lib/python3.12/site-packages/azurelinuxagent/common/utils/shellutil.py", line 255, in command_action
process = _popen(command, stdin=popen_stdin, stdout=stdout, stderr=stderr, shell=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/89p16phbqbxqlj54km8knsa2fkhhhjr1-waagent-2.12.0.2/lib/python3.12/site-packages/azurelinuxagent/common/utils/shellutil.py", line 398, in _popen
process = subprocess.Popen(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/lib/python3.12/subprocess.py", line 1026, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/lib/python3.12/subprocess.py", line 1955, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/openssl'
, duration=0
WARNING Daemon Daemon Daemon ended with exception -- Sleep 15 seconds and restart daemon
I believe adding this line into the
substituteInPlace azurelinuxagent/common/conf.py \
  --replace-fail "/usr/bin/openssl" ${openssl}/bin/openssl
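A minimal sketch of how the substitution above could be tried locally without editing nixpkgs. It assumes the agent is exposed as the `waagent` attribute (the store path in the traceback suggests that package name) and that appending to `postPatch` is an acceptable place for the line. The overlay itself is hypothetical and not part of this PR:

```nix
# Hypothetical overlay: patch the hard-coded openssl path in waagent.
# Assumptions: the package attribute is `waagent`, and conf.py still
# contains the literal "/usr/bin/openssl" (--replace-fail aborts the
# build if the string is not found).
final: prev: {
  waagent = prev.waagent.overrideAttrs (old: {
    postPatch = (old.postPatch or "") + ''
      substituteInPlace azurelinuxagent/common/conf.py \
        --replace-fail "/usr/bin/openssl" ${final.openssl}/bin/openssl
    '';
  });
}
```

Such an overlay would be added through `nixpkgs.overlays` in the image configuration. A fix in the package definition itself would make it unnecessary.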
@AkechiShiro Thank you for taking the time for testing. ❤️
The reason you are seeing 24.11 is because the default system channel is still set to
This is introduced recently by PR #359339. Since then the image name would follow
The
I did not export specific
I added a default |
Thanks a lot @codgician for the amazing work on bringing more Azure support to NixOS. I think it would be great to not rely on
I'll attempt a retest this coming weekend (on x86, ARM, and also with GRUB). Would you also have any idea of the process on the Azure/MS side to sign the
It is possible to add a custom Secure Boot key for an Azure image via an ARM template, see: https://learn.microsoft.com/en-us/azure/virtual-machines/trusted-launch-secure-boot-custom-uefi. The custom key is bound to an Azure Compute Gallery image, so users need to do some manual setup.
In the GitHub Actions workflow of the test flake (later switched to Garnix for native aarch64 builds), I used QEMU to emulate ARM when building the image, so that the Nix store hashes are consistent with those from native ARM builds. To cross-compile you may need to override
I don't yet know of any easy offline way to detect whether a VM is running on Azure. Azure runs a special version of Windows on its bare-metal hosts and uses the same Hyper-V hypervisor found on any Windows client (with some additional special features). But I like this idea, and I will research whether such detection is possible in the coming weeks.
0a9ec12 to 67e4045
In order to cross-compile, I've added the following to my local configuration
I believe this uses QEMU to emulate ARM in order to cross-compile the flake. I was thus able to build the flake for
I wasn't able to test
Were you able to test
The VM image also needs to be created as Arm64, which requires you to create an Azure Compute Gallery. I personally use Terraform (Terranix) to deploy my Azure resources; you can refer to this file to get the overall idea: https://github.com/codgician/serenitea-pot/blob/main/packages/terraform-config/celestia/vms/lumine/image.nix This Stack Overflow post could also be helpful: https://stackoverflow.com/questions/75574115/create-azure-arm64-image-from-vhd
Trying to build the latest image after pulling the last commits in the test flake, I'm currently encountering these errors (I'm not sure of the root cause, as I have enough memory and space):
error while reading directory /build/root/etc: Cannot allocate memory
error while reading directory /build/root: Cannot allocate memory
error while reading directory /build/root/nix/var/nix/gcroots/per-user: Cannot allocate memory
error while reading directory /build/root/nix/var/nix/gcroots/auto: Cannot allocate memory
error while reading directory /build/root/nix/var/nix/gcroots: Cannot allocate memory
error while reading directory /build/root/nix/var/nix/temproots: Cannot allocate memory
error while reading directory /build/root/nix/var/nix/db: Cannot allocate memory
error while reading directory /build/root/nix/var/nix/profiles/per-user/root: Cannot allocate memory
error while reading directory /build/root/nix/var/nix/profiles/per-user: Cannot allocate memory
error while reading directory /build/root/nix/var/nix/profiles: Cannot allocate memory
error while reading directory /build/root/nix/var/nix: Cannot allocate memory
error while reading directory /build/root/nix/var: Cannot allocate memory
...
error: builder for '/nix/store/r7rny8wplk7sla4jhm9j2kl1cv6f8rhn-azure-image.drv' failed with exit code 1;
last 25 log lines:
> error while reading directory /build/root/nix/store/px6akw851gkq4101y5m9vv0x7zd2f2v4-unit-container-.service: Cannot allocate memory
> error while reading directory /build/root/nix/store/g9sk34viln219wbj9imrpbywsq5md0cw-python3.12-msgpack-1.1.0/lib/python3.12/site-packages/msgpack-1.1.0.dist-info: Cannot allocate memory
> error while reading directory /build/root/nix/store/g9sk34viln219wbj9imrpbywsq5md0cw-python3.12-msgpack-1.1.0/lib/python3.12/site-packages/msgpack/__pycache__: Cannot allocate memory
> error while reading directory /build/root/nix/store/g9sk34viln219wbj9imrpbywsq5md0cw-python3.12-msgpack-1.1.0/lib/python3.12/site-packages/msgpack: Cannot allocate memory
> error while reading directory /build/root/nix/store/g9sk34viln219wbj9imrpbywsq5md0cw-python3.12-msgpack-1.1.0/lib/python3.12/site-packages: Cannot allocate memory
> error while reading directory /build/root/nix/store/g9sk34viln219wbj9imrpbywsq5md0cw-python3.12-msgpack-1.1.0/lib/python3.12: Cannot allocate memory
> error while reading directory /build/root/nix/store/g9sk34viln219wbj9imrpbywsq5md0cw-python3.12-msgpack-1.1.0/lib: Cannot allocate memory
> error while reading directory /build/root/nix/store/g9sk34viln219wbj9imrpbywsq5md0cw-python3.12-msgpack-1.1.0/nix-support: Cannot allocate memory
> error while reading directory /build/root/nix/store/g9sk34viln219wbj9imrpbywsq5md0cw-python3.12-msgpack-1.1.0: Cannot allocate memory
> error while reading directory /build/root/nix/store/c0xl18ds7cf12aywlhfbm7j013k38xmk-unit-script-nixos-activation-start/bin: Cannot allocate memory
> error while reading directory /build/root/nix/store/c0xl18ds7cf12aywlhfbm7j013k38xmk-unit-script-nixos-activation-start: Cannot allocate memory
> error while reading directory /build/root/nix/store: Cannot allocate memory
> error while reading directory /build/root/nix: Cannot allocate memory
> error while reading directory /build/root: Cannot allocate memory
> error while reading directory /build/root/root/.nix-defexpr: Cannot allocate memory
> error while reading directory /build/root/root: Cannot allocate memory
> error while reading directory /build/root: Cannot allocate memory
> [ 83.174364] reboot: Restarting system
> thread 'main' panicked at src/main.rs:829:54:
> called `Result::unwrap()` on an `Err` value: InitSeccompContext
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> thread 'main' panicked at src/main.rs:829:54:
> called `Result::unwrap()` on an `Err` value: InitSeccompContext
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> qemu-system-aarch64: -chardev socket,id=store,path=virtio-store.sock: Failed to connect to 'virtio-store.sock': No such file or directory
This started happening after rebasing onto the latest nixpkgs master, and seems to happen only when generating the image with QEMU emulation. It does not happen when building on native aarch64 hardware (tested with NixOS in a Parallels VM on my M2 Max MBP, and on Garnix.io: https://garnix.io/build/Q9PKxmbB, which has native arm64 support, see garnix-io/issues#12). I haven't figured out the root cause yet, but the exception seems to happen in virtiofsd (which is unrelated to this PR itself). But to check whether it is an OOM issue, maybe try increasing nixpkgs/nixos/lib/make-disk-image.nix Line 154 in dba71e4
eac97da to efa73b6
Did some further testing and found that this is not related to OOM, so I removed the option added for testing purposes. Since this issue only happens when building aarch64 images with QEMU emulation on x86_64, I tend to think it is not caused by this PR. I added a
I believe I've tested this PR enough, and I think it's fine to merge as it currently is:
- Tested x86_64 (systemd-boot)
- Tested ARM (systemd-boot)
Both boot fine on my side. I haven't seen any issues with them, and the serial console allows selecting NixOS generations.
Thanks again for the work done, @codgician
Just some nitpicking :D Thank you for your awesome work! Some maybe out-of-scope suggestions:
I think networking.useNetworkd needs lib.mkDefault true, because not setting one throws a warning:
evaluation warning: The combination of `systemd.network.enable = true`, `networking.useDHCP = true` and `networking.useNetworkd = false` can cause both networkd and dhcpcd to manage the same interfaces. This can lead to loss of networking. It is recommended you choose only one of networkd (by also enabling `networking.useNetworkd`) or scripting (by disabling `systemd.network.enable`)
I would also consider making virtualisation.azure.acceleratedNetworking true by default; it doesn't seem to break things: https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview?tabs=redhat#limitations-and-constraints
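The two suggestions above could be sketched as a small module fragment. This is only an illustration, assuming the settings are applied from a user configuration rather than changed in the Azure modules themselves; where they should actually live is up to the PR author:

```nix
# Sketch of the two suggestions; not the code merged in this PR.
{ lib, ... }:
{
  # With useNetworkd enabled, the "networkd and dhcpcd manage the same
  # interfaces" combination from the warning no longer applies.
  # mkDefault keeps it overridable by users who prefer scripted networking.
  networking.useNetworkd = lib.mkDefault true;

  # Option named in the review comment; enabling it by default is the
  # reviewer's proposal, not a confirmed change.
  virtualisation.azure.acceleratedNetworking = lib.mkDefault true;
}
```

Using `lib.mkDefault` rather than a plain assignment means a downstream configuration can still set either option without needing `lib.mkForce`.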
services.udev.extraRules =
  with builtins;
  concatStringsSep "\n" (
    map (i: ''
      ENV{DEVTYPE}=="disk", KERNEL!="sda" SUBSYSTEM=="block", SUBSYSTEMS=="scsi", KERNELS=="?:0:0:${toString i}", ATTR{removable}=="0", SYMLINK+="disk/by-lun/${toString i}"
    '') (lib.range 1 15)
  );
- services.udev.extraRules =
-   with builtins;
-   concatStringsSep "\n" (
-     map (i: ''
-       ENV{DEVTYPE}=="disk", KERNEL!="sda" SUBSYSTEM=="block", SUBSYSTEMS=="scsi", KERNELS=="?:0:0:${toString i}", ATTR{removable}=="0", SYMLINK+="disk/by-lun/${toString i}"
-     '') (lib.range 1 15)
-   );
+ services.udev.extraRules = lib.concatMapStrings (i: ''
+   ENV{DEVTYPE}=="disk", KERNEL!="sda" SUBSYSTEM=="block", SUBSYSTEMS=="scsi", KERNELS=="?:0:0:${toString i}", ATTR{removable}=="0", SYMLINK+="disk/by-lun/${toString i}"
+ '') (lib.range 1 15);
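For readers comparing the two forms, here is a small standalone sketch of how they differ. It uses a shortened, hypothetical rule body and assumes `<nixpkgs>` is on the Nix search path. Because each indented string already ends in a newline, the `concatStringsSep "\n"` form adds a blank line between rules and `lib.concatMapStrings` does not. udev ignores empty lines, so the generated rules should behave the same:

```nix
# Evaluate with: nix-instantiate --eval --strict compare.nix
# (the file name is arbitrary; `rule` is a shortened stand-in for the real rule)
let
  lib = import <nixpkgs/lib>;

  # Each result ends with "\n" because the closing '' sits on its own line.
  rule = i: ''
    SYMLINK+="disk/by-lun/${toString i}"
  '';

  luns = lib.range 1 3;

  # Original form: an extra "\n" is placed between already-terminated rules.
  old = builtins.concatStringsSep "\n" (map rule luns);

  # Suggested form: rules are concatenated directly, one per line.
  new = lib.concatMapStrings rule luns;
in
{
  inherit old new;
  sameRuleLines =
    lib.filter (l: l != "") (lib.splitString "\n" old)
    == lib.filter (l: l != "") (lib.splitString "\n" new);
}
```

The `sameRuleLines` attribute compares only the non-empty lines, which is the part that matters to udev.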
# Please update the VM Generation to the actual value
virtualisation.azureImage.vmGeneration = "v1";
# Please update the VM Generation to the actual value
virtualisation.azureImage.vmGeneration = "v1";
Overwriting an option with its default value seems redundant. Maybe put this to the comment above?
It boots and works on my own config. I had some hiccups because
But then I can't get Accelerated Networking working even after
Accelerated networking needs to be enabled in the network interface configuration on the Azure portal:
Thank you for taking the time to test this change!
…nsole for Gen 2 VM could work
9136e11 to cda9f47
cda9f47 to f5908be
For
And for making
And thank you for your awesome suggestions on code improvement; I have applied them in the latest commit.
[azure-common.nix]
- azure-common.nix, because these settings are only intended for images generated using azure-image.nix. This will make azure-common.nix more generic, so folks generating Azure images using other methods (e.g. disko) can leverage it.
- headless.nix to make the console work (which was left out by mistake from nixos/azure: add Gen 2 VM, aarch64 and accelerated networking support #333508). Thanks @nh2 for the great finding.
[azure-image.nix]
- azure-config-user.nix), prompting the user to set the correct vmGeneration before rebuilding. Doing this instead of generating it dynamically retains compatibility, as I found that some open-source configurations on GitHub reference this file directly.
[azure-agent.nix]
- python39 because the current version of waagent depends on a newer Python version, which already exists in the environment given how it is packaged.
Changes in this PR are testable via the flake codgician/azure-aarch64-nixos (instructions for generating images are provided in README.md). Tested for both Gen 2 grub and Gen 2 systemd-boot on aarch64.