Illegal Instruction in amd64 .sif on macOS/aarch64 rosetta #3195

Open
n-io opened this issue Feb 5, 2025 · 12 comments
Labels
documentation Improvements or additions to documentation

Comments

@n-io

n-io commented Feb 5, 2025

Description

I encountered Illegal instruction (core dumped) when running apps from an SDK that's provided in a .sif file. The .sif is built for amd64, and my system is an aarch64 (M2) machine with macOS 15.3. This is what I'm running:

$ limactl start template://apptainer --vm-type=vz --rosetta --mount-writable --mount-type=virtiofs --name apptainer
$ limactl shell apptainer
me@lima-apptainer$ singularity shell /path/to/sdk.sif
singularity> vi /viewing/files/works/fine.txt
singularity> sdk_debug_shell
Illegal instruction (core dumped)

Using these commands, I can set up my lima vm with apptainer, get shell access to the sif, and can browse around to view files. However, running any of the sdk apps (python apps) will error as shown above. I'm trying to use rosetta/vz for performance reasons.

I get the same error when running with qemu in system-mode, where I can also get shell access to the sif and browse around, but running bigger apps will crash in the exact same way as shown above (replace the first line with limactl start template://apptainer --vm-type=qemu --arch=x86_64 --mount-writable --name apptainer).

However, the error disappears when adding --set '.cpuType.x86_64="max"'.

Please could you advise if there is a similar work-around for rosetta/vz?

@afbjorklund
Member

afbjorklund commented Feb 5, 2025

Without having the app, or preferably a small reproducer, it is hard to know why it is crashing in emulation.

For instance, it was discovered that AVX-512 returns SIGILL when running on macOS - see #3022 (comment)

But most apps should not try to run that (v4) without explicitly being asked to?

https://www.phoronix.com/news/Linus-Torvalds-On-AVX-512

Even the AVX (v3) doesn't seem to make much of a difference, performance-wise...

https://www.phoronix.com/news/RedHat-RHEL10-x86-64-v3-Explore

Even if we cannot show performance improvements for software included in RHEL,
it may still make sense to go ahead with the switch

@afbjorklund
Member

afbjorklund commented Feb 5, 2025

Running a simple program using avx is enough to reproduce...

afb@lima-apptainer:~$ sudo apt install -y g++-x86-64-linux-gnu
afb@lima-apptainer:~$ x86_64-linux-gnu-g++ -O3 -march=x86-64-v4 -static test.cpp 
afb@lima-apptainer:~$ ./a.out 
Illegal instruction (core dumped)

While your problem might be different, it is the suspected reason.

It behaves the same way, when running on older real hardware.


Related: https://stackoverflow.com/questions/56621809/getting-illegal-instruction-while-running-a-basic-avx512-code

Real code is supposed to be able to target the current architecture at runtime, but the emulation complicates things.

@afbjorklund
Member

Example CPU capabilities (cpuid):

Name: VirtualApple @ 2.50GHz
Vendor String: GenuineIntel
Vendor ID: Intel
PhysicalCores: 1
Threads Per Core: 1
Logical Cores: 1
CPU Family 6 Model: 44 Stepping: 0
Features: AESNI,CLMUL,CMOV,CMPXCHG8,CX16,FXSR,FXSROPT,LAHF,MMX,NX,OSXSAVE,POPCNT,RDTSCP,SSE,SSE2,SSE3,SSE4,SSE42,SSSE3,SYSCALL,SYSEE,X87,XSAVE
Microarchitecture level: 2
Cacheline bytes: 64
L1 Instruction Cache: 131072 bytes
L1 Data Cache: 131072 bytes
L2 Cache: 8388608 bytes
L3 Cache: 0 bytes
Frequency: 2500000000 Hz

https://github.com/klauspost/cpuid
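
To compare a `Features:` line like the one above against the x86-64 microarchitecture levels, a small helper along these lines may be useful. It is only a sketch: the function names are made up for illustration, and the mapping from feature names to levels is an assumption based on the spelling used in the cpuid output in this thread (for example `FMA3`, `LZCNT`, `SSE4` for SSE4.1), so adjust it if your tool prints different names.

```python
from typing import Iterable, Set

# Assumed mapping of feature names (as spelled in the cpuid output above)
# to the x86-64 microarchitecture levels v1..v4.
LEVELS = [
    (1, {"CMOV", "CMPXCHG8", "FXSR", "MMX", "SSE", "SSE2", "SYSCALL", "X87"}),
    (2, {"CX16", "LAHF", "POPCNT", "SSE3", "SSE4", "SSE42", "SSSE3"}),
    (3, {"AVX", "AVX2", "BMI1", "BMI2", "F16C", "FMA3", "LZCNT", "MOVBE", "OSXSAVE"}),
    (4, {"AVX512F", "AVX512BW", "AVX512CD", "AVX512DQ", "AVX512VL"}),
]


def parse_features(line: str) -> Set[str]:
    """Split a 'Features: A,B,C' line (or a bare 'A,B,C' list) into a set."""
    _, _, rest = line.rpartition(":")
    return {name.strip().upper() for name in rest.split(",") if name.strip()}


def microarch_level(features: Iterable[str]) -> int:
    """Return the highest level whose requirements, and all lower ones, are met."""
    have = set(features)
    level = 0
    for number, required in LEVELS:
        if not required <= have:
            break
        level = number
    return level


def missing_for(features: Iterable[str], target: int) -> Set[str]:
    """Return the features still missing to reach the given target level."""
    have = set(features)
    needed: Set[str] = set()
    for number, required in LEVELS:
        if number <= target:
            needed |= required
    return needed - have
```

Feeding it the Rosetta feature list above should give level 2, matching the `Microarchitecture level: 2` line, and `missing_for(..., 3)` lists what a v3 binary would expect but not find.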

@afbjorklund
Member

afbjorklund commented Feb 5, 2025

Apparently macOS 15 adds support for AVX2 (v3) but not for AVX512 (v4)

https://en.wikipedia.org/wiki/Rosetta_(software) - "macOS Sequoia"

Here are some code examples that use AVX or AVX2, for testing with:

https://github.com/kshitijl/avx2-examples

@afbjorklund
Member

afbjorklund commented Feb 5, 2025

As a workaround for the older Rosetta, you can disable it and use qemu instead:

echo -1 | sudo tee /proc/sys/fs/binfmt_misc/rosetta

That will give you more CPU features, but AVX-512 is not yet supported by QEMU:

Name: QEMU TCG CPU version 2.5+
Vendor String: AuthenticAMD
Vendor ID: AMD
PhysicalCores: 0
Threads Per Core: 1
Logical Cores: 0
CPU Family 6 Model: 6 Stepping: 3
Features: ADX,AESNI,AMD3DNOW,AMD3DNOWEXT,AVX,AVX2,AVXSLOW,BMI1,BMI2,CLMUL,CMOV,CMPSB_SCADBS_SHORT,CMPXCHG8,CX16,ERMS,F16C,FMA3,FSRM,FXSR,FXSROPT,HYPERVISOR,IA32_ARCH_CAP,IBPB,IBRS,LAHF,LZCNT,MMX,MMXEXT,MOVBE,MOVSB_ZL,MPX,NRIPS,NX,OSXSAVE,POPCNT,PSFD,RDRAND,RDSEED,RDTSCP,SHA,SPEC_CTRL_SSBD,SSE,SSE2,SSE3,SSE4,SSE42,SSE4A,SSSE3,STIBP,STIBP_ALWAYSON,STOSB_SHORT,SVM,SVMNP,SYSCALL,SYSEE,VAES,WBNOINVD,X87,XGETBV1,XSAVE,XSAVEOPT
Microarchitecture level: 3
Cacheline bytes: 64
L1 Instruction Cache: 65536 bytes
L1 Data Cache: 65536 bytes
L2 Cache: 524288 bytes
L3 Cache: -1 bytes

i.e. the AVX/AVX2 programs will now run, but the AVX-512 will continue to crash:

qemu: uncaught target signal 4 (Illegal instruction) - core dumped
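
To confirm which emulator is actually registered for foreign binaries after changing the binfmt_misc entry as above, the entries under `/proc/sys/fs/binfmt_misc` can be read back. The following is a minimal sketch, assuming the usual entry layout (a first line of `enabled` or `disabled`, followed by an `interpreter <path>` line); the helper names are hypothetical.

```python
import os
from typing import Dict, Optional, Tuple


def parse_binfmt_entry(text: str) -> Tuple[bool, Optional[str]]:
    """Parse one binfmt_misc entry file into (enabled, interpreter path)."""
    lines = text.splitlines()
    enabled = bool(lines) and lines[0].strip() == "enabled"
    interpreter = None
    for line in lines[1:]:
        if line.startswith("interpreter "):
            interpreter = line[len("interpreter "):].strip()
            break
    return enabled, interpreter


def list_handlers(
    root: str = "/proc/sys/fs/binfmt_misc",
) -> Dict[str, Tuple[bool, Optional[str]]]:
    """Return {entry name: (enabled, interpreter)} for every registered handler."""
    handlers = {}
    for name in sorted(os.listdir(root)):
        # 'register' and 'status' are control files, not handler entries.
        if name in ("register", "status"):
            continue
        with open(os.path.join(root, name)) as fh:
            handlers[name] = parse_binfmt_entry(fh.read())
    return handlers
```

Calling `list_handlers()` inside the VM before and after the change should show whether a `rosetta` entry is still present and which interpreter the remaining x86_64 entry points at.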

@afbjorklund afbjorklund added the documentation Improvements or additions to documentation label Feb 9, 2025
@n-io
Author

n-io commented Feb 19, 2025

I managed to dig a bit deeper, and the error occurs during a python import:

$ limactl shell apptainer
me@lima-apptainer$ singularity shell /path/to/sdk.sif
Singularity> PYTHONFAULTHANDLER="1" python3
Python 3.8.16 (default, Mar 18 2024, 18:27:40) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from vendor.sdk.debug.lib.instruction_trace import sdkinstrtracepybind
Fatal Python error: Illegal instruction

Current thread 0x00007ffffdf1f740 (most recent call first):
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1132 in create_module
  File "<frozen importlib._bootstrap>", line 556 in module_from_spec
  File "<frozen importlib._bootstrap>", line 657 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1042 in _handle_fromlist
  File "<stdin>", line 1 in <module>
Illegal instruction (core dumped)

Singularity> python3 --version
Python 3.8.16
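
When several imports are suspect, it can help to try each one in a child interpreter so that a crash does not take down the session. This is a minimal sketch, not part of the SDK; `probe` is a hypothetical helper name, and it relies on `subprocess` reporting a child killed by a signal as a negative return code on POSIX systems.

```python
import signal
import subprocess
import sys
from typing import Optional


def probe(statement: str, python: str = sys.executable) -> Optional[str]:
    """Run a statement in a child interpreter.

    Returns the name of the signal that killed the child (e.g. 'SIGILL'),
    or None if the child exited on its own (successfully or not).
    """
    result = subprocess.run(
        [python, "-c", statement],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode < 0:
        return signal.Signals(-result.returncode).name
    return None
```

For example, `probe("from vendor.sdk.debug.lib.instruction_trace import sdkinstrtracepybind")` run inside the container would be expected to return `'SIGILL'` for the import shown in the traceback above.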

I found the underlying .so file, but readelf -A /sdk/debug/lib/instruction_trace/sdkinstrtracepybind.cpython-38-x86_64-linux-gnu.so returned no architecture specifics.
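
As far as I know, `readelf -A` typically has nothing to print for x86-64 objects, so an empty result does not say much. A rough alternative is to disassemble the library and count instructions that touch the wider vector registers. The sketch below assumes an `objdump` that can decode x86-64 is available (inside the amd64 container, or a cross binutils build in the VM); the helper names are hypothetical. Treat the result as a hint only: `ymm` usage suggests AVX or AVX2 and `zmm` usage suggests AVX-512, but code paths selected at runtime may never execute, and AVX-512VL instructions can also operate on `xmm`/`ymm` registers.

```python
import re
import subprocess
from typing import Dict

# Matches xmm0..xmm31, ymm0..ymm31, zmm0..zmm31 in AT&T or Intel syntax.
_VECTOR_REG = re.compile(r"\b([xyz])mm\d+")


def count_vector_lines(disassembly: str) -> Dict[str, int]:
    """Count disassembly lines that mention xmm, ymm, or zmm registers."""
    counts = {"xmm": 0, "ymm": 0, "zmm": 0}
    for line in disassembly.splitlines():
        # A line is counted once per register kind, however many operands it has.
        for kind in set(_VECTOR_REG.findall(line)):
            counts[kind + "mm"] += 1
    return counts


def disassemble(path: str, objdump: str = "objdump") -> str:
    """Return the text disassembly of an ELF file (requires objdump on PATH)."""
    result = subprocess.run(
        [objdump, "-d", "--no-show-raw-insn", path],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

Usage would be `count_vector_lines(disassemble("/path/to/module.so"))`; a non-zero `zmm` count on a host that only reports level 2 or 3 would be consistent with the SIGILL seen here.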

The avx2-examples you mentioned above don't work with either the -march=x86-64-v3 or -march=x86-64-v4 flag.

@n-io
Author

n-io commented Feb 19, 2025

I noticed in man arch that rosetta takes both -x86_64 and -x86_64h options, is there a way to control this from lima?

@jandubois
Member

jandubois commented Feb 19, 2025

I noticed in man arch that rosetta takes both -x86_64 and -x86_64h options, is there a way to control this from lima?

I don't think this means that Rosetta supports it.

The MachO file format supports "universal binaries" that can contain multiple versions of the same program, compiled for different architectures. The arch command allows you to launch the variant for a specific architecture, assuming the host CPU supports it. This has nothing to do with Rosetta.

@afbjorklund
Member

afbjorklund commented Feb 20, 2025

This seems related to https://sdk.cerebras.net/installation-guide#apple-silicon-mac-installation, maybe ask the vendor?

"Running the Cerebras SDK on an Apple Silicon Mac or other ARM machine requires x86 emulation.
Performance will be sluggish, and emulation bugs are possible."

It might be as simple as adding vmType: qemu to the template?

The VZ/Rosetta support is new, so maybe it didn't exist when that guide was created.

@n-io
Author

n-io commented Feb 20, 2025

It might seem related to https://bugs.launchpad.net/lxml/+bug/2059910

I'm interested in getting vz to work for performance reasons. It all works fine with qemu as mentioned above, but is, well, sluggish.

@afbjorklund
Member

But fixing vz is something for Apple, no? Like the additions in macOS 15

@n-io
Author

n-io commented Feb 20, 2025

Ultimately, yes. It didn't work out of the box with qemu, but there was a workaround. I suppose my question was if a similar workaround might exist for vz, in case there is anything that can be configured in lima, but I understand this might be an upstream issue instead.

Thanks for taking the time btw!


3 participants