-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NativeVariant.Cpu: add the capability to detect instruction-set extensions #37
Conversation
Reviewing last night's work in the light of day, I think Also, I still need to test |
@stephengold Thanks for working on this feature. In the next 24 hours, I will start looking into the requested changes. In the meanwhile, I would like to kindly ask you if it's possible to create a simple testcase to go to the |
I think this is doable in a CI/CD environment (I think I already have a setup for it here). And, also you are free to integrate and test Jolt-Jni in an example (I would actually prefer a real library that uses the feature to be in that test). |
I agree. The next release of jolt-jni ought to make a good test for jSnapLoader. Once I have extension detection working in my snap-jolt workbench, I'll copy one of those tests over to jSnapLoader. Today I discovered that the CPU feature strings returned by OSHI differ between Linux and Windows, even for the identical hardware! So I need to rework this PR in order for it work on Windows. |
If a check that is OS-specific, then you might think of the EDIT: |
My plan for today is:
|
@pavly-gerges Please authorize the Actions workflow.
Aecsocket's cpu-features-java library provides Java bindings to Google's popular cpu_features project. It has a couple drawbacks:
Nihui's ruapu provides JNI bindings and buildscripts for Java, however there are no artifacts at Maven Central. Either of these projects could be adapted to address this feature request in case OSHI proves unsuitable for some reason. |
Yes, that is true. It's here, and once a true predicate is found, the test terminates and file loading is commanded: jSnapLoader/snaploader/src/main/java/electrostatic4j/snaploader/NativeBinaryLoader.java Lines 126 to 129 in 375abc9
EDIT:
It would be nice to have at least a testcase of those on this PR. However, I know there is currently much stress on you delivering some work on Jolt-Jni; thus I don't want to put much pressure on this, and we can have it on another PR later, if you have other things to do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, this is an initial review. I would like to know if you have any other opinions about the requested changes. Thanks again for your time and effort to deliver the feature.
snaploader/src/main/java/electrostatic4j/snaploader/platform/util/NativeVariant.java
Show resolved
Hide resolved
snaploader/src/main/java/electrostatic4j/snaploader/platform/util/NativeVariant.java
Outdated
Show resolved
Hide resolved
snaploader/src/main/java/electrostatic4j/snaploader/platform/util/NativeVariant.java
Show resolved
Hide resolved
If it's an official feature of the library (and not just an implementation detail) then it would be nice to document it as such.
Sometime this week I expect there'll be a jolt-jni release suitable for testing this feature. If you delay integration until the testcase is ready, I'll add it here. Otherwise I'll open another PR for it. Thanks for your review comments. I'm studying them now... |
snaploader/src/main/java/electrostatic4j/snaploader/platform/util/NativeVariant.java
Outdated
Show resolved
Hide resolved
FYI: the new release of jolt-jni is waiting on jrouwe/JoltPhysics#1545. After that release I'll submit some test code for jSnapLoader. |
In the long run, it seems better to directly invoke cpu-id instructions via JNA. This would mostly eliminate the confusion caused different OSes and OS versions. However, a thorough implementation looks to me like a lot of effort. I envision cross-compiling a small shared library for each platform on which jSnapLoader might run. However, Java runs on dozens of platforms. Also, testing would be a huge challenge. For instance, I currently lack access to MIPS, PPC, S/390, LoongArch, and RISC-V hardware. This PR establishes an API that ought to be compatible with such a future implementation. |
For these kind of rare platforms, almost every one of them would run a Linux OS or a variant of it. So, we could hack this by compiling the CPU-ID library on the runtime of jSnapLoader using a system call, then linking the code immediately. This could be all done cleanly with a simple internalized API. This however, will require CMake and GCC/G++ as System Dependencies (But they should be already there on any Linux OS).
That would be our most goal for now. But, if we cannot do this, it's okay, we could deprecate them later, if we have arrived with better solutions. |
snaploader/src/main/java/electrostatic4j/snaploader/platform/util/NativeVariant.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, this wraps up the work. Thanks for allocating the time and effort to work on this feature. It's really nice to have you here :-). Let me know the time you would like to get this merged. It's okay for me to keep it open until Jolt-Jni is ready to test this. Or, we could merge this, and arrive with a 1.1.0-alpha
version to ease the testing for you. Whichever the case, I don't mind.
EDIT:
We have a third option, as well. I could try deploying this branch as a 1.1.0-alpha
w/o merging the PR to ease the testing on your CI/CD setup.
Being curious, I dumped the CPU features on various GitHub Actions runners:
CPU features:
pf_avx2_instructions_available
pf_avx_instructions_available
pf_compare_exchange128
pf_compare_exchange_double
pf_fastfail_available
pf_mmx_instructions_available
pf_nx_enabled
pf_pae_enabled
pf_rdpid_instruction_available
pf_rdrand_instruction_available
pf_rdtsc_instruction_available
pf_rdtscp_instruction_available
pf_rdwrfsgsbase_available
pf_sse3_instructions_available
pf_sse4_1_instructions_available
pf_sse4_2_instructions_available
pf_ssse3_instructions_available
pf_virt_firmware_enabled
pf_xmmi64_instructions_available
pf_xmmi_instructions_available
pf_xsave_enabled
CPU features:
abm
adx
aes
aperfmperf
apic
arat
avx
avx2
bmi1
bmi2
clflush
clflushopt
clwb
clzero
cmov
cmp_legacy
constant_tsc
cpuid
cr8_legacy
cx16
cx8
de
decodeassists
dnowprefetch
erms
extd_apicid
f16c
flags
flushbyasid
fma
fpu
fsgsbase
fsrm
fxsr
fxsr_opt
ht
hypervisor
invpcid
lahf_lm
lm
mca
mce
misalignsse
mmx
mmxext
movbe
msr
mtrr
nonstop_tsc
nopl
npt
nrip_save
nx
osvw
pae
pat
pausefilter
pcid
pclmulqdq
pdpe1gb
pfthreshold
pge
pni
popcnt
pse
pse36
rdpid
rdpru
rdrand
rdseed
rdtscp
rep_good
sep
sha_ni
smap
smep
sse
sse2
sse4_1
sse4_2
sse4a
ssse3
svm
syscall
topoext
tsc
tsc_reliable
tsc_scale
umip
user_shstk
v_vmsave_vmload
vaes
vmcb_clean
vme
vmmcall
vpclmulqdq
xgetbv1
xsave
xsavec
xsaveerptr
xsaveopt
xsaves
CPU features:
adx
aes
apic
avx1
avx2
bmi1
bmi2
clfsh
clfsopt
cmov
cpu
cx16
cx8
de
dscpl
dtes64
em64t
erms
extfeatures
f16c
features
fma
fpu
fpu_csds
fxsr
gbpage
hle
htt
invpcid
ipt
lahf
leaf7_features
lzcnt
machdep
mca
mce
mmx
movbe
mpx
msr
mtrr
osxsave
pae
pat
pbe
pcid
pclmulqdq
pge
popcnt
prefetchw
pse
pse36
rdrand
rdseed
rdwrfsgs
rtm
seglim64
sep
sgx
smap
smep
ss
sse
sse2
sse3
sse4
ssse3
syscall
tpr
tsc
tsc_thread_offset
vme
vmm
vmx
x2apic
xd
xsave
CPU features:
advsimd
advsimd_hpfpcvt
arm
arm64
armv8_1_atomics
armv8_2_fhm
armv8_2_sha3
armv8_2_sha512
armv8_3_compnum
armv8_crc32
armv8_gpi
breakpoint
caps
feat_aes
feat_afp
feat_bf16
feat_bti
feat_csv2
feat_csv3
feat_dit
feat_dotprod
feat_dpb
feat_dpb2
feat_ecv
feat_fcma
feat_fhm
feat_flagm
feat_flagm2
feat_fp16
feat_fpac
feat_frintts
feat_i8mm
feat_jscvt
feat_lrcpc
feat_lrcpc2
feat_lse
feat_lse2
feat_pauth
feat_pauth2
feat_pmull
feat_rdm
feat_rpres
feat_sb
feat_sha1
feat_sha256
feat_sha3
feat_sha512
feat_sme
feat_sme2
feat_sme_f64f64
feat_sme_i16i64
feat_specres
feat_ssbs
feat_wfxt
floatingpoint
fp_syncexceptions
hw
neon
neon_fp16
neon_hpfp
optional
sme_b16f32
sme_bi32i32
sme_f16f32
sme_f32f32
sme_i16i32
sme_i8i32
ucnormal_mem
watchpoint |
I envision the API you built will work with general features. So, real CPU bare metal features and not OS-dependent. I see some OS stuff, for example "osxsave". What are these?! |
I investigated, and I have bad news. The data returned by hw.optional.arm.FEAT_FlagM: 1
hw.optional.arm.FEAT_FlagM2: 1
hw.optional.arm.FEAT_FHM: 1
hw.optional.arm.FEAT_DotProd: 1
hw.optional.arm.FEAT_SHA3: 1
hw.optional.arm.FEAT_RDM: 1
hw.optional.arm.FEAT_LSE: 1
hw.optional.arm.FEAT_SHA256: 1
hw.optional.arm.FEAT_SHA512: 1
hw.optional.arm.FEAT_SHA1: 1
hw.optional.arm.FEAT_AES: 1
hw.optional.arm.FEAT_PMULL: 1
hw.optional.arm.FEAT_SPECRES: 0
hw.optional.arm.FEAT_SB: 1
hw.optional.arm.FEAT_FRINTTS: 1
hw.optional.arm.FEAT_LRCPC: 1
hw.optional.arm.FEAT_LRCPC2: 1
hw.optional.arm.FEAT_FCMA: 1
hw.optional.arm.FEAT_JSCVT: 1
hw.optional.arm.FEAT_PAuth: 1
hw.optional.arm.FEAT_PAuth2: 0
hw.optional.arm.FEAT_FPAC: 0
hw.optional.arm.FEAT_DPB: 1
hw.optional.arm.FEAT_DPB2: 1
hw.optional.arm.FEAT_BF16: 0
hw.optional.arm.FEAT_I8MM: 0
hw.optional.arm.FEAT_WFxT: 0
hw.optional.arm.FEAT_RPRES: 0
hw.optional.arm.FEAT_ECV: 0
hw.optional.arm.FEAT_AFP: 0
hw.optional.arm.FEAT_LSE2: 1
hw.optional.arm.FEAT_CSV2: 1
hw.optional.arm.FEAT_CSV3: 1
hw.optional.arm.FEAT_DIT: 1
hw.optional.arm.FEAT_FP16: 1
hw.optional.arm.FEAT_SSBS: 1
hw.optional.arm.FEAT_BTI: 0
hw.optional.arm.FEAT_SME: 0
hw.optional.arm.FEAT_SME2: 0
hw.optional.arm.SME_F32F32: 0
hw.optional.arm.SME_BI32I32: 0
hw.optional.arm.SME_B16F32: 0
hw.optional.arm.SME_F16F32: 0
hw.optional.arm.SME_I8I32: 0
hw.optional.arm.SME_I16I32: 0
hw.optional.arm.FEAT_SME_F64F64: 0
hw.optional.arm.FEAT_SME_I16I64: 0
hw.optional.arm.FP_SyncExceptions: 1
hw.optional.arm.caps: 868632120666353663
hw.optional.floatingpoint: 1
hw.optional.neon: 1
hw.optional.neon_hpfp: 1
hw.optional.neon_fp16: 1
hw.optional.armv8_1_atomics: 1
hw.optional.armv8_2_fhm: 1
hw.optional.armv8_2_sha512: 1
hw.optional.armv8_2_sha3: 1
hw.optional.armv8_3_compnum: 1
hw.optional.watchpoint: 4
hw.optional.breakpoint: 6
hw.optional.armv8_crc32: 1
hw.optional.armv8_gpi: 1
hw.optional.AdvSIMD: 1
hw.optional.AdvSIMD_HPFPCvt: 1
hw.optional.ucnormal_mem: 1
hw.optional.arm64: 1 We should probably strip off the "hw.optional." prefix and ignore list entries that end with ": 0". |
Have you investigated if OSHI provide an algorithm to pull the CPU feature names out of those? I mean they could be aware of that and have solved it maybe. |
I did study the javadoc. I didn't search beyond that. |
One last change for your consideration: adding another The added source code would look like this: public PlatformPredicate(PlatformPredicate base, String... requiredNames) {
this.predicate = base.evaluatePredicate()
&& NativeVariant.Cpu.hasExtensions(requiredNames);
} This would allow efficient creation of new platform predicates that combine a pre-defined platform with one or more ISA extensions. For example: PlatformPredicate linuxWithFma
= new PlatformPredicate(PlatformPredicate.LINUX_X86_64, "avx", "fma"); |
Yep, you can proceed with the proper documentation. I am okay with it. I think it's better to replace |
I've just copied the HEAD of your PR branch commits, and released a 1.1.0-alpha version including a copy of these changes. To further automate your work, you can now integrate these changes on your CI/CD runners on Jolt-Jni using Gradle. I've staged and released it on the nexus manager, it would be available very soon. |
Thanks. Jorrit appears to be inactive this week, so I'll go ahead and cut a jolt-jni release without his fix. That'll give us something to test. |
Testing on the snap-jolt project uncovered one minor issue: if the application doesn't explicitly depend on OSHI, then it can crash at runtime with Error: Exception in thread "main" java.lang.NoClassDefFoundError: oshi/SystemInfo
> Task :PrintConfig FAILED
at electrostatic4j.snaploader.platform.util.NativeVariant$Cpu.readFeatureFlags(NativeVariant.java:280)
2 actionable tasks: 2 executed
at electrostatic4j.snaploader.platform.util.NativeVariant$Cpu.hasExtensions(NativeVariant.java:350)
at electrostatic4j.snaploader.platform.util.PlatformPredicate.<init>(PlatformPredicate.java:147)
at com.github.stephengold.snapjolt.PrintConfig.main(PrintConfig.java:52)
Caused by: java.lang.ClassNotFoundException: oshi.SystemInfo
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:528)
... 4 more Adding an explicit dependency to the buildscript solves the problem: but it would be better if jSnapLoader simply included a transitive dependency on OSHI. |
This morning I added an app to snaploader-examples that test selection between native libraries based on CPU features. However, I'm getting some unexpected results. I need to investigate further before committing that code. |
Ah, I see what's happening. The 2 native libraries for my platform that I'm trying to select between are both named "libjoltjni.so". Once one of them is extracted to the working directory, the loader keeps loading that file. Is there an easy way to force jSnapLoader to replace the existing file with a freshly-extracted one? |
Sure, set a jSnapLoader/snaploader/src/main/java/electrostatic4j/snaploader/NativeBinaryLoader.java Line 156 in 375abc9
There is another better solution though, extract your |
Thank you. I think this PR is ready for final review. |
Perfect! The new changes look good to me. I am curious though, have you tried the submitted testcase on your GitHub runners? What was the output on different systems? |
I don't know if changing the CI/CD runners file on the master branch will affect here, and run this test. Let's try. EDIT: |
I've run similar code on the GitHub runners. In fact that's where I copied the expected outputs from. At GitHub, the Windows runners support AVX2 and the Linux runners support FMA. Note that, as currently written, the new test won't run on MacOS or ARM.
I'm ready when you are. |
I expect there would be another PR to extract the features for Some Mac Architectures (e.g., ARM-based). I suggest opening an issue for them, and also to specify whether they are essential for the 1.1.0-stable release. Otherwise, I will merge this PR within hours from now. |
@stephengold I appreciate your contributions. |
Here's my first cut at a solution for issue #36. I use the OSHI library to obtain CPU feature information. Any X86 ISA extensions that are present are cached in a private
TreeSet
that can be used to define custom predicates, like so:I assume the dynamic libraries are searched in the order they are passed to
registerNativeLibraries()
, and the first match is used. That way I can use predefined platform predicates as a fallback, like so:with the assurance that "linux/x86-64" will be loaded only if
linuxWithHaswell
isfalse
.