JIT compiler for RISC-V #275
I've got my Lichee Pi 4a now but haven't set it up yet. Will try this in a few days.
The code looks good. How many hashes did you run the benchmark for? I think it needs to be tested with at least 10M hashes and the result must be identical to what x64/aarch64 versions produce. 10M hashes = 1.5 days of your board running at 75 h/s, so it's possible.
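For reference, the run-time estimate above can be reproduced with a few lines of C++. This is only a sketch: the function names are made up for illustration, and it assumes the hashrate stays constant for the whole run.

```cpp
#include <cstdint>

// Hypothetical helper: seconds needed to compute `hashes` hashes at a
// steady rate of `rate` hashes per second (rate must be > 0).
constexpr double benchmark_seconds(std::uint64_t hashes, double rate) {
    return static_cast<double>(hashes) / rate;
}

// Hypothetical helper: convert a duration in seconds to days.
constexpr double seconds_to_days(double seconds) {
    return seconds / 86400.0;
}
```

With 10M hashes at 75 h/s this gives roughly 133,000 seconds, i.e. a bit over 1.5 days; 1M hashes at the same rate comes out at under 4 hours.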
I'm running 1M now (should take a couple of hours) and then I'll try 10M. Then we'll have to repeat the runs with the native build. Do you have the correct hashes for 1M and 10M?
Some notes about hardware AES: RISC-V actually has 2 different crypto extensions:
So it's possible that we'll have to support at least 2 different extensions in the future, with the scalar one likely coming first. Actually, the scalar crypto AES instructions are split into two separate extensions, Zkne (encryption only) and Zknd (decryption only). Hopefully, hardware designers will be sane and include both extensions. In order to limit the scope of this PR, I did not include hardware AES for now. It would not be very useful anyway since no chips you can buy today support it.
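Since Zkne and Zknd are separate extensions, a future hardware AES path would have to check for both. Below is a minimal sketch of such a check, not code from this PR. It assumes the ISA string is already available in the usual lower-case form, with multi-letter extensions separated by underscores (for example as reported by the kernel); the function names and the sample strings are hypothetical. It does not expand umbrella names such as Zkn.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if the multi-letter extension `ext`
// (lower case, e.g. "zkne") appears as an underscore-separated token in
// `isa`, e.g. "rv64imafdc_zba_zbb_zkne_zknd".
bool has_multi_letter_ext(std::string_view isa, std::string_view ext) {
    constexpr std::size_t npos = std::string_view::npos;
    std::size_t pos = isa.find('_');
    while (pos != npos) {
        const std::size_t start = pos + 1;
        const std::size_t end = isa.find('_', start);
        // Token runs from `start` to the next underscore or end of string.
        const std::string_view token =
            isa.substr(start, end == npos ? npos : end - start);
        if (token == ext) {
            return true;
        }
        pos = end;
    }
    return false;
}

// Scalar AES is only usable for both directions if the encryption (Zkne)
// and decryption (Zknd) extensions are both present.
bool has_scalar_aes(std::string_view isa) {
    return has_multi_letter_ext(isa, "zkne") &&
           has_multi_letter_ext(isa, "zknd");
}
```

A chip that implements only one of the two would fail this check and fall back to software AES.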
1M: this is what the master branch randomx-benchmark shows.
10M also matches with the default build
@felixonmars How many threads did you run on SG2042? The optimal number of threads is 32 there.
Was going to say the same. If that run was with 64 threads, need to try again with 32. tho ... 1356 / 40 = 33. Probably won't make a huge difference.
I did run with 64. The results vary quite a lot between runs, sometimes dropping to only 2 h/s. I'll retry with 32 later (it's fully loaded now).
@felixonmars I can see from your JH7110 hashrates that you are probably not using hugepages. Try to use
I ran the 10M hashes with the native build and the result also matches.
I will wait for someone to independently verify my hashes before merging this PR.
Indeed. I tried with
Got some results on Lichee Pi 4a. Numbers were quite slow on the shipped firmware, dated July 2023. Updated to the September 20 2023 image and my results look more reasonable. Speed with 1 thread and no largePages was 34.89 H/s, same as @felixonmars got. With largepages
Hashes match here. With large pages on and threads set to 32, the performance is much lower though.
Since the C920 caches are arranged in clusters of 4 cores, you'd need to be able to pin threads and memory allocations to particular cores to obtain optimal memory layout, along the lines of what the numactl command does. The --affinity option would help here.
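To make the pinning idea concrete, here is a rough sketch of how a thread affinity bitmask could be built for a clustered layout. It is an illustration only: the helper name is hypothetical, it assumes cores are numbered so that cores 4k to 4k+3 share a cluster (which would need to be verified on the actual board), and it only covers thread placement, not memory placement.

```cpp
#include <cstdint>

// Hypothetical helper: build a bitmask in which the first `per_cluster`
// cores of every `cluster_size`-core cluster are selected.
// Preconditions: cluster_size > 0; at most 64 cores fit in the mask.
constexpr std::uint64_t cluster_mask(unsigned cores,
                                     unsigned cluster_size,
                                     unsigned per_cluster) {
    std::uint64_t mask = 0;
    for (unsigned core = 0; core < cores && core < 64; ++core) {
        // Position of this core inside its cluster: 0 .. cluster_size-1.
        if (core % cluster_size < per_cluster) {
            mask |= std::uint64_t{1} << core;
        }
    }
    return mask;
}
```

For 64 cores in clusters of 4 with 2 threads per cluster (32 threads total), this yields 0x3333333333333333. Whether that particular spread is actually the best one for the SG2042 is an open question that only benchmarking can answer.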
Currently it assumes RV64GC as the baseline, which matches what Linux expects in terms of ISA extensions.
Additionally, when configured with -DARCH=native, it will check for the presence of two additional extensions:
These two extensions give about a 3-5% speed-up compared to the base ISA and a 10% speed-up for cache initialization (Argon2 is heavy on rotations).
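To illustrate why rotations matter here: without a hardware rotate instruction, a 64-bit rotation has to be composed from two shifts and an OR, as in the portable sketch below. The function name is just for illustration and this is not code from the PR. The BLAKE2b-style mixing function that Argon2 builds on performs several such rotations per round (by 32, 24, 16 and 63 bits), so the cost adds up.

```cpp
#include <cstdint>

// Portable 64-bit rotate right. On a target with a native rotate
// instruction, compilers can typically reduce this to a single
// instruction; otherwise it costs two shifts and an OR.
constexpr std::uint64_t rotr64(std::uint64_t x, unsigned n) {
    n &= 63;  // keep the shift count in range
    // The "& 63" on the left shift avoids an undefined 64-bit shift when n == 0.
    return (x >> n) | (x << ((64 - n) & 63));
}
```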
Two additional extensions will be useful in the future: V (for SIMD) and Zkn (for AES), but AFAIK there are currently no chips that support them.
Here are some benchmarks with my StarFive JH7110 based board:
The mining performance is about 75% of the Raspberry Pi 4, which is not bad considering the lack of vector instructions.
The verification performance is probably more useful as it's substantially faster than the interpreter, which takes over 1300 ms to verify a hash.