Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Rust bindings to Intel GKL for SmithWatermanAligner and PairHMM #21

Closed
rhysnewell opened this issue Oct 27, 2021 · 21 comments
Closed
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@rhysnewell
Copy link
Owner

This is a non-trivial task but I think might be possible. The GATK makes use of some specially designed C libraries made by intel which greatly improves the performance of the SmithWatermanAligner and PairHMM classes. It seems like it is possible to make calls to native C libraries in Rust but there is a bit of work involved, see: https://medium.com/dwelo-r-d/using-c-libraries-in-rust-13961948c72a and https://doc.rust-lang.org/nomicon/ffi.html

I doubt I have the time to work on this at the moment, and Lorikeet is already faster than the GATK... BUT it could be faster. So if anyone sees this and has experience with this sort of thing, then your help would be much appreciated.

@rhysnewell rhysnewell added enhancement New feature or request help wanted Extra attention is needed labels Oct 27, 2021
@rhysnewell
Copy link
Owner Author

I've made this post on reddit: https://www.reddit.com/r/rust/comments/qhaaju/help_wanted_opensource_genomics_project_in_rust/

A lot of people have provided some great feedback and suggestions. Additionally, some users have pointed out what feasibility of getting the GKL library to work with Rust is. The problem with it is that it is written in JNI, which is not something you can directly use bindgen on and get Rust to use. But one user (brand_x) pointed out the follow

It's not quite so bad as it looked. While this project is, indeed, straight JNI (as in, there's no cleanly separated C library to point bindgen at), it looks like the separation is actually fairly clean within the project. All the files named Intel* are jni binding, and they happen to nicely document what would need to be exported into pure C interfaces to be fed to bindgen. Everything else appears to be pure C++.

The two algorithms mentioned have fairly clean interfaces in their (non-JNI) C++ header files, and it might be sufficient to edit the project files to exclude the JNI, bindgen those headers, and go for it.

Additionally, dannohung pointed out that this pull request IntelLabs/GKL#91 seems to have done a lot of the hard work already!

@philipc
Copy link

philipc commented Nov 13, 2021

I've created a simple binding for the pairhmm likelihood at https://github.com/philipc/gkl-rs. It contains a copy of the Intel GKL code, rather than trying to adapt its build system. Not published and no documentation yet, but I'm sure you can figure it out. See the tests for example usage. Let me know if this is useful to you, and I can do the SmithWatermanAligner too.

@philipc
Copy link

philipc commented Nov 13, 2021

I've also been contemplating converting the code to rust, mostly as an experiment with using SIMD in rust. Not sure if that would be useful to you too. I'm guessing you might prefer to track the upstream GKL instead.

@rhysnewell
Copy link
Owner Author

That's awesome! I'll try and implement this as soon as I can and see how it goes. I'll let you know if I have any questions

As to your second point, I'm sure there is definitely an audience of other users who would find use in you fully reimplementing it rust. I look forward to watching your progress on it if you get around to it

@rhysnewell
Copy link
Owner Author

Hi @philipc,

I think I've got the pairhmm refactor mostly working but I've just come across a strange issue I was hoping you could help me out with. The likelihood values returned by gkl-rs seem to be way off when I use it lorikeet, but the actual tests for gkl-rs pass perfectly fine when I run them myself. I've added in a new unit test that displays this issue for me (https://github.com/rhysnewell/Lorikeet/blob/gkl/tests/vector_pair_hmm_unit_tests.rs), as you can see the start of it is pretty much the same as the test you've got set up but the value the forward function returns are way off from the expected result.

The section I'm talking about looks like this:

        let avx_function = forward().unwrap();
        let result = avx_function(
            hap_bases,
            bases,
            base_quals,
            insertion_quals,
            deletion_quals,
            gcp
        );

        println!("Direct result {}", result);
        // This should work but doesn't??
        assert!((result - expected_result).abs() < 1e-5);

Am I missing something here? Is there perhaps a backend library that I need to update for gkl to give the correct result?

Cheers,
Rhys

@lnicola

This comment has been minimized.

@philipc
Copy link

philipc commented Nov 16, 2021

There was some missing initialisation, I've pushed a fix for that.

I briefly tried running miri, but it complains about file I/O and I didn't find the correct invocation yet.

@lnicola

This comment has been minimized.

@philipc
Copy link

philipc commented Nov 16, 2021

Thanks, that gets miri to run, but I suspect that it doesn't check anything because the AVX target features won't be enabled?

@lnicola

This comment has been minimized.

@rhysnewell
Copy link
Owner Author

@philipc Looks like that update fixed this issue! I'll post some new benchmarks in the morning with avx enabled.

The smith waterman stuff looks almost done as well but are you still working on that or should I try implementing that as well?

@philipc
Copy link

philipc commented Nov 16, 2021

I think the smith waterman is done. It doesn't have good tests yet though (GKL doesn't either). Also, let me know if anything can be done to make the API simpler for you (I'm not sure about the whole thing with returning a function pointer).

@rhysnewell
Copy link
Owner Author

Hey @philipc,

Sorry this took longer to get done than expected due to some server issues on our end.

I've got the smith waterman implemented and ran my unit tests using the AVX version and they were all green, so looks like it's good to use! As for the API, I'm happy with it although it does feel a bit odd to be unwrapping a pointer to a function but I think it is actually the best way to handle it. I like that it is nice and simple, just a couple of functions that slot neatly into what I've already got going.

So now for the benchmarks! Things look great, I'm really happy with your work here:

speed_benchmarks_combined_no_instrain

These are two different benchmarks on (A) a low depth sample, and (B) a high depth sample. The AVX version is about 3 times faster than the logless caching version, and maybe about 6 times faster than GATK AVX which is great. Additionally, the CPU usage is decreased dramatically especially on that low depth sample. RAM goes down as well which is nice to see. Another benchmark I ran that isn't included here is even more impressive, something that potentially was going to take days to complete managed to finish up in 1 hour and 20 minutes. So way more user friendly now!

I think I'll close this issue now as you have solved the issue. Would you be able to send me through your details to [email protected] and we'll make sure to include you as an author.

Thanks once again, brilliant stuff
Cheers,
Rhys

@philipc
Copy link

philipc commented Dec 4, 2021

I've got a rough rust implementation done for PairHMM, and it is significantly faster than the C version (25-35% improvement depending on the vector type). I expect this improvement is due to implementation differences that could probably be applied to the C version too.

I think the main drawback of the rust version is that AVX-512 support still requires nightly rust. Is this something that you would be interested in using? Maybe use a cargo feature to enable it in gkl-rs?

@rhysnewell
Copy link
Owner Author

That sounds great! I'd definitely be interested in a pure rust implementation.

That's a bit of a bummer about nightly, but kind of expected as well. I think a cargo feature would be the best way to go and then it would be super easy to make sure that worked with lorikeet

@rhysnewell
Copy link
Owner Author

Hi @philipc ,
Just wanted to check in and see how things were progressing with the rust implementations of these algorithms. I've poked around the GitHub and things seem fairly complete imo, but just wondering if you needed some help closing it out or something?

Cheers,
Rhys

@philipc
Copy link

philipc commented Feb 9, 2022

The main thing I still want to do is to get it all working on stable rust. This will need the inline asm support in rust 1.59, which releases on Feb 24. This is required for the AVX-512 support, which doesn't have stabilized instrinsics, so we'll need to use inline asm instead. I still need to check that the inline asm performance is just as good though. There are also a couple of llvm optimization bugs which I concluded needed to be worked around with inline asm. I'd also like to do a NEON implementation but that's not important right now.

@rhysnewell
Copy link
Owner Author

Ah that makes sense, I'm happy to continue waiting for stable rust so no rush. That sounds good to me, I'm sure the performance should be on par. Even if it isn't, having it all in pure stable rust is pretty nice bonus in my opinion.
Speaking of bugs, I've opened up an issue on gkl-rs regarding some maybe incorrect alignments being produced by the smith-waterman aligner that you may or not already be aware of.

I don't think I know what NEON is, what is it used for?

@philipc
Copy link

philipc commented Feb 9, 2022

NEON is the ARM SIMD instruction set. I don't know how commonly ARM is used for HPC.

@philipc
Copy link

philipc commented Feb 11, 2022

I was mistaken. Inline asm isn't enough to avoid needing some AVX-512 stabilisation too. We need the target features (avx512f etc) and some of the types (__m512 etc). Not sure where to go with that. Do you want to use nightly rust?

@rhysnewell
Copy link
Owner Author

I'd probably want to avoid the use of nightly for Lorikeet at the moment as it would make packaging the software through conda a lot more difficult for me I think. I'm happy to keep using gkl 0.1.1 for now though and you can switch gkl over to nightly so other people can make use of the fastest version, but that's up to you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants