Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some OpenBLAS (blas extension) missing from JavaCPP presets #1570

Open
blueberry opened this issue Jan 20, 2025 · 16 comments
Open

Some OpenBLAS (blas extension) missing from JavaCPP presets #1570

blueberry opened this issue Jan 20, 2025 · 16 comments

Comments

@blueberry
Copy link
Contributor

Hi.

I've started using OpenBLAS, and I noticed that the latest (1.5.11) JavaCPP preset for OpenBLAS (0.3.28) is missing some very useful methods that are not in BLAS standard, but ARE in OpenBLAS's cblas.h, and in the OpenBLAS implementation.

Namely, the ones that I've noticed are:

  1. the ones from the OpenBLAS extension doc page (http://www.openmathlib.org/OpenBLAS/docs/extensions/), which are included in cblas.h from the OpenBLAS source (but are of course not in the standard netlib cblas.h):
  1. ?axpby s,d,c,z like axpy with a multiplier for y
  2. ?gemm3m c,z gemm3m
  3. ?imatcopy s,d,c,z in-place transposition/copying
  4. ?omatcopy s,d,c,z out-of-place transposition/copying
  5. ?geadd s,d,c,z ATLAS-like matrix add B = α*A+β*B
  6. ?gemmt s,d,c,z gemm but only a triangular part updated
  1. cblas_i?amin and cblas_i?amax, which are not explicitly mentioned as extension, but are included in OpenBLAS cblas.h

Most of these methods are also present in the MKL counterparts, are very useful, and, I believe, are expected to be found, if not in a "standard" package, at least as some form of "extension", where possible.

Since they are present in the OpenBLAS cblas.h, I believe they are missing because the automatic JavaCPP generator uses a netlib cblas.h,
but, on the other hand, some other methods that I see in OpenBLAS cblas.h, such as openblas_set_num_threads, are present in JavaCPP OpenBLAS preset package.

I hope there is an easy way to update the JavaCPP presets to include these methods, and that the way is that we make sure that the generator sees the headers that come with OpenBLAS, instead of using the netlib ones from the build.

What is the best way to fix this? (I'm of course willing to to do the work on that fix if that's necessary, but I'd rather follow expert instructions, instead of poking in the dark :)

@saudet
Copy link
Member

saudet commented Jan 20, 2025

The functions that get excluded are not in either Accelerate or MKL. It's possible some of them have been added since. Please open a pull request with the those functions "unskipped" and let's test it out.

@blueberry
Copy link
Contributor Author

You are right that these functions are not present in MKL, but only under cblas_ prefix. MKL does include them (maybe not all, but the majority), and even javacpp-presets has them, only under MKL_ prefix.

For example, mkl_rt.mkl_somatcopy (https://bytedeco.org/javacpp-presets/mkl/apidocs/org/bytedeco/mkl/global/mkl_rt.html#mkl_somatcopy-byte-byte-long-long-float-float:A-long-float:A-long-),

Some in MKL, such as cblas_saxpby are even under cblas_ prefix, even though they are extensions (https://bytedeco.org/javacpp-presets/mkl/apidocs/org/bytedeco/mkl/global/mkl_rt.html#cblas_saxpby-int-float-float:A-int-float-float:A-int-)

I guess that the issue here is that these functions are not really the same as in MKL, but equivalent, but still formally different, functions. Is this a technical showstopper, consideirng that the OpenBLAS preset is primarily dealing with OpenBLAS? Is there a way of grouping them in a subclass, so that they are built, and used on request, without polluting the standard api that might be re-used with Accelerate and MKL?

I am referring to these functions exactly beacuse I used them through javacpp's mkl preset, so I know they are there, and they are working well :)

In the case of openblas, I couldn't find the skip() call in the OpenBLAS generator, but now I see that these functions are explicitly skipped in openblas_nolapack, so I know where to look at.

So, just to confirm whether this is a good sequence how I should proceed from here:

  1. I should clone javacpp preset source for openblas, configure the compilation requirements, and try to build the openblas preset as-is.
  2. if 1 works, I should just remove the desired functions from the skip() call, and see whether that works
  3. if 2 works, I should issue a pull request

Is it OK if these functions are enabled, despite not being present in accelerate and MKL? Will the generated binaries work with Accelerate, provided that the offending functions present only in openblas are not explicitly called by the client?

@saudet
Copy link
Member

saudet commented Jan 21, 2025

So, just to confirm whether this is a good sequence how I should proceed from here:

Yes, more or less, but you don't need to be able to build by yourself. You can leave that to GitHub Actions. Being able to build it locally will shorten the amount of time you need to wait to see if the build fails or not though.

Is it OK if these functions are enabled, despite not being present in accelerate and MKL? Will the generated binaries work with Accelerate, provided that the offending functions present only in openblas are not explicitly called by the client?

If we make the presets link with functions that don't exist in Accelerate or MKL, users won't be able to use Accelerate or MKL. However, that doesn't seem as important as it used to since both Intel and Apple are not investing so much in Accelerate and MKL anymore while OpenBLAS has been improving much more.

@blueberry
Copy link
Contributor Author

Is there a way to have the cake and eat it too? What about a subclass of the OpenBLAS class, for the "extra" functions only?
Does that make any difference?

@saudet
Copy link
Member

saudet commented Jan 21, 2025

Sure, we can create wrapper functions in C++ and have that new API get exposed automatically via JavaCPP through the presets as per this log_callback() function:
https://github.com/bytedeco/javacpp-presets/blob/master/ffmpeg/src/main/resources/org/bytedeco/ffmpeg/include/log_callback.h#L29

We can also do that directly in Java, but the amount of code that we need to write is a lot larger because JavaCPP doesn't do that automatically as per blas_set_num_threads() here:
https://github.com/bytedeco/javacpp-presets/blob/master/openblas/src/main/java/org/bytedeco/openblas/presets/openblas_nolapack.java#L206

@blueberry
Copy link
Contributor Author

blueberry commented Jan 21, 2025

In the meantime I got another idea. It looks neater to me, but let's hear what you think about it:

Maybe we can create (by copypasta) an openblas-full preset as an independent preset. In that scenario, the existing openblas stays compatible with Accelerate and MKL, while openblas-full gets everything enabled. Then, whoever needs to use openblas as a lowest common denominator for OpenBLAS, Accelerate, and MKL, uses the openblas preset. The users who commit to OpenBLAS only can get all available functionality.

What do you think about that?

@saudet
Copy link
Member

saudet commented Jan 21, 2025

Sure, it doesn't need to be another presets though. We can add an openblas_full.java here for that:
https://github.com/bytedeco/javacpp-presets/tree/master/openblas/src/main/java/org/bytedeco/openblas/presets

@blueberry
Copy link
Contributor Author

I guess by overriding the map method?

@saudet
Copy link
Member

saudet commented Jan 21, 2025

Sure, whatever works

@blueberry
Copy link
Contributor Author

In the meantime, just reporting that the build with 20-odd "extra" functions "unskipped" passed on linux x86_64.
I'll create an openblas-full in a few days (it's 2AM here, so I'm just doing a quick testrun) and when I test it, I'll create a pull request.

Now, a technical question related to releases. If this works as I hope it does, is it realistic to expect a non-snapshot release of the preset? I know you usually do only a few releases a year. Is that a hard limit of sorts, or this could be a candidate for 1.5.12, once it is properly tested?

@blueberry
Copy link
Contributor Author

Hi. I've added openblas_full.java, which now passes the linux build. Before I test the build on other platforms and wrap it up into a pull request, I have one dilemma that you might resolve.

I've made openblass_full inherit the openblas class, redefining only the map() method:
https://github.com/blueberry/javacpp-presets/blob/master/openblas/src/main/java/org/bytedeco/openblas/presets/openblas_full.java

The idea is that whoever wants the "old" functionality still has it, untouched, while the userst that need openblas "extra" methods can use the full functionality through the new class.

Dilemma: IF someone wishes to use Accelerate/MKL for the openblas portion, they will configure this through jvm options. All good. BUT, what if they want to combine the two, by calling the core methods form the openblas class, while calling the "extra" methods from the full class?

My hunch is that that will fail, since the openblas library will try to delegate everything to Accelerate/MKL. I've noticed that the mixing possibility might be hinted in the implementation of the openblas_nolapack class here (https://github.com/blueberry/javacpp-presets/blob/2bc0cdecb62726f3d594aababfd0d9d1ac290e80/openblas/src/main/java/org/bytedeco/openblas/presets/openblas_nolapack.java#L78) but, not being a JavaCPP expert, I am not sure how to configure that.

Any comments and pointers?

@saudet
Copy link
Member

saudet commented Jan 21, 2025

It's not possible to load more than a single implementation of BLAS in the same process. You need to choose one.

@blueberry
Copy link
Contributor Author

OK, good to know. I'll run the builds on all platforms, and create a pull request if everything passes.

@blueberry
Copy link
Contributor Author

All builds pass. Here's the pull request: #1571

@blueberry
Copy link
Contributor Author

Hi @saudet Thanks for helping with this. Unfortunately, it seems that the final versions with functionsToSkip() did not work as intended. I've included the last SNAPSHOT in a project; the openblas_full class has been generated, but it does not contain any of the "extra" methods that we supposed to "unskip".

Looking at the code, Java-wise, I can't see why openblas_full is not getting an empty string array to skip, but JavaCPP-wise, there might be a gotcha.

Is this worth pursuing, or should we revert to the first solution that repeats some code? (but who knows, maybe the same gotcha is relevant there?)

@blueberry
Copy link
Contributor Author

BTW, the openblas_full class repeats the methods that are already in the openblas class. I'm not sure is that the JavaCPP generator is supposed to do that when properly configured (judging by the openblas_nolapack and openblas, it is not...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants