OpenBLAS ( openblas_pthreads0 ) disables OpenMP for whatever it's linked into

Hello, there

I’ve been chasing a nasty bug here at work where, following us moving from Ubuntu (eeew) to OpenSUSE, a lot of our stuff seems to run randomly single-threaded.
After much fun sectioning off things, I’ve found the actual culprit, and it is libopenblas_pthreads.so.0

The code that is being affected relies on OpenMP and, sure enough, calling omp_get_num_procs() returns 1 in those cases.

From having hacked away at our own code, I believe that what’s happening is that at some point the code for that lib just goes
“Well, I don’t want to ifdef everything, so if we’re not using OpenMP just define this as returning 1”, something I DID see in a different library we had.

I’m still playing around and investigating the exact way this happens and precisely what the solution is. For us, update-alternatives seems to not be helping because
we end up bringing openblas not directly but through OpenCV, which seems to link directly against the pthreads version.
(EDIT: Additionally, it lists libblas.so, but the CMake config scripts, which I believe are what even OpenCV would be using, provide libopenblas.so instead)

I’ll continue to investigate and want to find out what’s the best way of either fixing or working around this, as it spreads very unwanted behaviour pretty wantonly.

What I’m looking for help with is tracking down the actual package in OBS, as I’m still not super familiar with it. That way, if the solution is indeed a quick patch, I can work on it with the
actual package maintainers, and also make sure I’m looking at the right code and build scripts to check that I’m on the right track with my current assumptions as I tweak and test.

Would this be the correct place?
https://build.opensuse.org/package/show/openSUSE:Leap:15.0/openblas

Thx

Hi
Yes and no, work with the development project here;

https://build.opensuse.org/package/show/science/openblas

Then it would be a bug report and maintenance update for openSUSE Leap 15.0.

Ty ty.

Will keep chasing this and take it from there

Just wondering, but I have no real idea what is correct (I wish there was some Wiki somewhere that describes the various OpenBLAS packages).

The README for your installed openblas-pthreads package says that the pthreads package is the correct package for installing on x86 machines, the serial package is for some reason deprecated (but apparently still available) and the openmp package then I assume might be appropriate for non-x86 platforms.

Admittedly I’m trying to fill in the blanks about openmp, but could your error suggest that you need the openmp package instead?

https://software.opensuse.org/package/openblas_openmp

Mind that what I’ve suggested is coming from someone who has no reliable info and is guessing, but I don’t know that any authoritative info exists unless you can ask the package maintainers directly.

And BTW -
The openblas Wiki clearly says that it’s very important to configure your openblas to be running multi-threaded or single-threaded, so what you’ve detected could be caused by any variety of possibilities… Auto-configuration, error, deadlocking, who knows (at least beyond my current understanding)
https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded

TSU

Hey there.

It’s all good man, I’m guessing a lot of stuff myself.

So the app is very much built to run on x86 (well, x64). I know because we’re building it, and I’m the one who has to whip the CI into behaving =P

Using the openblas-openmp version 100% solves the problem. Everything works well, peace returns to the mushroom kingdom, all the good stuff. And I can, sure, just wrangle our own stuff into using the “correct” version of openblas, I don’t even think we’re using it directly.

The problem here, though, is not so much that “openblas runs single-threaded”. It’s that “if openblas gets linked in at any point it kills multithreading through OpenMP for the entire thing”, and some of the libraries we rely on use OpenMP very heavily. For the specific project that I’m working on right now, we bring it in through OpenCV, and I know of at least one other that will bring it in through Ceres… And since the pthreads version is the default, these system libraries link to it directly (which you can check through ldd).

This means that even if I “don’t want to make fixing this my problem”, and work around it by providing alternative packages for us to use (rebuilding OpenCV to link to the OpenMP version of OpenBLAS instead, for example), we would still be at risk of the pthreads version getting pulled in in the future, and then we’re back to this problem.

I’m pretty confident that openblas should NOT be killing OpenMP just because it’s doing multithreading through pthreads instead. As I mentioned, the one other time I’ve seen this happen, one of the libraries we use had a header that went “well, if _OPENMP is not defined, then just inline int omp_get_num_procs() {return 1;}” which had the same effect. I was kinda hoping to see something this incredibly dumb in OpenBLAS’ code as well but alas, it seems it’s not quite as trivial. So I’m giving a shot at figuring this out, as it’s a massive performance issue that can hit people with very little warning.

Well, I’ve been fiddling and crying… I couldn’t find anything that looked suspicious in the code from the science repository.

After building it, I decided to go “eh, let’s see if this would be a drop-in replacement” and just force-installed the package generated by osc.
Lo and behold, omp_get_num_procs() now returns my expected 8 on my test app (basically just include omp.h and output the result of that).
Didn’t even rebuild. Intrigued, I opened YaST and asked zypper to revert the package to the official one. And the problem was back.
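
For anyone wanting to repeat that check without compiling anything, here is a minimal sketch in Python rather than the C test app described above. It assumes the OpenMP runtime is GCC’s libgomp and that ctypes can locate it; the function name omp_num_procs and its optional preload argument are made up for illustration. Run it in a fresh process each time, once plain and once passing the library name from this thread (libopenblas_pthreads.so.0), and compare the two numbers.

```python
import ctypes
import ctypes.util
import sys


def omp_num_procs(preload=None):
    """Return omp_get_num_procs() as seen by this process.

    preload is an optional shared-library name to load first, so that its
    load-time side effects (if any) happen before OpenMP is queried.
    Returns None if either library cannot be found or loaded.
    """
    try:
        if preload is not None:
            ctypes.CDLL(preload, mode=ctypes.RTLD_GLOBAL)
        # Assumption: the OpenMP runtime in use is GCC's libgomp.
        gomp_name = ctypes.util.find_library("gomp")
        if gomp_name is None:
            return None
        gomp = ctypes.CDLL(gomp_name)
    except OSError:
        return None
    gomp.omp_get_num_procs.restype = ctypes.c_int
    gomp.omp_get_num_procs.argtypes = []
    return gomp.omp_get_num_procs()


if __name__ == "__main__":
    # Optional first argument: a library to load before asking OpenMP.
    print(omp_num_procs(sys.argv[1] if len(sys.argv) > 1 else None))
```

A result of None only means a library could not be loaded, so it says nothing about the bug either way.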

So whatever was wrong, it seems to have been fixed, perhaps in the version bump that openblas got, from 0.2.2 to 0.3.4. So, possibly magically fixed, I guess?

I know this requires actual testing and stuff, but this really warrants an update. If not actually just updating the lib version, then chasing down “what was fixed”.

Is there a specific channel I could bring all this info to in the spirit of getting the package updated on the main repo?
On my end, I might end up either providing a different version of the package on our own private repos or adding one of the factory repos to our installs, just to provide OpenBLAS.
I do feel, however, that this is a nasty bug that really needs to be weeded out of the main release.

Submit your findings and request for updating as a bug to https://bugzilla.opensuse.org.

Just clarifying, did you actually try the OpenBLAS package with OpenMP as I suggested?
I don’t know that the pthreads package you’re using was compiled with OpenMP support (both OpenBLAS packages are available from the science repo).
Then again, as I mentioned before, there doesn’t seem to be any documentation anywhere that states authoritatively what is or is not included in each package.

Apparently even now you could install another version of OpenBLAS from the openSUSE repos and use update-alternatives to switch between them.

TSU

Yes, I did try it, and it works perfectly. My problem is not really with what happens inside the pthreads OpenBLAS itself. Instead, the issue is that it “poisons” everything it touches so, in our case,
our own software, which just happens to incidentally bring OpenBLAS in through an OpenCV dependency, gets all OpenMP functionality shut down for what initially seems to be no reason.

I did check update-alternatives. It changes the libblas symlink (IIRC), which still doesn’t help because running ldd on libopencv_core reveals that IT directly wants libopenblas_pthreads0, so it gets brought in anyway (unless you start hacking deeper there).
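
The ldd check can be scripted, which may be handy in CI. Below is a rough sketch, not a vetted tool: openblas_deps and ldd_output are hypothetical helper names, and the path in the usage comment is invented for illustration. It only filters the text that ldd prints, so it reports OpenBLAS whether it is a direct or an indirect dependency.

```python
import re
import subprocess
import sys

# Matches ldd lines such as "libopenblas_pthreads.so.0 => /some/path (0x...)".
_OPENBLAS_RE = re.compile(r"^\s*(libopenblas\S*)\s+=>", re.MULTILINE)


def openblas_deps(ldd_text):
    """Return the sorted OpenBLAS library names found in ldd output text."""
    return sorted(set(_OPENBLAS_RE.findall(ldd_text)))


def ldd_output(path):
    """Run ldd on a binary or shared object and return its stdout."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_blas.py /usr/lib64/libopencv_core.so
    print(openblas_deps(ldd_output(sys.argv[1])))
```

If the printed list contains the pthreads flavour, switching the libblas alternative alone will not keep it out of the process.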

Will file a bug in the bugzilla

Couldn’t find an edit button, so sry for the double post, but I’ve filed the bug in the tracker.

Thx tsu2 and malcomlewis for pointing me in the right direction.

Bug report is here:
https://bugzilla.opensuse.org/show_bug.cgi?id=1119469

Hi
Thanks for the feedback and bug report number :)

FWIW, I blame it all on my old work building being knocked down (BP House) and the water going down the plug hole the wrong way…