
Maybe, but note that this is the Intel MKL, a library developed and maintained by Intel. It is no secret that Intel develops it to support their own ecosystem, and they have been caught intentionally crippling performance on AMD processors in the past [1]. Intel has recently been adding better support for AMD processors [2], but many suspect that is intended to help x86 as a whole compete with ARM. If it does get ported, it is highly unlikely to have competitive performance.

[1] https://news.ycombinator.com/item?id=24307596

[2] https://news.ycombinator.com/item?id=24332825



Thanks for the links. If anyone is wondering about some of the hoops that need to be jumped through to make it work, here's another guide [1].

One question in case you or anyone else knows: what's the story behind AMD's apparent lack of math library development? Years ago, AMD had ACML as their high-performance BLAS competitor to MKL. Eventually, it hit end of life and became AOCL [2]. I've not tried it, but I'm sure it's fine. That said, Intel has done steady, consistent work on MKL and added a huge amount of really important functionality, such as its sparse libraries. When MKL works on their processors, AMD has benefited from that work as well, but I've been surprised that they haven't made similar investments themselves.

Also, in case anyone is wondering, ARM's competing library is called the Arm Performance Libraries. I'm not sure how well it works, and it's only available under a commercial license; I just went to check, and pricing is not immediately available. All that said, it looks to cover dense BLAS/LAPACK along with FFT, but no sparse routines.

[1] https://www.pugetsystems.com/labs/hpc/How-To-Use-MKL-with-AM...

[2] https://developer.amd.com/amd-aocl/
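For what it's worth, the best-known of those hoops was the (since-removed) MKL_DEBUG_CPU_TYPE environment variable. A minimal sketch of using it, with the caveat that Intel removed the variable in MKL 2020 Update 1, so it only helps with older MKL builds:

```python
import os

# MKL reads MKL_DEBUG_CPU_TYPE at load time, so it must be set before
# importing anything linked against MKL (e.g. an MKL-backed NumPy/SciPy).
# The value 5 selected the AVX2 code path regardless of the CPU vendor;
# Intel removed this variable in MKL 2020 Update 1 and newer.
os.environ["MKL_DEBUG_CPU_TYPE"] = "5"

# import numpy as np  # would now use the AVX2 kernels on older MKL builds
```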


> Eventually, it hit end of life and became AOCL [2]. I've not tried it, but I'm sure it's fine.

It's ok. I did some experiments with transformer networks using libtorch. The numbers on a Ryzen 3700X were (sentences per second, 4 threads):

OpenBLAS: 83, BLIS: 69, AMD BLIS: 80, MKL: 119

On a Xeon Gold 6138:

OpenBLAS: 88, BLIS: 52, AMD BLIS: 59, MKL: 128

OpenBLAS was faster than AMD BLIS, but MKL beats everyone else by a wide margin because it has a special batched GEMM operation. Not only does Intel ship highly optimized kernels, they actively participate in the various ecosystems (such as PyTorch) and provide specialized implementations.
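To illustrate what batched GEMM buys you, here is a sketch with NumPy, whose matmul broadcasts over a leading batch dimension; MKL exposes the real thing as cblas_gemm_batch, which MKL-backed frameworks call instead of looping:

```python
import numpy as np

# A "batched" GEMM multiplies a whole stack of matrix pairs in one call,
# amortizing dispatch overhead -- relevant for transformer workloads,
# where each attention head is a small, independent matrix product.
batch, m, k, n = 8, 64, 32, 16
a = np.random.rand(batch, m, k)
b = np.random.rand(batch, k, n)

# np.matmul broadcasts over the leading (batch) dimension; an MKL-backed
# build can hand the whole stack to cblas_gemm_batch in one call.
c = np.matmul(a, b)
assert c.shape == (batch, m, n)
```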

AMD is doing well with hardware, but it's surprising how much they drop the ball with ROCm and the CPU software ecosystem. (Of course, they are doing great work with open sourcing GPU drivers, AMDVLK, etc.)


If you care about small matrices on x86_64, you should look at libxsmm, which is the reason MKL now does well in that regime. (Those numbers aren't representative of large BLAS.)
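For a sense of the regime in question: "small" here means many independent products of tiny fixed-size matrices, where per-call overhead dominates. A plain-Python sketch of one such product (libxsmm's approach, roughly, is to JIT a kernel specialized to the exact m, n, k rather than run a generic large-matrix code path):

```python
# Naive small GEMM (C += A @ B) for tiny fixed dimensions. Generic BLAS
# pays dispatch and blocking overhead per call; small-matrix libraries
# instead generate a kernel specialized to these exact sizes.
M = N = K = 8

def small_gemm(a, b, c):
    for i in range(M):
        for j in range(N):
            for k in range(K):
                c[i][j] += a[i][k] * b[k][j]

a = [[1.0] * K for _ in range(M)]
b = [[1.0] * N for _ in range(K)]
c = [[0.0] * N for _ in range(M)]
small_gemm(a, b, c)
print(c[0][0])  # 8.0: each entry sums K ones
```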


A free version of the Arm Performance Libraries is available at:

https://developer.arm.com/tools-and-software/server-and-hpc/...


> What's the story behind AMD's apparent lack of math library development?

I don't see a story. AMD supports a proper libm for gcc and llvm, and has its own libm, BLAS, LAPACK, ... at https://developer.amd.com/amd-aocl/

It's just their RDRAND instruction that is broken on most Ryzens if you haven't patched it, and Fedora's firmware updates don't patch it for you.


You can just run MKL from the oneAPI distribution, and it gives decent performance on EPYC2, but basically only for double precision; I don't remember whether that includes complex.

ACML was never competitive in my comparisons with Goto/OpenBLAS on a variety of Opterons. It's been discarded, and AMD now uses a somewhat enhanced version of BLIS.

BLIS is similar to, and sometimes better than, ARMPL on aarch64 (e.g. ThunderX2).



