There probably isn't anything in CUDA itself that makes it special. It's a set of well-optimised math libraries, and the math for most of the important stuff is somewhat trivial. AI seems to be >80% matrix multiplication; a well-optimised BLAS is tricky to implement, but even a mediocre one would be enough for all the major frameworks to support AMD.
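To make that concrete: the core operation really is just a matrix multiply, and a naive CUDA kernel for it fits in a dozen lines. The hard part is making it fast (tiling, shared memory, tensor cores), which is what cuBLAS and rocBLAS do for you. A rough sketch, not production code:

```cuda
// Naive single-precision matrix multiply: C = A * B, all N x N, row-major.
// Correct but slow: no tiling, no shared memory, no tensor cores.
__global__ void naive_sgemm(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

// Launch: one thread per output element.
// dim3 block(16, 16);
// dim3 grid((N + 15) / 16, (N + 15) / 16);
// naive_sgemm<<<grid, block>>>(d_A, d_B, d_C, N);
```

A tuned library GEMM will beat this by a large factor, but the frameworks don't write that part themselves either; they call the vendor's BLAS.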
The vendor "lock in" is because it takes a few years for decisions to be expressed in marketable silicon and literally only Nvidia was trying to be in the market 5 years ago. I've seen a lot of AMD cards that just crashed when used for anything outside OpenGL. I had a bunch of AI related projects die back in 2019 because initialising OpenCL crashed the drivers. If you believe the official docs everything would work fine. Great card except for the fact that compute didn't work.
At the time I thought it was maybe just me. But after seeing geohot's saga trying to make tinygrad work on AMD cards, and getting a feel for how poorly supported AMD hardware is across the machine learning community, it makes a lot of sense to me that this is a systemic issue and that AMD had no corporate sense of urgency about fixing it.
Maybe there is something magical in CUDA, but if there is, it's probably the memory management model or something similarly technical. Not the API.
The magic is that CUDA actually works well. There is no reason to pick OpenCL, ROCm, SYCL or anything else when CUDA gives you a 10x better developer experience.
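To give a sense of what that developer experience gap looks like: a complete, working CUDA program is short enough to paste into a comment, whereas the equivalent OpenCL host code (platform, device, context, queue, program and kernel-argument setup) easily runs to well over a hundred lines before anything executes. A minimal sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Element-wise vector add: one thread per element.
__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified memory: no explicit host/device copies needed for a demo.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

That's the whole program; compile with nvcc and it runs. That, more than any single feature, is the lock-in.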
> The vendor "lock in" is because it takes a few years for decisions to be expressed in marketable silicon and literally only Nvidia was trying to be in the market 5 years ago.
It's crazy, because even 10 years ago it was already obvious that machine learning was big and was only going to become more important. AlphaGo vs Lee Sedol happened in 2016. Computer vision was making big strides.
5 years ago, large language models hadn't really arrived on the scene yet, at least not as impressively as today, but I think Google, for example, was already using machine learning for Google Translate?
I'd have been happy to use OpenBLAS if it worked on a GPU. Any API is good enough for me. I have yet to see anything in the machine learning world that requires real complexity; the pain seems to be in figuring out black-box data and models, and deciphering what people actually did to get their research results.
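For what it's worth on the "any API is good enough" point, the GPU call even looks like the CPU BLAS call. A sketch (column-major, as classic BLAS expects, with the CPU call shown in a comment for comparison):

```cuda
#include <cublas_v2.h>

// The CPU BLAS call I'd have been happy with:
//   cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
//               m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
//
// The cuBLAS equivalent is nearly the same signature; the real differences
// are the handle and that the pointers must point to device memory.
void gpu_sgemm(cublasHandle_t handle, int m, int n, int k,
               const float* dA, const float* dB, float* dC) {
    const float alpha = 1.0f, beta = 0.0f;
    // Column-major C = alpha * A * B + beta * C, just like classic BLAS.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k, &alpha, dA, m, dB, k, &beta, dC, m);
}
```

The API was never the obstacle.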
The problem I had with my AMD card was that SYCL, like every other API, ends up making calls into AMD's kernel driver and firmware, and those calls would crash the program or the computer (the crash was inevitable; only how it happened depended on circumstances).
The AMD drivers themselves are actually pretty good overall; if you want a desktop graphics card for Linux, I recommend AMD. The open source drivers have a noticeably higher average quality than the binary stuff Nvidia puts out. Rock solid most of the time. But for anything involving OpenCL, ROCm or friends I had a very rough experience. It didn't matter which API I used, because the calls eventually go through the kernel driver, and whatever the root problem is lives somewhere around there.
The biggest problem with SYCL is that AMD doesn't want to back a horse they don't control (the same reason they opposed Streamline), so they won't support it. When the #2 player in a 2-player market won't play ball, you don't have a standard.
Beyond that, AMD’s implementation is broken.
Same story with Vulkan Compute: SPIR-V could be cool, but it's broken on AMD hardware, and AMD institutionally opposes hitching its wagon to anything it didn't invent itself.
This is why people keep saying that NVIDIA isn't acting anticompetitively. They're not; it's the Steam/Valve situation, where their opponents are intent on constantly shooting themselves in the head while NVIDIA carries merrily along getting its work done.
The vendor "lock in" is because it takes a few years for decisions to be expressed in marketable silicon and literally only Nvidia was trying to be in the market 5 years ago. I've seen a lot of AMD cards that just crashed when used for anything outside OpenGL. I had a bunch of AI related projects die back in 2019 because initialising OpenCL crashed the drivers. If you believe the official docs everything would work fine. Great card except for the fact that compute didn't work.
At the time I thought it was maybe just me. After seeing geohotz's saga trying to make tinygrad work on AMD cards and having a feel for how badly unsupported AMD hardware is by the machine learning community, it makes a lot of sense to me that it is a systemic issue and AMD didn't have any corporate sense of urgency about fixing those problems.
Maybe there is something magic in CUDA, but if there is it is probably either their memory management model or something quite technical like that. Not the API.