Not really a problem. NVIDIA sells cards built around 32-bit (and now, increasingly, 16-bit) ALUs for desktop usage, while offering more expensive cards with a heavier complement of 64-bit ALUs for workstations and compute. Compute is important enough to their bottom line to justify it.
The real problem is that NVIDIA has compute locked down with CUDA. Mobile chipset vendors can't expand into compute if they're barred from entry at the API level.
Vulkan/SPIR-V looks promising; it just needs the chip vendors (ARM, Qualcomm, AMD, Intel) to come together and invest in cuDNN equivalents.
Although I reckon deep learning on mobile (at least for some use cases, like cameras) will use dedicated silicon from Movidius etc. and ultimately be embedded in the camera chips directly.
The cross-platform nature is actually part of the problem--the whole point of doing GPGPU work is that you're playing to the hardware's strengths, which can be difficult when the hardware can be nearly anything from a CPU to a GPU to an FPGA.
It doesn't help that until recently, AMD didn't push OpenCL nearly as hard as NVIDIA pushes CUDA.
Modern AMD and NVIDIA GPUs are fairly similar hardware-wise, and it is not hard to write OpenCL code that executes efficiently on both. I agree that it is pretty hopeless to write performance-portable OpenCL across entirely different architectures, however.
Sure, but if you go with NVIDIA, you also get access to all the other goodies they distribute (Thrust, cuFFT, cuDNN, etc.) and all the CUDA-compatible stuff other people have written, like Theano and TensorFlow.
It does seem like people have gotten a little more interested in OpenCL lately, but it still lags pretty far behind. As dharma1 says below, AMD seems weirdly uninterested in catching up. If I were in charge of AMD, I'd be throwing money and programmers at this: "Want to port your library to OpenCL? Here, have a GPU! We'll help."
AMD management has completely missed the memo on deep learning. No mention of deep learning or FP16 performance yesterday when Polaris was announced - it was all about VR.
They are just not turning up to the party, and as a company they are running out of time if Polaris and Zen don't sell.
> Given the quality of OpenCL and its cross platform nature
I'm sorry, WHAT? OpenCL is absolute shit. Cumbersome API definition, lack of low-level control, stringly typed programs (all programs are provided as strings, and kernels are identified by string name too), which means nearly no compile-time feedback and makes it hard to embed GPU kernels in a single binary. The API is woefully lacking in flexibility (no dynamic launch). OpenCL 2.0 is better (EDIT: apparently AMD supports it now; I'd have to check whether Intel/NVIDIA have also added support), but hardly anyone supports it, so it's also irrelevant.
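To make the "stringly typed" complaint concrete, here's a minimal host-side sketch (assuming a context `ctx`, device `dev`, and error variable `err` are already set up): the kernel body is a C string compiled at runtime, and the kernel is then looked up by its name as another string.

    /* Sketch only: the kernel source is just a string the driver
       compiles at runtime. */
    const char *src =
        "__kernel void vadd(__global const float *a,\n"
        "                   __global const float *b,\n"
        "                   __global float *c) {\n"
        "  int i = get_global_id(0);\n"
        "  c[i] = a[i] + b[i];\n"
        "}\n";

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL); /* runtime compile */
    cl_kernel k = clCreateKernel(prog, "vadd", &err); /* looked up by string */

A typo in "vadd" or a type mismatch inside the source string surfaces as a runtime build error, never a compile error in your own binary.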
Not only that, AMD hardware is terrible. Atomics on NVIDIA's Maxwell are orders of magnitude faster than on AMD (to the point of being comparable to non-atomic operations under low contention).
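For context, the kind of kernel where this shows up is a global histogram built with atomics (a hypothetical CUDA sketch, not a benchmark):

    // Hypothetical sketch: one atomicAdd per input element. Under low
    // contention, Maxwell runs this at close to non-atomic throughput;
    // the claim above is that AMD hardware is far slower here.
    __global__ void hist(const unsigned char *data, int n,
                         unsigned int *bins /* 256 entries */) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) atomicAdd(&bins[data[i]], 1u);
    }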
CUDA's environment provides: better documentation, better feature support, saner development and debugging, the possibility to ship both generic and specialized binary kernels, JITtable kernels in an intermediate representation, better compile-time sanity checking, and the ability to generate your own IR/CUDA assembly from non-CUDA languages...
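Compare the same vector add written as single-source CUDA (a sketch, assuming device pointers `d_a`, `d_b`, `d_c` have already been allocated): the kernel and its launch are ordinary typed code, so nvcc catches mismatched arguments at compile time instead of at runtime.

    // Sketch: kernel and host code live in the same .cu file.
    __global__ void vadd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    // A typed call, checked by the compiler -- no strings involved.
    vadd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);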
The reason everyone does CUDA and uses NVIDIA is that there's zero real competition. AMD is the only company that cares about OpenCL; Intel and NVIDIA just implement the bare minimum so that AMD's OpenCL code is portable to them. Intel has OpenMP and TBB for the Phi; NVIDIA has CUDA.
To me it's crazy that anyone keeps mentioning OpenCL as a serious alternative. In theory I agree that an open standard would be nice, but over here in reality where I have to actually write code there is no realistic alternative to CUDA if you want to stay sane.
You write OpenCL if you want to target anything other than AMD/NVIDIA/Intel. If you're writing code for an embedded application (with some heterogeneous core) or for a mobile application, you absolutely have to write OpenCL code, as there's no alternative. OpenCL is shit, but it's cross-platform shit.
If your aim is to get 100% performance out of a GPU-heavy cluster, then sure, you're going to need to write CUDA code and buy some NVIDIA GPUs. However, there are a lot of applications running in entirely different environments which _only_ support OpenCL.
Does anyone actually implement OpenCL 2.0 yet? Last I checked, not even AMD supported it, and they're the only company that has a reason to care about advancing OpenCL.