The cross-platform nature is actually part of the problem--the whole point of doing GPGPU work is that you're playing to the hardware's strengths, which is difficult when the hardware can be nearly anything from a CPU to a GPU to an FPGA.
It doesn't help that until recently, AMD didn't push OpenCL nearly as hard as nVIDIA pushes CUDA.
Modern AMD and NVIDIA GPUs are fairly similar hardware-wise, and it is not hard to write OpenCL code that executes efficiently on both. I agree that it is pretty hopeless to write performance-portable OpenCL across entirely different architectures, however.
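To make the point concrete, here's a minimal sketch (my own illustrative example, not from this thread) of the kind of OpenCL C kernel that runs unchanged on both AMD and NVIDIA GPUs. The kernel source is portable; what isn't portable is the tuning -- work-group sizes, vector widths, and memory-access patterns differ per architecture, which is exactly where performance portability falls apart on radically different hardware like CPUs or FPGAs.

```
// SAXPY (y = a*x + y) in OpenCL C -- a simple, vendor-neutral kernel.
__kernel void saxpy(const float a,
                    __global const float *x,
                    __global float *y,
                    const unsigned int n)
{
    size_t i = get_global_id(0);
    if (i < n)              // guard: the global size may be rounded up
        y[i] = a * x[i] + y[i];
}
```

The host picks the work-group size at enqueue time, so the same binary-compatible source can be re-tuned per device without touching the kernel.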
Sure, but if you go with nVIDIA, you also get access to all the other goodies they distribute (Thrust, cuFFT, cuDNN, etc.) and all the CUDA-compatible stuff other people have written, like Theano and TensorFlow.
It does seem like people have gotten a little more interested in OpenCL lately, but it still lags pretty far behind. As dharma1 says below, AMD seems weirdly uninterested in catching up. If I were in charge of AMD, I'd be throwing money and programmers at this: "Want to port your library to OpenCL? Here, have a GPU! We'll help."
AMD management has completely missed the memo on deep learning. No mention of deep learning or FP16 perf yesterday when Polaris was announced - it was all about VR.
They are just not turning up to the party, and as a company they are running out of time if Polaris and Zen don't sell.