I often run into situations where I wish pandas were 50-100x faster.
Dask can help, but introduces quite a bit of additional complexity.
I'm also looking forward to stricter data models than what pandas currently uses, in particular proper null support for all dtypes and fewer surprising implicit type conversions.
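For concreteness, a minimal sketch of the dtype issue I mean -- plain pandas silently promotes an integer column with a missing value to float, while the opt-in nullable Int64 extension dtype keeps it intact:

    import pandas as pd

    # Classic behaviour: NaN is the only null pandas knows for numbers,
    # so an int column with a missing value gets promoted to float64.
    s = pd.Series([1, 2, None])
    print(s.dtype)   # float64

    # The opt-in nullable extension dtype keeps integers and uses pd.NA.
    s2 = pd.Series([1, 2, None], dtype="Int64")
    print(s2.dtype)  # Int64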
That blog post is about how to load more data into pandas via Arrow. RAPIDS is about how to then compute on it. It's all the same people working on Arrow and GoAi. So... Yes :)
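For reference, the Arrow-to-pandas path under discussion boils down to roughly this (file name is made up; needs pyarrow installed):

    import pyarrow.parquet as pq

    # Read a Parquet file into an Arrow Table, then hand it to pandas.
    # Arrow's columnar layout makes the conversion cheap compared to
    # parsing CSVs row by row.
    table = pq.read_table("events.parquet")   # hypothetical file
    df = table.to_pandas()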
Yep! Companies in the GPU Open Analytics Initiative like Anaconda (dask, conda), MapD, Ursa Labs (Arrow, pandas), BlazingDB, and ourselves (Graphistry!) are building a full ecosystem of GPU analytics, similar to what happened with Hadoop over the last 10-15 years. There is little point in all of us reimplementing basic things like in-GPU database scans (PyGDF), and we want fast interop for ecosystem composition / network effects (a GPU take on Arrow). Members are contributing the pieces we believe are, or should be, commodity, so the GPU wave gets here faster and bigger.
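To make the commodity pieces concrete, a rough sketch of the GPU dataframe layer -- using cudf, PyGDF's successor in RAPIDS, and assuming a CUDA-capable GPU with the RAPIDS packages installed:

    import pandas as pd
    import cudf  # RAPIDS GPU dataframe library, successor to PyGDF

    pdf = pd.DataFrame({"key": [0, 1, 0, 1], "val": [1.0, 2.0, 3.0, 4.0]})

    gdf = cudf.DataFrame.from_pandas(pdf)  # copy the data onto the GPU
    out = gdf.groupby("key").mean()        # groupby/aggregate runs on the GPU

    arrow_table = gdf.to_arrow()           # Arrow as the interchange format
    print(out)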
The RAPIDS team has really been stepping up here -- PyGDF, plus various bindings & IO helpers -- and collaborating with many of us to get them right. Another member had intended its offering to be the open compute core, but it was insufficiently open (kept multi-GPU proprietary?) and saw little uptake, so Nvidia stepped up with RAPIDS. The result is a more neutral solution, already with demonstrated uptake from framework devs. And hopefully, more GPU compute everywhere, faster :)
Many of us are in the GoAi Slack, tho more for syncing. Most of it is per-project work -- our JS work for Arrow/GoAi is in the Graphistry Slack, PyGDF is in the GoAi one, GPU Arrow is in Arrow mailing list or GoAi Slack, MapD is in MapD, etc.
2018 has really been internally focused for pulling GPU islands into a GPU mountain. Expecting 2019 to be way more externally focused. Each milestone like this gets us closer :)
Looks interesting. The performance chart seems a bit misleading, though, as you can get a good 100-core CPU setup for $20k, while a DGX-2 costs 20 times that at $400k.
It makes me sad that CUDA is more popular than OpenCL. I would be willing to sacrifice some performance for portability, openness and avoiding vendor lock-in.
> MIOpen[1] is a step in this direction but still causes the VEGA 64 + MIOpen to be 60% of the performance of a 1080 Ti + CuDNN based on benchmarks we've conducted internally at Lambda. Let that soak in for a second: the VEGA 64 (15TFLOPS theoretical peak) is 0.6x of a 1080 Ti (11.3TFLOPS theoretical peak). MIOpen is very far behind CuDNN.
That performance penalty is a bit too steep for me. Based on peak TFLOPS, the Vega 64 should be ~1.3x the performance of a 1080 Ti, but instead it's 0.6x. Giving up roughly half the expected performance is quite a sacrifice.
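Back-of-the-envelope, using the figures quoted above (rough and benchmark-dependent):

    # Expected ratio from peak TFLOPS vs. what was actually measured.
    vega64_tflops, gtx1080ti_tflops = 15.0, 11.3
    expected = vega64_tflops / gtx1080ti_tflops  # ~1.33x in Vega's favour
    measured = 0.6                               # MIOpen vs cuDNN result
    print(expected, measured / expected)         # ~1.33, ~0.45 -> roughly half
                                                 # the expected perf is missing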
> Most people in the world can't afford 600$ Nvidia GPUs.
Ok? I'm not sure what your point is here - most people can't afford a $600 AMD GPU either. If you can't get a $600 GPU, buy a cheaper one - a GTX 1060 is $250 and a GTX 960 is $50.
Performance costs a premium, and that is just how it is. It would be great if we lived in a world where you could buy a Titan V for $1, but in the real world valuable things cost more money, and that unfortunately means not everyone can buy them.
I don't think performance is the Achilles' heel of OpenCL. There are other problems I have when working with it:
The implementations ended up becoming a "write once, debug everywhere" ecosystem. Making the compiler a part of the driver meant that OpenCL kernels that worked on platform A wouldn't even compile on platform B. SPIR-V should have been part of OpenCL from version 1.0.
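To illustrate the runtime-compile model (the kernel ships as source text and is compiled by whatever driver you land on, hence "write once, debug everywhere") -- a minimal pyopencl sketch:

    import numpy as np
    import pyopencl as cl

    # Kernel source is compiled by the driver at runtime, so whether it
    # builds at all depends on the platform you happen to run on.
    SRC = """
    __kernel void double_it(__global float *a) {
        int gid = get_global_id(0);
        a[gid] = 2.0f * a[gid];
    }
    """

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, SRC).build()  # platform B may reject what A accepted

    a = np.arange(16, dtype=np.float32)
    buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                    hostbuf=a)
    prg.double_it(queue, a.shape, None, buf)
    cl.enqueue_copy(queue, a, buf)
    print(a)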
Profiling OpenCL kernels is a stumbling block too - IMHO, you shouldn't ship a toolkit without a profiler and claim it's intended for HPC.
IEEE 754 compliance should also have been part of 1.0 and not an afterthought. Without it, it's harder to verify that your kernel works correctly, since it's permitted to deliver different results than a C reference implementation.
With CUDA, at least you're guaranteed* that code which runs on your CUDA-equipped machine also runs on any other machine that supports CUDA - regardless of OS, regardless of GPU. With OpenCL, not so much.
* Within the same restrictions as CPU code - instruction set, RAM and compiler bugs.
I never cared to play around with OpenCL, because it was stuck in a C-only mentality, plus the stone-age idea inherited from GLSL that I should somehow read text files and compile and link them programmatically at runtime, with little more than a pat on the back for debugging.
CUDA right from the start supported C, C++, Fortran, with a bytecode format for other compiler backends, and a nice debugging experience.
It's obvious which one would win the hearts of developers who have moved beyond C.
It took Khronos until OpenCL 2.0 to fix this, and by then it was too late.
Even on mobile devices, Google created their own C99 dialect (Renderscript) instead of adopting OpenCL.
Love the implication that "any GPU" means it works on expensive Nvidia chips and also slightly less expensive Nvidia chips.