I often run into situations where I wish pandas were 50-100x faster.
Dask can help, but introduces quite a bit of additional complexity.
I'm also looking forward to stricter data models than what pandas currently uses, in particular proper null support for all dtypes and fewer surprising implicit type conversions.
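For concreteness, a minimal sketch of the dtype issue I mean -- plain pandas silently promotes an integer column with a missing value to float, while the opt-in nullable Int64 extension dtype keeps it intact:

    import pandas as pd

    # Classic behaviour: NaN is the only null pandas knows for numbers,
    # so an int column with a missing value gets promoted to float64.
    s = pd.Series([1, 2, None])
    print(s.dtype)   # float64

    # The opt-in nullable extension dtype keeps integers and uses pd.NA.
    s2 = pd.Series([1, 2, None], dtype="Int64")
    print(s2.dtype)  # Int64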
That blog post is about how to load more data into pandas via Arrow. RAPIDS is about how to then compute on it. It's all the same people working on Arrow and GoAi. So... Yes :)
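For reference, the Arrow-to-pandas path under discussion boils down to roughly this (file name is made up; needs pyarrow installed):

    import pyarrow.parquet as pq

    # Read a Parquet file into an Arrow Table, then hand it to pandas.
    # Arrow's columnar layout makes the conversion cheap compared to
    # parsing CSVs row by row.
    table = pq.read_table("events.parquet")   # hypothetical file
    df = table.to_pandas()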
Yep! Companies in the GPU Open Analytics Initiative like Anaconda (dask, conda), MapD, Ursa Labs (Arrow, pandas), BlazingDB, and ourselves (Graphistry!) are building a full ecosystem of GPU analytics, similar to what happened with Hadoop over the last 10-15 years. There is little point in all of us reimplementing basic things like in-GPU database scans (PyGDF), and we want fast interop for ecosystem composition / network effects (a GPU take on Arrow). Members are contributing the pieces we believe are, or should be, commodity, so the GPU wave gets here faster and bigger.
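To make the commodity pieces concrete, a rough sketch of the GPU dataframe layer -- using cudf, PyGDF's successor in RAPIDS, and assuming a CUDA-capable GPU with the RAPIDS packages installed:

    import pandas as pd
    import cudf  # RAPIDS GPU dataframe library, successor to PyGDF

    pdf = pd.DataFrame({"key": [0, 1, 0, 1], "val": [1.0, 2.0, 3.0, 4.0]})

    gdf = cudf.DataFrame.from_pandas(pdf)  # copy the data onto the GPU
    out = gdf.groupby("key").mean()        # groupby/aggregate runs on the GPU

    arrow_table = gdf.to_arrow()           # Arrow as the interchange format
    print(out)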
The RAPIDS team has really been stepping up here -- PyGDF, plus various bindings & IO helpers -- and collaborating with many of us to get them right. Another member had intended its offering to be the open compute core, but it was insufficiently open (kept multi-GPU proprietary?) and saw little uptake, so Nvidia stepped up with RAPIDS. The result is a more neutral solution, already with demonstrated uptake from framework devs. And hopefully, more GPU compute everywhere, faster :)
Many of us are in the GoAi Slack, tho more for syncing. Most of it is per-project work -- our JS work for Arrow/GoAi is in the Graphistry Slack, PyGDF is in the GoAi one, GPU Arrow is in Arrow mailing list or GoAi Slack, MapD is in MapD, etc.
2018 has really been internally focused for pulling GPU islands into a GPU mountain. Expecting 2019 to be way more externally focused. Each milestone like this gets us closer :)
Looks interesting. The performance chart seems a bit misleading, though, as you can get a good 100-core CPU setup for $20k, while a DGX-2 costs 20 times that at $400k.
It makes me sad that CUDA is more popular than OpenCL. I would be willing to sacrifice some performance for portability, openness and avoiding vendor lock-in.
> MIOpen[1] is a step in this direction but still causes the VEGA 64 + MIOpen to be 60% of the performance of a 1080 Ti + CuDNN based on benchmarks we've conducted internally at Lambda. Let that soak in for a second: the VEGA 64 (15TFLOPS theoretical peak) is 0.6x of a 1080 Ti (11.3TFLOPS theoretical peak). MIOpen is very far behind CuDNN.
That performance penalty is a bit too steep for me. Based on peak TFLOPS, the Vega 64 should be ~1.3x the performance of a 1080 Ti, but instead it's 0.6x. Giving up roughly half the expected performance is quite a sacrifice.
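Back-of-the-envelope, using the figures quoted above (rough and benchmark-dependent):

    # Expected ratio from peak TFLOPS vs. what was actually measured.
    vega64_tflops, gtx1080ti_tflops = 15.0, 11.3
    expected = vega64_tflops / gtx1080ti_tflops  # ~1.33x in Vega's favour
    measured = 0.6                               # MIOpen vs cuDNN result
    print(expected, measured / expected)         # ~1.33, ~0.45 -> roughly half
                                                 # the expected perf is missing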
> Most people in the world can't afford 600$ Nvidia GPUs.
Ok? I'm not sure what your point is here - most people can't afford a $600 AMD GPU either. If you can't get a $600 GPU, buy a cheaper one - a GTX 1060 is $250 and a GTX 960 is $50.
Performance costs a premium, and that is just how it is. It would be great if we lived in a world where you could buy a Titan V for $1, but in the real world valuable things cost more money, and that unfortunately means not everyone can buy them.
I don't think performance is the Achilles' heel of OpenCL. There are other problems I have when working with it:
The implementations ended up becoming a "write once, debug everywhere" ecosystem. Making the compiler a part of the driver meant that OpenCL kernels that worked on platform A wouldn't even compile on platform B. SPIR-V should have been part of OpenCL from version 1.0.
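To illustrate the runtime-compile model (the kernel ships as source text and is compiled by whatever driver you land on, hence "write once, debug everywhere") -- a minimal pyopencl sketch:

    import numpy as np
    import pyopencl as cl

    # Kernel source is compiled by the driver at runtime, so whether it
    # builds at all depends on the platform you happen to run on.
    SRC = """
    __kernel void double_it(__global float *a) {
        int gid = get_global_id(0);
        a[gid] = 2.0f * a[gid];
    }
    """

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, SRC).build()  # platform B may reject what A accepted

    a = np.arange(16, dtype=np.float32)
    buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                    hostbuf=a)
    prg.double_it(queue, a.shape, None, buf)
    cl.enqueue_copy(queue, a, buf)
    print(a)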
Profiling OpenCL kernels is a stumbling block too - IMHO, you shouldn't ship a toolkit without a profiler and claim it's intended for HPC.
IEEE 754 compliance should also have been part of 1.0 and not an afterthought. Without it, it's harder to verify that your kernel works correctly, since it's permitted to deliver different results than a C reference implementation.
With CUDA, at least you're guaranteed* that code which runs on your CUDA-equipped machine also runs on any other machine that supports CUDA - regardless of OS, regardless of GPU. With OpenCL, not so much.
* Within the same restrictions as CPU code - instruction set, RAM and compiler bugs.
I never cared to play around with OpenCL, because it was stuck in a C-only mentality, plus the stone-age idea inherited from GLSL that I should somehow read text files and compile and link them programmatically at runtime, with little more than a pat on the back for debugging.
CUDA right from the start supported C, C++, Fortran, with a bytecode format for other compiler backends, and a nice debugging experience.
It's obvious which one would win the hearts of developers who have moved beyond C.
It took Khronos until OpenCL 2.0 to fix this, and by then it was too late.
Even on mobile devices, Google created their own C99 dialect (Renderscript) instead of adopting OpenCL.
Love the implication that "any GPU" means it works on expensive Nvidia chips and also slightly less expensive Nvidia chips.