Well consider the following c code. Demonstrates how to allocate memory on gpu, ...

scott_s · on May 18, 2015

It can be quite a bit more than just two lines, particularly considering error handling. But when you start refactoring common patterns, you can often end up with higher-level abstractions. In your example, Thrust is exactly that: https://developer.nvidia.com/Thrust

im3w1l · on May 19, 2015

For the sake of argument, I'll proceed as if that were implemented on top of CUDA; I have no idea if that is actually the case.

That code does not initialize the library; instead, the handle is stored as a global variable that is lazily initialized when you use Thrust methods. This makes it easier to use, but it has some drawbacks too. It cannot use more than one card at once. Every call has to check whether the library is initialized, which is safer but slower. Resources are not freed as soon as possible. A good trade-off in the majority of cases.
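To make that trade-off concrete, here is a minimal sketch in plain C++ of the two styles. Everything in it is hypothetical: `Handle`, `implicit_api`, `explicit_api` and `make_filled` are made-up names for illustration, not Thrust's or CUDA's actual API, and a `std::vector` stands in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GPU library handle (not a real CUDA/Thrust type).
struct Handle {
    int device;
    bool initialized;
};

namespace implicit_api {
// One hidden global handle, set up on first use and never torn down early.
static Handle g_handle = {0, false};
static int g_init_checks = 0;  // counts how often the check runs

Handle& get_handle() {
    ++g_init_checks;                  // every call pays for this check
    if (!g_handle.initialized) {
        g_handle.device = 0;          // always the default device
        g_handle.initialized = true;
    }
    return g_handle;
}

// Caller never sees the handle: short to write, but tied to one device.
std::vector<int> make_filled(std::size_t n, int value) {
    Handle& h = get_handle();
    assert(h.initialized);
    return std::vector<int>(n, value);
}
}  // namespace implicit_api

namespace explicit_api {
// Caller initializes and owns the handle, so it can pick a device
// and decide when the handle goes away.
Handle init(int device) {
    Handle h;
    h.device = device;
    h.initialized = true;
    return h;
}

std::vector<int> make_filled(const Handle& h, std::size_t n, int value) {
    assert(h.initialized);            // no lazy setup, just a sanity check
    return std::vector<int>(n, value);
}
}  // namespace explicit_api
```

With the implicit version, `implicit_api::make_filled(3, 7)` is the whole program, but each call runs the initialization check and the device is fixed at 0. With the explicit version you write one extra `init` line per device and get to choose the device and the handle's lifetime yourself.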

A device vector is created and assigned a value in one call. Size is automatically tracked. Nice.

Errors are not handled. All code looks cleaner if you don't handle errors.