Well, consider the following C code. It demonstrates how to allocate memory on the GPU, transfer data in and out, operate on that memory, and then free it. I could see myself copy-pasting it into several places, modified each time to perform a different function.
It can be quite a bit more than just two lines, particularly once you add error handling. But when you start refactoring common patterns, you often end up with higher-level abstractions. In your example, Thrust is exactly that: https://developer.nvidia.com/Thrust
I'll proceed as if that were implemented on top of CUDA. I have no idea if that is actually the case, but let's assume so for the sake of argument.
That code does not initialize the library. Instead, the handle is stored as a global variable that is lazily initialized when you use Thrust methods. This makes it easier to use but has some drawbacks too: it cannot use more than one card at once, every call has to check whether the library is initialized (safer but slower), and resources are not freed as soon as possible. A good trade-off in the majority of cases.
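To make the lazy-initialization trade-off concrete, here is a minimal sketch in plain C++. Everything in it is hypothetical: `Handle`, `get_handle`, `scale`, and the counters are made-up names for illustration, not Thrust or CUDA API, and I am not claiming this is how Thrust works internally. It is also not thread-safe.

```cpp
#include <cassert>

// Hypothetical stand-in for a library handle (think: a GPU context).
// Not a real Thrust or CUDA type.
struct Handle {
    int device_id;
};

// Counts how often the expensive setup ran; for demonstration only.
static int g_init_count = 0;

// Global handle, created on first use rather than by an explicit init call.
static Handle* g_handle = nullptr;

Handle& get_handle() {
    // This branch is evaluated on every call: the cost of not
    // requiring the caller to initialize anything.
    if (g_handle == nullptr) {
        static Handle storage{0};  // always device 0; the caller cannot pick another
        g_handle = &storage;
        ++g_init_count;
    }
    assert(g_handle != nullptr);
    return *g_handle;  // never released before program exit
}

// Hypothetical library call: the caller never sees the handle.
int scale(int x) {
    Handle& h = get_handle();
    return x * 2 + h.device_id;
}
```

Calling `scale` twice runs the setup only once, but both calls pay for the check, and the single hidden handle is where the "one card only" and "freed late" drawbacks come from.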
A device vector is created and assigned a value in one call. Size is automatically tracked. Nice.
Errors are not handled. All code looks cleaner if you don't handle errors.
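Error handling does not have to bloat every call site, though. A rough sketch of the usual refactoring, in plain C++: `Status`, `status_name`, `check`, and `fake_alloc` are all hypothetical names invented here and stand in for whatever status codes a C-style GPU API returns.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical status codes, standing in for a C-style API's return values.
enum class Status { Ok, AllocFailed, TransferFailed };

const char* status_name(Status s) {
    switch (s) {
        case Status::Ok:             return "ok";
        case Status::AllocFailed:    return "allocation failed";
        case Status::TransferFailed: return "transfer failed";
    }
    return "unknown";
}

// Turns a failing status into an exception, so each call site
// stays one line long instead of growing an if-block.
void check(Status s, const char* what) {
    assert(what != nullptr);
    if (s != Status::Ok) {
        throw std::runtime_error(std::string(what) + ": " + status_name(s));
    }
}

// Hypothetical API call that fails for a zero-sized request.
Status fake_alloc(std::size_t bytes) {
    return bytes == 0 ? Status::AllocFailed : Status::Ok;
}
```

A call site then reads `check(fake_alloc(n), "alloc");`, which is one line with the error handled rather than one line with it ignored.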
http://docs.nvidia.com/cuda/cublas/#example-code
edit: I guess if you write C++, you could protect the resources with RAII and shave off two lines of copy-pasted de-initialization.
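Something like this is what I have in mind, as a sketch only. `Buffer`, `g_live_buffers`, and `live_inside_scope` are made-up names, and `std::malloc`/`std::free` are stand-ins so it runs without a GPU; on a device you would presumably swap in `cudaMalloc`/`cudaFree`, which I have not done here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Tracks live allocations so the destructor's effect is observable.
static int g_live_buffers = 0;

// Owns a raw buffer and releases it when the object goes out of scope.
// std::malloc/std::free stand in for a device allocator (assumption).
class Buffer {
public:
    explicit Buffer(std::size_t bytes)
        : ptr_(std::malloc(bytes)), bytes_(bytes) {
        if (ptr_ == nullptr) throw std::bad_alloc();
        ++g_live_buffers;
    }

    ~Buffer() {
        assert(g_live_buffers > 0);
        std::free(ptr_);
        --g_live_buffers;
    }

    // Copying would lead to a double free, so forbid it.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    void* get() const { return ptr_; }
    std::size_t size() const { return bytes_; }

private:
    void* ptr_;
    std::size_t bytes_;
};

// No explicit cleanup anywhere: the buffer is released on return.
int live_inside_scope(std::size_t bytes) {
    Buffer buf(bytes);
    return g_live_buffers;
}
```

The de-initialization lines disappear from every call site, and they also run on early returns and exceptions, which the copy-pasted version tends to miss.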