This is really cool, but something doesn't smell right.
1) They give numbers from a single-core CPU implementation and from a many-core GPU implementation, but not from a many-core CPU implementation, like some of the new four- or eight-core chips; without many-core CPU numbers, it's comparing apples to oranges.
2) What indices are on the table? Does the GPU implementation have any indices? A read-only database is certainly going to have indices, and (hopefully) covering indices for all the big queries.
3) SQLite has strongly-typed values, not strongly-typed columns (you can stick a string in an integer column and that's just fine), and it seems like they disregarded much of the SQLite type system, focusing only on integer values. I'd like to know how much time is spent type-checking values (see the sketch after this list).
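To make point 3 concrete, here's a minimal sketch of why per-value typing costs something, with my own types and names rather than SQLite's actual internals: every comparison has to branch on the value's run-time type tag before it can do any work.

    #include <stddef.h>

    /* Hypothetical tagged value, loosely modeled on SQLite's
       per-value (not per-column) typing. */
    enum ValueType { VAL_INT, VAL_FLOAT, VAL_TEXT, VAL_NULL };

    struct Value {
        enum ValueType type;        /* checked at run time, per value */
        union {
            long long   i;
            double      r;
            const char *s;
        } u;
    };

    /* Count rows matching "col > threshold". The type branch inside
       the loop is the overhead point 3 is asking about. */
    long long count_gt(const struct Value *col, size_t n, long long threshold) {
        long long hits = 0;
        for (size_t i = 0; i < n; ++i) {
            if (col[i].type == VAL_INT && col[i].u.i > threshold)
                ++hits;             /* fast path: genuinely an integer */
            else if (col[i].type == VAL_FLOAT && col[i].u.r > (double)threshold)
                ++hits;             /* mixed types are legal in SQLite */
            /* TEXT and NULL values in an "integer" column fall through */
        }
        return hits;
    }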
The queries are extremely simple (they're in an appendix, after the references) and, in the absence of indexes, require independent evaluation of every row in the table. This maps extremely well to the CUDA architecture, which has hundreds of memory-coupled independent processing elements.
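For a sense of why it maps so well: a predicate scan is one thread per row with no inter-thread communication, and neighboring threads read neighboring column entries, so the loads coalesce. A minimal sketch, with a hypothetical kernel of my own rather than anything from the paper:

    #include <cuda_runtime.h>

    /* WHERE col > threshold, one thread per row. */
    __global__ void scan_gt(const long long *col, int *match,
                            size_t n, long long threshold) {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n)
            match[i] = col[i] > threshold;  /* independent, coalesced reads */
    }

    /* Launch so the grid covers the table once:
         int threads = 256;
         int blocks  = (int)((n + threads - 1) / threads);
         scan_gt<<<blocks, threads>>>(d_col, d_match, n, 42);          */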
But this isn't anywhere near the typical usage scenario of SQL databases. If it were, scaling databases would be trivial: just shard tables across many servers. In real life you have indexes and joins, which translate into random access patterns, and those do not perform well on CUDA.
Heh. When I first saw the document I thought there was no way I was going to read all that now. Surprisingly, the sections that I did end up reading were quite easy to digest (and it's 5am here). I recommend bookmarking it and reading it later if you can't do it now. This is genuinely interesting.
The gem which you might be looking for is:
The queries executed on the GPU were an average of 35X faster than those executed through the serial SQLite virtual machine.
(my apologies to authors for reducing the entire document to that)
The contribution of this paper is to implement and demonstrate a SQL interface for GPU data processing. This interface enables a subset of SQL SELECT queries on data that has been explicitly transferred in row-column form to GPU memory.
This is fascinating stuff, but having to keep the entire DB in GPU memory poses some obviously non-trivial problems in terms of practical applications.
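Concretely, the precondition looks something like this rough sketch (my own helper, not the paper's code): the column data gets copied into GPU memory up front, and the whole working set has to fit in whatever is free there.

    #include <cuda_runtime.h>
    #include <stdio.h>

    /* Upload one integer column; fails if it won't fit in free VRAM. */
    long long *upload_column(const long long *host_col, size_t n) {
        size_t bytes = n * sizeof(long long);
        size_t free_mem, total_mem;
        cudaMemGetInfo(&free_mem, &total_mem);
        if (bytes > free_mem) {             /* the whole table must fit */
            fprintf(stderr, "column exceeds free GPU memory\n");
            return NULL;
        }
        long long *dev_col = NULL;
        cudaMalloc(&dev_col, bytes);
        /* Explicit host-to-device copy: this transfer cost, plus the
           VRAM ceiling above, is exactly the practical problem. */
        cudaMemcpy(dev_col, host_col, bytes, cudaMemcpyHostToDevice);
        return dev_col;
    }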
I was going to give you a snarky answer, because one of the preconditions of the paper is that it's CPU vs. GPU operating on an in-memory database (so there's no disk access by definition). But thinking about it, perhaps it's not even CPU-bound but cache-miss-bound, and you could trade the latter for the former by compressing the data into a trie or something smarter than just a B+tree.
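For the flavor of that trade, here's a sketch with delta compression standing in for the trie idea (purely my own illustration, not from the paper): a 64-bit column whose neighboring values are close together (timestamps, counters) can be stored as one-byte deltas, touching an eighth of the bytes at the cost of an extra add per element.

    #include <stddef.h>

    /* Count values > threshold in a column stored as signed byte deltas
       from a base value: more arithmetic, far less memory traffic. */
    long long count_gt_delta(const signed char *deltas, size_t n,
                             long long base, long long threshold) {
        long long value = base, hits = 0;
        for (size_t i = 0; i < n; ++i) {
            value += deltas[i];     /* decompress on the fly: extra CPU work */
            if (value > threshold)
                ++hits;             /* but ~8x fewer bytes through the cache */
        }
        return hits;
    }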