The basic metric for this kind of comparison is how long you can keep your 'compute' node working on a part of a problem without any new input data and without any intermediate results that need to be posted for other parts of the code to continue (rendezvous points, I believe these are called).
The longer that time the better suited the problem is for a massive parallel solution.
If that time is low relative to the I/O that needs to be done, then you'll find very soon that the bus carrying data between the host CPU and the number cruncher is the bottleneck.
The big problem is that you can work with fairly large amounts of data and still have the bus be your bottleneck. I was doing some GPU work a few summers ago that focused on matrix multiplication. We were sending matrices with 8,000 numbers on a side to the GPU for multiplication and still ending up with the bus as the slowest part of the computation.
How much RAM was on the card? 64-bit numbers × 8000² = 512 MB per matrix. Granted, today you can have 4 GB per card, but back then you were probably stuck with a fraction of that.
Still, PCIe 2.0 x16 is limited to 8 GB/s, so I guess the real question is how many matrices were you multiplying?
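A rough back-of-envelope sketch of the numbers in this thread (the GPU throughput figure is my own assumption, not something stated above): compare the time to move one pair of 8000×8000 double-precision matrices over PCIe 2.0 x16 against the time to multiply them on the card.

```python
# Back-of-envelope: bus transfer time vs. compute time for one
# 8000x8000 double-precision matrix multiply.
# Assumed numbers: 8 GB/s PCIe 2.0 x16 (theoretical peak) and
# ~500 GFLOP/s double-precision GPU throughput (my guess, era-dependent).

n = 8000
bytes_per_elem = 8                      # 64-bit floats
matrix_bytes = n * n * bytes_per_elem   # 512 MB per matrix

pcie_bw = 8e9       # bytes/s, PCIe 2.0 x16 theoretical peak
gpu_flops = 500e9   # FLOP/s, assumed double-precision throughput

# Two input matrices go to the card, one result comes back.
transfer_s = 3 * matrix_bytes / pcie_bw

# Classic matrix multiply costs roughly 2*n^3 floating-point ops.
compute_s = 2 * n**3 / gpu_flops

print(f"transfer: {transfer_s:.3f} s")   # 0.192 s
print(f"compute:  {compute_s:.3f} s")    # 2.048 s
```

With these assumptions a single big multiply is compute-bound, since the work grows as n³ while the data only grows as n². The bus only becomes the bottleneck when the card's throughput exceeds roughly n × bandwidth / 12 (about 5 TFLOP/s here), or when you stream many smaller matrices so each transfer buys less compute, which is presumably why the number of matrices matters.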