On the CPU you probably get implicit parallel execution from pipelining, out-of-order execution and so on, and on the GPU you can set up something similar.
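As a rough illustration of the CPU side, here is a minimal sketch of the same sum written with one dependency chain and with four independent ones. The function names `sum_single_chain` and `sum_four_chains` are made up for this example. The assumption is that an out-of-order core can overlap the independent additions in the second version without any explicit threading; whether that actually shows up as a speedup depends on the compiler and the hardware, since an optimizing compiler may well transform both loops into similar code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: each addition needs the result of the previous one,
// so the additions form a single serial dependency chain.
std::uint64_t sum_single_chain(const std::vector<std::uint32_t>& v) {
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += v[i];
    }
    return acc;
}

// Four accumulators: the four additions in one iteration do not depend
// on each other, which gives the CPU independent work it may overlap.
std::uint64_t sum_four_chains(const std::vector<std::uint32_t>& v) {
    std::uint64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    // Handle the leftover elements when n is not a multiple of 4.
    for (; i < n; ++i) {
        a0 += v[i];
    }
    return a0 + a1 + a2 + a3;
}
```

Both functions return the same value for any input, because integer addition can be reordered freely. With floating-point data the two versions could differ slightly in rounding.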