Thanks for the explanation. The only SIMD programming I've seen is where the programmer carefully calls CPU-brand-specific intrinsics and painstakingly manages the vector registers, making sure the numbers to be added, multiplied, etc. are evenly divided before being handed to the SIMD ALUs.
Sounds like what you are saying is that the fork-join model translates easily, via the compiler, to these SIMD instructions?
Some compilers can also vectorize plain loops, but you would advocate for fork-join?
> Sounds like what you are saying is that the fork-join model translates easily, via the compiler, to these SIMD instructions?

Why do you think CUDA has become so popular recently? That's exactly what CUDA, OpenCL, and ISPC do.
> Some compilers can also vectorize plain loops, but you would advocate for fork-join?

CUDA-style / OpenCL-style fork-join is clearly easier than reading compiler output and trying to debug why your loop failed to vectorize. That's the trouble with auto-vectorizers: you end up wading through tons of compiler diagnostics, or checking the assembly, just to make sure it worked.
ALL fork-join style CUDA / OpenCL code automagically compiles into SIMD instructions. Ditto with ISPC. Heck, GPU programmers have been doing this since the DirectX / OpenGL shader days, decades ago.
There's no "failed to vectorize". There's no looking up SIMD instructions, registers, or intrinsics. (Well, dropping to GPU assembly is allowed, but not necessary.) It just works.
-------
If you've never tried it, really try one of those languages. CUDA is for NVidia GPUs. OpenCL runs on AMD (and other vendors' hardware). ISPC targets Intel CPUs: instead of SIMD intrinsics, it gives you an OpenCL-like fork-join SIMD programming environment.
And of course, Julia and Python have CUDA plugins. These aren't as reliable as a dedicated language like OpenCL or ISPC, but they might be easier for you to play with than learning another language.
OpenMP is just #pragmas on top of your standard C, C++, or Fortran code, so any C / C++ / Fortran compiler can give this sort of thing a whirl rather easily.
---------
OpenMP has always been a fork-join-model #pragma add-on to C / C++ / Fortran. Eventually they realized that their fork-join model works for SIMD too, and finally added SIMD explicitly to the specification (OpenMP 4.0's `simd` directive).
Fortran Coarrays go far beyond simple fork-join. They enable one-sided remote memory access, something that is impossible in OpenMP or CUDA as far as I'm aware, and that requires the highest levels of skill to get right in MPI.