It's a step in the right direction. I can't wait to see more accessible languages like Lisp built on top of Vulkan and OpenCL.
Programming for GPGPU and heterogeneous computing is difficult, unwieldy, and feels primitive (partly by design). With better languages, more people will be able to write powerful abstractions that are also highly concurrent when the resources are available.
PyOpenCL is also a step in the right direction. It allows you to descend into OpenCL for the heavy lifting while remaining in the Python ecosystem for everything else.
Lift does not have Python interop. The Lift code run in the Python shell is interpreted by naive Python code. It runs very slowly and consumes far more memory than you might expect. In the matmul example, the dot product is calculated in two steps, so there will be an 8x8x8 array when interpreted. When compiled, this huge array is eliminated with the great help of isl ( http://isl.gforge.inria.fr/ ).
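To make the 8x8x8 point concrete, here is a rough plain-Python sketch of what "two steps" could look like. It is not Lift code and not how the Lift interpreter is actually written; the function names and the input matrices are made up for illustration. The first version materializes every partial product before reducing, the second keeps a running sum so the intermediate never exists.

```python
# Hypothetical illustration only; none of these names come from Lift.
N = 8

def matmul_two_step(a, b):
    """Multiply first, then reduce, materializing every partial product."""
    # Step 1: products[i][j][k] = a[i][k] * b[k][j], an N x N x N intermediate.
    products = [[[a[i][k] * b[k][j] for k in range(N)]
                 for j in range(N)]
                for i in range(N)]
    # Step 2: sum along the last axis to get the N x N result.
    result = [[sum(products[i][j]) for j in range(N)] for i in range(N)]
    return result, products

def matmul_fused(a, b):
    """Same result with a running sum, so no intermediate array is built."""
    c = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            acc = 0.0
            for k in range(N):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c

# Made-up inputs with small integer values, so float sums are exact.
a = [[float(i + j) for j in range(N)] for i in range(N)]
b = [[float(i * j) for j in range(N)] for i in range(N)]

two_step, products = matmul_two_step(a, b)
intermediate_elements = sum(len(row) for plane in products for row in plane)
print(intermediate_elements)           # 512 elements, i.e. 8 * 8 * 8
print(two_step == matmul_fused(a, b))  # True
```

The fused loop is roughly the shape you would hope a compiler arrives at once the intermediate has been eliminated; how isl actually gets there is outside what this sketch shows.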
Can I write code in Futhark which shares memory with objects being rendered in OpenGL? In other words, can it operate directly on objects being rendered?
I'd like to be able to write physics simulations that are visualized in 3D without having to copy the points in and out of Futhark's memory (killing the performance).
Not yet. But you can already write code that shares memory with objects that you also access from hand-written OpenCL code, and since OpenCL can interop with OpenGL, what you are asking for should eventually be possible.
All simulation and rendering is done in Futhark, with Python+Pygame for gluing things together. All the particle information stays on the GPU at all times. The only thing being copied back to the CPU is the rendered bitmap, which is then immediately moved back with a Pygame blit operation...
After doing some benchmarking or reading the source code, you'll find that the author was just lying about its performance, since he lacks even basic knowledge of GPGPU optimization techniques and misused isl to generate low-quality but obfuscated OpenCL kernel code, which is hard to see through at first. He also mistook "S-expression" for Lisp, which is ridiculous.
The OpenCL generation is mostly stolen from ppcg ( http://ppcg.gforge.inria.fr/ ). Unlike ppcg, right now the local memory is not properly handled; you can see that there are some complicated expressions here.
__kernel void kernel0(__global float v0[8][8],
                      __global float v2[8][8]) {
    __local float local_v0[2][2][16];
    float private_v2[2][2];
    int b0 = get_group_id(0);
    int b1 = get_group_id(1);
    int t0 = get_local_id(0);
    int t1 = get_local_id(1);