Lift – Lisp Flavoured Tensor

rcarmo · on Sept 4, 2016

This is begging for using http://hylang.org instead of vanilla Python.

eschaton · on Sept 4, 2016

This isn't so much "Lisp" as "S-expressions via Python."

I was expecting it to actually be implemented in Lisp.

gnarbarian · on Sept 4, 2016

It's a step in the right direction. I can't wait to see more accessible languages like Lisp built on top of Vulkan and OpenCL.

Programming for GPGPU and heterogeneous computing is difficult, unwieldy, and feels primitive (partly by design). With better languages, more people will be able to write powerful abstractions that are also highly concurrent when the resources are available.

PyOpenCL is also a step in the right direction. It allows you to descend into OpenCL for the heavy lifting while remaining in the Python ecosystem for everything else.

Athas · on Sept 4, 2016

What about something like Futhark [0] (which I am working on)? It also has Python interop [1], although not quite as elegant as Lift.

[0]: http://futhark-lang.org

[1]: http://futhark-lang.org/blog/2016-04-15-futhark-and-pyopencl...

bhuztez · on Sept 5, 2016

Lift does not have Python interop. The Lift code run in the Python shell is interpreted by naive Python code. It runs very slowly and consumes far more memory than you might expect. In the matmul example, the dot product is calculated in two steps, so there will be an 8x8x8 array when interpreted. When compiled, this huge array is eliminated with the great help of isl ( http://isl.gforge.inria.fr/ ).
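To make the memory point concrete, here is a minimal pure-Python sketch. It is not Lift's interpreter or its generated code; the function names are made up for illustration, and it assumes the "two steps" are an elementwise product followed by a reduction over the shared axis. The first version materializes the 8x8x8 intermediate, the second accumulates into a scalar and never builds it.

```python
N = 8  # matrix size used in the matmul example


def matmul_two_step(a, b):
    """Naive evaluation: materialize every product, then reduce."""
    # Step 1: an N x N x N intermediate holding a[i][k] * b[k][j].
    products = [[[a[i][k] * b[k][j] for k in range(N)]
                 for j in range(N)]
                for i in range(N)]
    # Step 2: sum over the innermost (k) axis.
    result = [[sum(products[i][j]) for j in range(N)] for i in range(N)]
    return result, N * N * N  # also report the intermediate element count


def matmul_fused(a, b):
    """Fused evaluation: accumulate into a scalar, no intermediate array."""
    result = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            acc = 0
            for k in range(N):
                acc += a[i][k] * b[k][j]
            result[i][j] = acc
    return result


# Small integer inputs so both versions can be compared exactly.
a = [[i + j for j in range(N)] for i in range(N)]
b = [[i * j for j in range(N)] for i in range(N)]

two_step, intermediate_size = matmul_two_step(a, b)
fused = matmul_fused(a, b)
```

Both versions produce the same matrix; the difference is only the 512-element temporary, which grows as N cubed for larger inputs.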

gnarbarian · on Sept 4, 2016

Can I write code in Futhark which shares memory with objects being rendered in OpenGL? In other words, can it operate directly on objects being rendered?

I'd like to be able to write physics simulations that are visualized in 3D without having to copy the points in and out of Futhark's memory (killing the performance).

Athas · on Sept 4, 2016

Not yet. But you can already write code that shares memory with objects that you also access from hand-written OpenCL code, and since OpenCL can interop with OpenGL, what you are asking for should eventually be possible.

gnarbarian · on Sept 4, 2016

Ahh yes. I have done something like that before with PyOpenCL, but it was not pretty:

https://www.youtube.com/watch?v=lnOmy1ly6M0&list=PLCN-Ml6vUJ...

Athas · on Sept 4, 2016

I have exactly that example here: https://github.com/HIPERFIT/futhark-benchmarks/tree/master/a...

All simulation and rendering is done in Futhark, with Python+Pygame for gluing things together. All the particle information stays on the GPU at all times. The only thing being copied back to the CPU is the rendered bitmap, which is then immediately moved back with a Pygame blit operation...

bhuztez · on Sept 5, 2016

Lift does not use PyOpenCL. It is intended to generate C code that can be easily linked.

bhuztez · on Sept 5, 2016

It depends heavily on isl ( http://isl.gforge.inria.fr/ ). There are no Lisp bindings for it around.

mlhulk · on Sept 5, 2016

After doing some benchmarking or reading the source code, you'll find that the author was just lying about its performance: he lacks even basic knowledge of GPGPU optimization techniques and misused isl to generate low-quality, obfuscated OpenCL kernel code, which is hard to see through at first. He also mistook "S-expressions" for Lisp, which is ridiculous.

Athas · on Sept 4, 2016

Can someone show how the generated code for matrix multiply looks?

bhuztez · on Sept 5, 2016

The OpenCL generation is mostly stolen from ppcg ( http://ppcg.gforge.inria.fr/ ). Unlike ppcg, right now the local memory is not properly handled; you can see that there are some complicated expressions here.

    __kernel void kernel0( __global float v0[8][8], __global float v2[8][8]){
      __local float local_v0[2][2][16];
      float private_v2[2][2];
      int b0 = get_group_id(0);
      int b1 = get_group_id(1);
      int t0 = get_local_id(0);
      int t1 = get_local_id(1);

      for(int c2 = 0; (c2 <= 15); c2 = c2 + 1){
        if(((((((((30 * t0) + (31 * t1)) + (16 * b0)) + (28 * c2)) + 31) % 32) >= 16) || (b1 == t0))){
          local_v0[t0][t1][c2] = (v0[((((2 * t0) + t1) + (4 * c2)) / 8)][((((2 * t0) + t1) + (4 * c2)) % 8)]);
        }
      }

      barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
      for(int c0 = (2 * b0); (c0 <= 7); c0 = c0 + 4){
        for(int c1 = (2 * b1); (c1 <= 7); c1 = c1 + 4){
          private_v2[(((-2 * b0) + c0) / 4)][(((-2 * b1) + c1) / 4)] = 0.000000;
          for(int c2 = 0; (c2 <= 3); c2 = c2 + 1){
            for(int c5 = (2 * c2); (c5 <= ((2 * c2) + 1)); c5 = c5 + 1){
              private_v2[(((-2 * b0) + c0) / 4)][(((-2 * b1) + c1) / 4)] =
                ((private_v2[(((-2 * b0) + c0) / 4)][(((-2 * b1) + c1) / 4)]) +
                 ((local_v0[(c2 % 2)][((-2 * c2) + c5)][(((2 * t0) + (2 * c0)) + (c2 / 2))]) *
                  (local_v0[b1][t1][((((-2 * b1) + c1) / 4) + (2 * c5))])));
            }
          }
          private_v2[(((-2 * b0) + c0) / 4)][(((-2 * b1) + c1) / 4)] = (private_v2[(((-2 * b0) + c0) / 4)][(((-2 * b1) + c1) / 4)]);
        }
      }

      for(int c0 = 0; (c0 <= 1); c0 = c0 + 1){
        for(int c1 = 0; (c1 <= 1); c1 = c1 + 1){
          v2[(((2 * b0) + t0) + (4 * c0))][(((2 * b1) + t1) + (4 * c1))] = (private_v2[c0][c1]);
        }
      }

      barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
    }
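For anyone trying to read the index arithmetic in that kernel, here is a rough Python sketch that enumerates which cells the load into local_v0 and the final store to v2 touch. The launch geometry is an assumption (2x2 work-groups of 2x2 work-items, guessed from the array shapes; it is not stated in the thread), the helper names are hypothetical, and the sketch ignores the guard in the `if` around the load.

```python
# Assumed launch geometry (not stated in the thread):
# 2x2 work-groups, each with 2x2 work-items.
GROUPS = range(2)
LOCALS = range(2)


def load_indices():
    """(row, col) of v0 read for each (t0, t1, c2), ignoring the if guard."""
    seen = []
    for t0 in LOCALS:
        for t1 in LOCALS:
            for c2 in range(16):
                # Same linear index the kernel splits with / 8 and % 8.
                linear = 2 * t0 + t1 + 4 * c2
                seen.append((linear // 8, linear % 8))
    return seen


def store_indices():
    """(row, col) of v2 written by every work-item in every work-group."""
    seen = []
    for b0 in GROUPS:
        for b1 in GROUPS:
            for t0 in LOCALS:
                for t1 in LOCALS:
                    for c0 in range(2):
                        for c1 in range(2):
                            row = 2 * b0 + t0 + 4 * c0
                            col = 2 * b1 + t1 + 4 * c1
                            seen.append((row, col))
    return seen


all_cells = {(r, c) for r in range(8) for c in range(8)}
loads = load_indices()
stores = store_indices()

# Each list covers the 8x8 matrix exactly once when both checks hold.
loads_cover_once = len(loads) == 64 and set(loads) == all_cells
stores_cover_once = len(stores) == 64 and set(stores) == all_cells
```

Under those assumptions, the load expression is just a flattened 0..63 index split back into row and column, and the store pattern partitions the 8x8 output with no overlap between work-items.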

gnarbarian · on Sept 5, 2016

Man, I think it dropped one of these: )

Chris2048 · on Sept 5, 2016

Title feels like it was Markov-chain generated :-)

TylerE · on Sept 4, 2016

Might want to consider a name change...

http://www.liftweb.net/ is a well established project dating back years.

andrewchambers · on Sept 4, 2016

The thing you need to understand is that The Simpsons already did everything years ago, and that's OK.