
Fun fact: the AMDGPU driver can't run when built with LLVM right now, for three reasons:

1. They build their driver with a stack alignment of 16B (standard for x86_64 userspace).

2. They use floating point in the kernel, which the kernel has limited support for.

3. The rest of the x86_64 kernel uses 8B stack alignment.

Code from 3 calls into code from 1, and the driver contains SSE2 instructions (for double-precision arithmetic). Guess what happens?



This has been patched and will be fixed in 5.4 if I recall correctly, right?


Heh, no, sorry, my fault. I wrote the patch you're referring to; I got it to link. When AMD went to test it on hardware (which I don't have), they hit alignment-related general protection faults that I traced back to a movaps instruction. I'm still working with CrOS and AMD folks to figure out what to do, but there are a few questionable things going on in their driver that have a few kernel devs confused. For now, people are just happy to see AMD participate more in upstream kernel dev, but at this point I fear some parts of their driver will definitely need to be rewritten.


Fun fact - they do work fine... under FreeBSD :-)


Why does it work with gcc?


GCC just so happens to not select any instructions with alignment requirements at `-msse -mno-sse2`, while Clang emits calls to soft-fp functions that aren't implemented in the kernel. At `-msse2`, Clang emits SSE2 instructions with alignment requirements. You'd think their 16B-aligned-stack driver would be OK, but its caller (the rest of the kernel) has a different ABI (8B-aligned stack). Mixing TUs with two different ABIs in the same executable and making calls between them is kind of insane.
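To make the ABI mismatch concrete, here is a minimal C sketch of the address arithmetic. It is only a model, not code from the kernel or the AMDGPU driver: the function names, the 24-byte frame size, and the example stack addresses are all made up for illustration. It assumes the usual x86_64 sequence where `call` pushes an 8-byte return address and the callee then carves its frame out below that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of align (align must be a power of two). */
bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/*
 * Hypothetical callee built for a 16B-aligned stack. It trusts its caller
 * to have had SP 16B-aligned at the call site, so it never realigns SP.
 * It reserves a made-up 24-byte frame and keeps a 16-byte spill slot at
 * the bottom of it.
 */
uintptr_t callee_spill_slot(uintptr_t sp_at_call)
{
    uintptr_t sp_at_entry = sp_at_call - 8; /* call pushed the return address */
    return sp_at_entry - 24;                /* bottom of the callee's frame */
}

/* An aligned 16-byte access (such as movaps) faults on a misaligned slot. */
bool aligned_access_would_fault(uintptr_t sp_at_call)
{
    return !is_aligned(callee_spill_slot(sp_at_call), 16);
}
```

With a caller that keeps SP 16B-aligned (say `0x7000`), the slot lands at `0x6fe0` and the access is fine. With a caller that only guarantees 8B alignment, SP can legitimately be `0x7008` at the call site; the slot then lands at `0x6fe8`, which is the kind of address an instruction like movaps faults on.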

Then again, all x86_64 processors support SSE2, so I'm not surprised that Clang at `-mno-sse2` doesn't work better. There are numerous bugs filed against Clang where it crashes when `-mno-sse2` is set but double-precision floating point is used.

(I've spent almost all of this week debugging this with CrOS and AMD engineers. It's not clear to me at this point whether the 16B-stack-alignment ABI change was well-intentioned or a stroke of luck.)



