Are there any numbers as to what difference this will make to the compiled size and speed of the kernel? I imagine the difference in size will be tiny.
Did they profile before optimising[1] and identify it as a problem?
There is a Linus quote in the article: "It generates much more code, and much _slower_ code (and more fragile code),". "generates much more code" means a size difference.
If he had just said what you quoted he would be wrong in the general case.
He actually said it is all those things in comparison to fixed allocation. That matters because the kernel stack is so small that you can only use a VLA when you know in advance that the upper bound on the size is small; but if the bound is small you may as well allocate the fixed upper bound, and doing so is always faster, smaller and less fragile than using a VLA.
He's wrong in the general case, because user land C programmers will replace a VLA with:
if (!(array = malloc(n * sizeof(array[0])))) fatal("I'm out of memory");
which compared to the VLA generates more code and is slower. In both the VLA and malloc() case if you run out of memory the program will die nice and deterministically (unlike the kernel), but in the malloc() case that will only happen if you remember to do the check whereas in the VLA case the compiler ensures it always happens.
Removing VLAs in structs (defining a struct type inside a function and using the function's variables in an array field's bounds) was a requirement, but "normal" VLAs are supported by clang just fine.
[1] http://wiki.c2.com/?ProfileBeforeOptimizing