Yes, the 8086 microcode does get some parallelism, but it's a pretty short Very Long Instruction Word :-) An 8086 micro-instruction is just 21 bits long, while VLIW words are much longer. I'm studying the IBM System/360 Model 50, whose micro-instructions are 90 bits long. A Model 50 micro-instruction is so complex, with 28 different fields, that it isn't represented by a single line of code but by an 11-line block of code.
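To make the "parallelism in a wide word" point concrete, here is a minimal Python sketch of a horizontal micro-instruction: several independent fields packed into one word, each of which would drive a different part of the datapath in the same cycle. The field names and widths (`source`, `dest`, `kind`, `op`) are hypothetical, chosen only so they add up to 21 bits; they are not the real 8086 encoding.

```python
# Hypothetical field layout for a 21-bit horizontal micro-instruction.
# Widths are illustrative only, NOT the documented 8086 encoding.
FIELDS = [("source", 5), ("dest", 5), ("kind", 3), ("op", 8)]  # MSB first
WORD_BITS = sum(width for _, width in FIELDS)  # 21


def pack(**values: int) -> int:
    """Pack named field values into one micro-instruction word."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict:
    """Split a word back into its fields; each field controls its own subsystem."""
    fields = {}
    shift = WORD_BITS
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


word = pack(source=3, dest=17, kind=4, op=0x2A)
print(f"{word:021b}", unpack(word))
```

A wider format like the Model 50's 90-bit word would just be a longer `FIELDS` table; the decode step stays the same, which is why more fields means more things happening per cycle.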
The 8086 was only just starting to use pipelining, so it started small.
Recently I did some reading about the Transport Triggered Architecture (TTA) and, for the first time, thought "I could make a special-purpose microprocessor", and I might actually try it with an FPGA. That got me thinking a lot about processor architecture, particularly the ability of one (real or micro) instruction to control multiple subsystems simultaneously.
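For readers who haven't met a transport-triggered machine, here is a minimal Python sketch of the idea: the only instruction is a move, each instruction word is a bundle of moves that happen in parallel, and writing a function unit's trigger port is what fires the operation. The port names (`add.o`, `add.t`, `add.r`, etc.) and the rule that operand latches are written before triggers fire within a bundle are assumptions of this sketch, not the semantics of any particular TTA implementation.

```python
class TTA:
    """Toy transport-triggered machine (hypothetical ports and semantics)."""

    def __init__(self):
        # General registers and function-unit ports share one namespace.
        self.ports = {"r0": 0, "r1": 0, "r2": 0,
                      "add.o": 0, "add.r": 0,
                      "mul.o": 0, "mul.r": 0}
        # Writing a trigger port runs the unit with its latched operand.
        self.triggers = {
            "add.t": lambda v: self._fire("add", self.ports["add.o"] + v),
            "mul.t": lambda v: self._fire("mul", self.ports["mul.o"] * v),
        }

    def _fire(self, unit, result):
        # Results appear immediately here; real units would have latency.
        self.ports[f"{unit}.r"] = result

    def _read(self, src):
        # Plain integers in a move act as immediate values.
        return src if isinstance(src, int) else self.ports[src]

    def step(self, bundle):
        """Execute one instruction word: a bundle of parallel moves."""
        # All sources are read before any destination is written.
        values = [(dst, self._read(src)) for src, dst in bundle]
        # Latch plain destinations first, then fire triggers, so an
        # operand and its trigger can travel in the same bundle.
        for dst, value in values:
            if dst not in self.triggers:
                self.ports[dst] = value
        for dst, value in values:
            if dst in self.triggers:
                self.triggers[dst](value)

    def run(self, program):
        for bundle in program:
            self.step(bundle)
        return self.ports


# Compute (2 + 3) * 4 into r2; each inner list is one instruction word.
program = [
    [(2, "add.o"), (4, "mul.o"), (3, "add.t")],  # add.r <- 2 + 3
    [("add.r", "mul.t")],                        # mul.r <- 4 * 5
    [("mul.r", "r2")],                           # r2 <- mul.r
]
print(TTA().run(program)["r2"])
```

The first word shows the "one instruction controls several subsystems" property: it loads two units' operands and starts the adder in a single step. Giving `_fire` a real latency is also where the memory-stall problem would show up.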
What I see as problematic, though, is that the TTA stalls while it waits for memory. I think a high-performance system based on the TTA would need some kind of programmable memory-access engine that schedules fetches ahead of time under program control... but then it gets pretty hard.