In my last project, I just bit-banged a port to load the configuration bits into a 4K iCE40, something like 131 KB. The bitstream was just a .h file included in the bit-banger; the static array ended up in Flash (the ST MCU had 2 MB of flash, so no problem), and it only took a second or so to load the FPGA bits before it was ready to go. So, from my perspective, what you describe is already here. If even that's too much trouble, there's always TinyFPGA BX (https://tinyfpga.com/). You can use the open source yosys, or you can use Synplify and the Lattice dev system, which is free with a free license.
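For anyone who hasn't done this, here is a rough sketch of what a bit-banged loader along those lines might look like. It is not the code from that project. The pin names, `pin_write`, `delay_us`, and `fpga_load` are all hypothetical, and the pin layer below only records activity so the sketch can run on a host; on real hardware it would write GPIO registers. The sequence (hold CRESET_B low with SS low, release, wait about 1200 µs, shift the image MSB first, then send some trailing clocks) is my recollection of the iCE40 SPI slave configuration flow, so check the timing numbers against Lattice's programming and configuration note before relying on them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin layer. On a real MCU these would be GPIO register
   writes; here the pins are simulated so the sketch runs on a host. */
enum pin { PIN_SCK, PIN_MOSI, PIN_SS, PIN_CRESET, PIN_COUNT };

/* Idle levels: SCK, SS and CRESET_B high, MOSI low. */
static int pin_state[PIN_COUNT] = {
    [PIN_SCK] = 1, [PIN_SS] = 1, [PIN_CRESET] = 1
};
static unsigned long sck_rising_edges; /* clocks the "FPGA" has seen */
static uint8_t last_byte_seen;         /* last 8 MOSI samples, MSB first */

static void pin_write(enum pin p, int level)
{
    /* Model the FPGA sampling MOSI on each rising edge of SCK. */
    if (p == PIN_SCK && level && !pin_state[PIN_SCK]) {
        sck_rising_edges++;
        last_byte_seen =
            (uint8_t)((last_byte_seen << 1) | (pin_state[PIN_MOSI] & 1));
    }
    pin_state[p] = level;
}

/* Stub: a real port would busy-wait or use a timer. */
static void delay_us(unsigned us) { (void)us; }

/* Shift one byte out MSB first; data is set up while SCK is low. */
static void spi_send_byte(uint8_t b)
{
    for (int i = 7; i >= 0; i--) {
        pin_write(PIN_SCK, 0);
        pin_write(PIN_MOSI, (b >> i) & 1);
        pin_write(PIN_SCK, 1);
    }
}

/* Load a bitstream held in a static array (e.g. from an included .h). */
static void fpga_load(const uint8_t *image, size_t len)
{
    pin_write(PIN_SCK, 1);
    pin_write(PIN_SS, 0);      /* SS low at reset selects SPI slave mode */
    pin_write(PIN_CRESET, 0);  /* assert reset */
    delay_us(1);
    pin_write(PIN_CRESET, 1);  /* release reset */
    delay_us(1200);            /* assumed time to clear config memory */

    for (size_t i = 0; i < len; i++)
        spi_send_byte(image[i]);

    /* Trailing dummy clocks after the image; 64 is an assumed safe count. */
    for (int i = 0; i < 8; i++)
        spi_send_byte(0x00);

    pin_write(PIN_SS, 1);
}
```

On the target you would call something like `fpga_load(fpga_image, sizeof fpga_image)` at boot, where `fpga_image` is the (hypothetical) `static const uint8_t` array from the generated header, and ideally check CDONE afterwards rather than assuming success.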
Dropping in a midsize MCU with 256kB of Flash just to program a single FPGA is not viable in a margin-constrained commercial product. It works great if it's already there, of course, but the applications I'm thinking of have been the ones where it isn't.
Not to mention there are many FPGA applications where one purpose of the FPGA is to avoid having software in the path. If software is only responsible for the configuration load, that's better, but it can still be a problem.