There is another traditional FPGA use case where you need real time data capture or signal generation. That seems to be getting eaten from the bottom now that there are really high speed MCUs that are easier to program. It's less efficient, but easier to develop for.
The other problem with using an FPGA here is that microcontrollers are cheap and have great cheap dev boards. FPGAs, not so much. I've wanted to just "drop in" a small FPGA in several designs, the way you can drop in a microcontroller, but there's no available FPGA that's not a massive headache in that use case. Trust me, I've looked.
The iCE40 series is almost there but not quite. It's a bit pricey (this is sometimes okay, sometimes a dealbreaker) but its care and feeding is too annoying. Who wants to source a separate configuration memory? Sometimes I don't have the space for that crap.
If any company can bring a small, cheap, low power FPGA to the market, preferably with onboard non-volatile configuration memory, a microcontroller-like peripheral mix (UART, I2C, SPI, etc.), easy configuration (re)loading, and with good tool and dev board support, they'll sell a lot of units. They don't even have to be fast!
The MiniZED is $89 and a ton of fun! It has an ARM processor (Xilinx Zynq XC7Z007S SoC), Arduino-compatible daughterboard connectors, and a microcontroller-like peripheral mix, and it runs Linux.
The XC7Z007S is $46 in volume at distributors (though with no volume discounts; Xilinx pricing is weird).
Zynq chips are beautiful parts. But they are not "low-cost drop-in" anything. They are chips that you can architect an entire system around and replace a dozen other chips with. I know; I've done it. (But they didn't bite on our proposal, so my sketched architecture remained just a detailed sketch.)
In my last project, I just bit-banged a port to load the configuration bits into a 4K iCE40, something like 131 KBytes. The bitstream was just a .h file included in the bit-banger; the static array ended up in Flash (the ST MPU had 2 MB of flash, so no problem), and it took only a second or so to load the FPGA bits before it was ready to go. So, from my perspective, what you describe is already here. If even that's too much trouble, there's always the TinyFPGA BX: https://tinyfpga.com/ You can use the open source yosys, or you can use Synplify and the Lattice dev system, which is free with a free license.
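For anyone curious what that bit-banged load looks like, here is a rough sketch of the sequence, written in Python for readability (on the MCU it would be the same loop in C over the array from the .h file). The pin names, the `set_pin`/`get_cdone`/`sleep_us` callbacks, the delays, and the dummy-clock counts are all assumptions based on the commonly described iCE40 SPI slave configuration procedure; check them against Lattice's programming note before trusting any of it.

```python
from typing import Callable

def load_ice40(bitstream: bytes,
               set_pin: Callable[[str, int], None],
               get_cdone: Callable[[], int],
               sleep_us: Callable[[int], None]) -> bool:
    """Bit-bang a bitstream into an iCE40 in SPI slave mode (hypothetical helper)."""
    # Hold the FPGA in reset with SS low so it comes up as an SPI slave.
    set_pin("SS", 0)
    set_pin("SCK", 1)
    set_pin("CRESET_B", 0)
    sleep_us(1)                # assumed: reset pulse of at least 200 ns
    set_pin("CRESET_B", 1)
    sleep_us(1200)             # assumed: time for the FPGA to clear its config RAM

    def clock_byte(byte: int) -> None:
        for bit in range(7, -1, -1):            # MSB first
            set_pin("SCK", 0)
            set_pin("MOSI", (byte >> bit) & 1)  # change data while clock is low
            set_pin("SCK", 1)                   # FPGA samples on the rising edge

    for byte in bitstream:
        clock_byte(byte)

    # Assumed: about 100 dummy clocks (13 bytes = 104) before CDONE is valid.
    for _ in range(13):
        clock_byte(0)
    if get_cdone() != 1:
        return False           # configuration failed; caller may retry

    # Assumed: at least 49 more clocks (7 bytes = 56) to release the user I/O.
    for _ in range(7):
        clock_byte(0)
    set_pin("SS", 1)
    return True
```

The callbacks are only there so the sequence can be exercised off-target; on real hardware they would be direct GPIO register writes.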
Dropping in a midsize MCU with 256kB of Flash just to program a single FPGA is not viable in a margin-constrained commercial product. It works great if it's already there, of course, but the applications I'm thinking of have been the ones where it isn't.
Not to mention there are many FPGA applications where one purpose of the FPGA is to avoid having software in the path. If software is only responsible for configuration load, it's better, but still can be a problem.
Crowd Supply has an endless variety of hobbyist-friendly FPGA / USB / MCU / PCIe / SDR combination boards.
It's ridiculous for anybody to insist that programming an FPGA isn't writing software. By definition, anything you can put in a text file that ends up controlling what some piece of hardware does is software. Probably almost all of what is wrong with FPGA ecosystems comes from failure to treat it like software.
It's not much like your typical C program, but that's a very parochial viewpoint. The languages available to program FPGAs in are abysmal, a poor match to the hardware: actually too much like ordinary programming languages, to their detriment. A person who makes an FPGA do something is going to be an engineer, and to an engineer any microprocessor and any FPGA are just two different state machines. Somebody who studied "computer science" will be disoriented, but that is just because the field has narrowed, as network effects pared down the field of computing substrates until practically nothing is left.
FPGAs emulating ASICs or von Neumann CPUs is the greatest waste of potential anywhere. If the architecture of (some) FPGAs could be elucidated, it could fuel a renaissance of programming formalisms. We could begin to program them in a language actually well-suited to the task, and vary their configuration in real time according to the instantaneous task at hand.
FPGAs aren't state machines or processors. Not inherently, anyway, even if you can build those things out of them or if they sometimes are sold co-packaged.
What's less well documented, at least publicly, is the routing, but on some level that's less interesting since it's "just" how you get the electrons from point A to point B, not about choosing A or B. But even the routing is decently well described, though you have to look in some fairly obscure places (like the device floorplan viewer).
I'm not sure why you think FPGAs emulating ASICs is a "waste of potential". By definition, ASICs are strictly more capable and more powerful than FPGAs, so you're climbing up the potential ladder, not down!
Why? Because ASICs do one thing from the first time they are powered up until they are finally ground up into sand. But an FPGA could, if programmed right, do completely different things from one millisecond to the next. Their ability to do that is never exploited because our tooling is still much too primitive, and current devices' internal connectivity probably can't route signals to the places needed.
If you think an FPGA is not inherently and necessarily a state machine, no matter how it is programmed (provided power and clock are in specified bounds), that only means you don't know what a state machine is. All clocked digital devices are state machines, and can never be anything other than state machines.
(There is an argument to be made that an FPGA is, itself, an ASIC: an IC whose Specific Application is to be an FPGA. But such an argument would be transparent sophistry.)
There's also plenty of unclocked stuff in the FPGA... like the LUTs that do all the work. There's enough of this and it's important enough that I believe thinking of FPGAs as "just state machines" is dumb. But then I also believe that digital electronics are not "just digital circuits", but better thought of as "bistable analog circuits", so what do I know....
If the results of the LUTs don't end up clocked into a register, where do they go?
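A toy model may make the point concrete. The sketch below, in Python purely for illustration, treats a logic cell as a 4-input LUT (a 16-entry truth table, purely combinational) feeding a D flip-flop, and wires two such cells into a 2-bit counter with an enable input. `LogicCell`, `make_table`, `make_counter`, and `tick` are made-up names, and real FPGA cells have carry chains, clock enables, and bypass paths that this ignores.

```python
from typing import Callable, List

def make_table(fn: Callable[[int], int]) -> int:
    """Pack a 4-input boolean function into a 16-bit LUT mask."""
    return sum((fn(i) & 1) << i for i in range(16))

class LogicCell:
    """One LUT4 followed by one D flip-flop (simplified model)."""
    def __init__(self, truth_table: int):
        self.truth_table = truth_table  # bit i is the output for input value i
        self.q = 0                      # flip-flop state

    def lut(self, inputs: int) -> int:
        # Unclocked: the output depends only on the current inputs.
        return (self.truth_table >> (inputs & 0xF)) & 1

def make_counter() -> List[LogicCell]:
    # LUT inputs: bit0 = q0, bit1 = q1, bit2 = enable, bit3 unused.
    q0_next = make_table(lambda i: (i & 1) ^ ((i >> 2) & 1))
    q1_next = make_table(lambda i: ((i >> 1) & 1) ^ ((i & 1) & ((i >> 2) & 1)))
    return [LogicCell(q0_next), LogicCell(q1_next)]

def tick(cells: List[LogicCell], enable: int) -> int:
    """One clock edge: evaluate every LUT, then latch all flip-flops at once."""
    inputs = cells[0].q | (cells[1].q << 1) | ((enable & 1) << 2)
    nxt = [c.lut(inputs) for c in cells]  # combinational settle
    for cell, d in zip(cells, nxt):       # clocked update
        cell.q = d
    return cells[0].q | (cells[1].q << 1)

cells = make_counter()
print([tick(cells, 1) for _ in range(5)])  # counts 1, 2, 3, 0, 1
```

The LUTs themselves never see the clock, but because their outputs feed back only through the registers, the pair of cells as a whole behaves as a state machine whose transition function is the LUT contents.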
Of course everything is analog, and ultimately quantum-electrodynamic, but the languages FPGAs are programmed in don't provide access to those domains.
I think Cypress had a product line that combined a CPU and a small programmable array, just big enough to implement your own custom IO and protocols and maybe some minimal logic beyond that.
You're probably thinking of the Cypress PSoC, Programmable System on Chip.
Those things are fantastic for hobbyists and can be nice for low-volume production. But they're kind of crap for higher volume work:
* Expensive
* Physically fragile/easy to kill: personal experience suggests they are noticeably more fragile than their competition; ALWAYS add pull resistors and ESD diodes to their JTAG/SWD pins and use a real voltage supervisor, not the internal PoR/brownout, no matter what the datasheet says because it does not speak the truth
* Actually, just add external ESD diodes to anything even the least bit sketchy
* On-chip analog not good enough for serious applications or stupidly limited (just give me two of those please? no?)
* On-chip routing is very, very limiting
* Weak MCU cores
* Few large parts (high GPIO, fast core, ...); the 5LP is better but needs a refresh with bigger, better, cheaper flagships
* Not enough digital blocks (UDBs). They use a crappy old macrocell architecture, which wouldn't be a problem except they only give you TWO of them!
I've actually whined about the last one to the Cypress FAE (great guy!) and he just started laughing. Turns out, he's repeatedly said that to their higher-ups and gotten shot down... only to have customers like me ask for it again, over and over....
Hopefully under Infineon the PSoC line will be better managed. It could be a huge powerhouse, but right now it just does not have a good enough lineup of sane models.
Yeah, not bad at all. A little annoying, but above average for the HW side of things.
But that's PSoC Creator, used for their PSoC 4 and 5 lines. (Avoid the 3 and older -- they're really old.) The newer 6 requires Modus Toolbox, which I think doesn't support the 4 or 5 lines (STUPID). I have no experience with that one. It's Eclipse based, so who knows.
In the hobbyist space, I also see a fair amount of CPLDs used when something like a GAL (https://en.m.wikipedia.org/wiki/Generic_array_logic) would be much cheaper and easier. Doesn't work for everything, but they can be handy.
A good example of this is XMOS. Their chips are divided into "tiles" which can simultaneously run code, together with multiple interfaces such as USB, I2S, I2C, and GPIO. Latency is very deterministic because the tiles are not using caches, interrupts, shared buses, etc.
Their development environment is Eclipse based with numerous libraries such as audio processing, interface management, DFU etc. They use a variant of C (xc) that lets you send data between channels/tiles, and easily parallelize processing.
An example use is in voice assistants, where multiple microphones need to be analyzed simultaneously, echo and background noise have to be eliminated, and the speaker isolated into a single audio stream. I've used it for an audio processing product that needed to match hardware timers exactly, provide USB access, keep input and output matched, etc.