Re: the “hybrid algorithms” bit:
I was at this talk. The example she gave was a physics simulation like CFD: iterating between a fast/approximate ML-based algorithm and a slow/accurate classical physics algorithm, with the output of each feeding in as the starting point of the next round. But this was just one example; clearly there are lots of areas where you could apply a similar approach.
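A minimal sketch of that alternating loop, with stand-in functions — the "ML surrogate" and "physics step" below are illustrative placeholders (a smoothing pass and a simple diffusion update), not a real model or CFD solver:

```python
import numpy as np

def ml_step(state):
    # Placeholder for the fast/approximate ML surrogate: here just a
    # cheap neighbor-smoothing pass standing in for a learned prediction.
    return 0.5 * (state + np.roll(state, 1))

def physics_step(state):
    # Placeholder for the slow/accurate classical solver: here a single
    # explicit 1-D diffusion update standing in for a CFD iteration.
    return state + 0.1 * (np.roll(state, 1) - 2 * state + np.roll(state, -1))

def hybrid_solve(state, rounds=10):
    # Alternate the two: the ML model's cheap guess seeds the classical
    # solver, and the solver's refined output seeds the next ML pass.
    for _ in range(rounds):
        state = ml_step(state)       # fast, approximate
        state = physics_step(state)  # slow, accurate correction
    return state

state = hybrid_solve(np.random.default_rng(0).standard_normal(64))
```

The point of the pattern is that the expensive solver starts each round from a much better initial guess, so it needs fewer iterations to converge.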
This is something I've been really pleased with on the Apple Silicon SoCs. Being able to load large datasets or Blender scenes on a portable, efficient laptop and still use the GPU, albeit slowly, is a nice touch.
Of course, performance-wise it doesn't touch the $1k+ graphics cards with huge amounts of RAM, but for students, or when I need to do something quick on the go, it's a really useful tool.
I expect Xilinx's AI engines will never be integrated into anything else AMD makes, because Xilinx AI engines are VLIW/SIMD machines running their own instruction set.
----------
AMD is doing the right thing with Xilinx tech: they're integrating it into ROCm, so that Xilinx AI engines / FPGAs can interact with CPUs and GPUs. But there's no reason why these "internal core bits" should be shared between CPU, GPU, and FPGA.
Having worked in HPC: one area where this could be employed is bias correction of CFD model errors. E.g. weather models have various systematic biases that need to be corrected for; as far as I know, this is currently done with relatively simple statistics.
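For a sense of what "relatively simple statistics" means here, a toy linear bias correction fitted against synthetic data (the forecast/observation numbers below are made up for illustration; real post-processing, e.g. model output statistics, is along these lines but per-station and per-lead-time):

```python
import numpy as np

# Synthetic example: model forecasts vs. matching observations.
rng = np.random.default_rng(0)
obs = rng.normal(15.0, 5.0, size=200)                     # "true" temperatures
forecast = 1.1 * obs + 2.0 + rng.normal(0, 1.0, size=200) # model with warm bias

# Fit a linear map forecast -> observation by least squares.
slope, intercept = np.polyfit(forecast, obs, 1)
corrected = slope * forecast + intercept

bias_before = np.mean(forecast - obs)    # large systematic offset
bias_after = np.mean(corrected - obs)    # ~0 after correction
```

A degree-1 least-squares fit with an intercept drives the mean residual to essentially zero on the training data; the interesting part in practice is making that hold out of sample and across regimes, which is where ML-based corrections come in.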