I certainly can't speak to your specific uses or issues, but I mean we've really...

bufo · on June 6, 2023

Yes, you are correct in that the ANE does have the equivalent of tensor cores and that I didn’t mention that. I just don’t expect it to be usable beyond inference because the number of compute units will not work for batches in medium/large/huge networks. That’s obviously by design! The ANE silicon size is tiny compared to the GPU area. I wouldn’t be actually surprised if Apple strategically only invests in using their GPU for LLM (1B+ params) work.

Note that if you are currently using CoreML for LLMs all the work is done in the GPU.