Hacker News

There are also the 512-bit matrix units, which would be more like 3840 FLOP/cycle (single precision) for workloads that can be expressed that way.


I'm not sure that plays to POWER10's advantage.

The NVIDIA 2070 Super also has FP16 matrix-multiplication units, achieving 57 FP16-matrix TFLOPs. These are NVIDIA's "Tensor Cores".
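For scale, here's a rough sketch of how those per-cycle figures turn into TFLOPs. The 3840 FLOP/cycle number is from the comment above; the 4.0 GHz clock is an assumed, illustrative value, not a published POWER10 spec:

```python
def peak_tflops(flop_per_cycle, clock_ghz):
    """Theoretical peak throughput: FLOP/cycle x cycles/second, in TFLOPs."""
    return flop_per_cycle * clock_ghz * 1e9 / 1e12

# POWER10 matrix units per the comment: 3840 FP32 FLOP/cycle.
# 4.0 GHz is an assumed clock for illustration only.
power10_fp32 = peak_tflops(3840, 4.0)

print(f"POWER10 (assumed 4.0 GHz): ~{power10_fp32:.1f} FP32 TFLOPs")
```

That lands around 15 FP32 TFLOPs under those assumptions, versus the ~57 FP16-matrix TFLOPs quoted for the 2070 Super's Tensor Cores, though FP32 vs FP16 isn't an apples-to-apples comparison.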

Ampere (rumored to be released within a few weeks...) even adds sparse matrix-multiplication (!!) units on top of that.

