Hacker News

There are also the 512-bit matrix units, which would be more like 3840 FLOP/cycle (single precision) for workloads that can be expressed that way.


I'm not sure that plays to POWER10's advantage.

The NVIDIA 2070 Super also has FP16 matrix-multiplication units, achieving 57 FP16-matrix TFLOPs. These are NVIDIA's "Tensor Cores".
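For scale, here's a rough sketch of how those per-cycle figures turn into TFLOPs. The 3840 FLOP/cycle number is from the comment above; the 4.0 GHz clock is an assumed, illustrative value, not a published POWER10 spec:

```python
def peak_tflops(flop_per_cycle, clock_ghz):
    """Theoretical peak throughput: FLOP/cycle x cycles/second, in TFLOPs."""
    return flop_per_cycle * clock_ghz * 1e9 / 1e12

# POWER10 matrix units per the comment: 3840 FP32 FLOP/cycle.
# 4.0 GHz is an assumed clock for illustration only.
power10_fp32 = peak_tflops(3840, 4.0)

print(f"POWER10 (assumed 4.0 GHz): ~{power10_fp32:.1f} FP32 TFLOPs")
```

That lands around 15 FP32 TFLOPs under those assumptions, versus the ~57 FP16-matrix TFLOPs quoted for the 2070 Super's Tensor Cores, though FP32 vs FP16 isn't an apples-to-apples comparison.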

Ampere (rumored to be released within a few weeks...) even adds sparse matrix-multiplication (!!) units on top of that.

