Hasn't it been known for quite a few years now that these CNNs (ResNet, VGG, etc.) train well in FP16? The problem is that attention layers with softmax can have a dynamic range higher than FP16 can handle, which is why you have to go to BF16. I'm lost as to what the novelty in this paper is.
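
To make the dynamic-range point concrete, here is a minimal PyTorch sketch (my own illustration, not from the paper): FP16 tops out around 65504, while BF16 keeps FP32's exponent range, so a value that is nothing special for an unnormalized exponential or an accumulated dot product already overflows to inf in FP16 but stays finite in BF16.

```python
import torch

# Representable maxima: FP16 has a 5-bit exponent, BF16 has FP32's 8-bit exponent.
print(torch.finfo(torch.float16).max)    # 65504.0
print(torch.finfo(torch.bfloat16).max)   # ~3.39e38, same exponent range as FP32

# exp(12) ~ 162754, which exceeds the FP16 range but not the BF16 range.
x = torch.tensor(12.0)
print(torch.exp(x.to(torch.float16)))    # inf  -> overflow
print(torch.exp(x.to(torch.bfloat16)))   # finite, just coarser precision
```

Where exactly that overflow bites in a real attention layer depends on scaling, loss scaling, and where reductions accumulate, but the range gap itself is the reason BF16 is the safer default.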