
So "Transformers" are part of the attention-based systems, which are an approach for modeling input-output relationships that is an alternative to Recurrent Neural Networks. These are instead based on Convolutional Neural Networks.

The innovation here is that the transformer is compressed, allowing the system to deal with longer sequences.



You seem to be saying this work is based on convolutional neural networks. That's incorrect. It uses the same attention mechanisms from natural language processing, which involve no convolution operations.

Convolutions have a different set of weights for each position offset (with a fixed window size), and reuse those weights across the entire input space.
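To make the weight-sharing point concrete, here's a minimal numpy sketch (mine, not from the article) of a 1-D convolution: one weight per position offset within a fixed window, and that same small weight vector is slid across the whole input.

```python
import numpy as np

def conv1d(x, w):
    # Fixed window of size len(w); the SAME weights `w` are
    # reused at every position of the input sequence `x`.
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

x = np.arange(6, dtype=float)     # input sequence of length 6
w = np.array([0.25, 0.5, 0.25])   # one weight per offset in the window
y = conv1d(x, w)                  # 4 outputs, all computed with the same w
```

The parameter count here is just the window size (3), no matter how long `x` is, but each output can only see 3 neighboring positions.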

Transformer-based networks like this work compute attention functions between the current position's encoding and every previous position, then use the outputs to compute a weighted sum of the encodings at those positions. Hence they can look at an arbitrarily large window, and the number of parameters they have is independent of the size of that window.
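For contrast, a minimal numpy sketch of single-query attention (again mine, with made-up shapes): `q` is the current position's encoding, and `K`, `V` hold the encodings of every previous position. Note that nothing in the parameterization depends on how many previous positions there are.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(q, K, V):
    # One compatibility score per previous position, normalized
    # into weights, then used for a weighted sum of the values.
    scores = K @ q / np.sqrt(len(q))
    weights = softmax(scores)
    return weights @ V

d = 4
rng = np.random.default_rng(0)
q = rng.normal(size=d)          # current position's encoding
K = rng.normal(size=(10, d))    # 10 previous positions...
V = rng.normal(size=(10, d))
out = attend(q, K, V)           # shape (d,), same for 10 or 10,000 positions
```

In a real transformer `q`, `K`, `V` come from learned projections, but those projection matrices are shaped by the encoding dimension `d`, not by the sequence length, which is the point being made above.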


Are Transformers based on convolutions?


"Convolutional neural networks", whose connection to other means of convolution is a bit tenuous.

https://en.wikipedia.org/wiki/Convolutional_neural_network



