
So "Transformers" are part of the attention-based systems, which are an approach for modeling input-output relationships that is an alternative to Recurrent Neural Networks. These are instead based on Convolutional Neural Networks.

The innovation here is that the transformer is compressed, allowing the system to deal with longer sequences.



You seem to be saying this work is based on convolutional neural networks. That's incorrect. It uses the same attention mechanisms from natural language processing, which involve no convolution operations.

Convolutions have a different set of weights for each position offset (with a fixed window size), and reuse those weights across the entire input space.
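To make the weight-sharing point concrete, here's a minimal numpy sketch (mine, not from the article) of a 1-D convolution: one weight per position offset within a fixed window, and that same small weight vector is slid across the whole input.

```python
import numpy as np

def conv1d(x, w):
    # Fixed window of size len(w); the SAME weights `w` are
    # reused at every position of the input sequence `x`.
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

x = np.arange(6, dtype=float)     # input sequence of length 6
w = np.array([0.25, 0.5, 0.25])   # one weight per offset in the window
y = conv1d(x, w)                  # 4 outputs, all computed with the same w
```

The parameter count here is just the window size (3), no matter how long `x` is, but each output can only see 3 neighboring positions.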

Transformer-based networks like this work compute attention functions between the current position's encoding and every previous position, then use the outputs to compute a weighted sum of the encodings at those positions. Hence they can look at an arbitrarily large window, and the number of parameters they have is independent of the size of that window.
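For contrast, a minimal numpy sketch of single-query attention (again mine, with made-up shapes): `q` is the current position's encoding, and `K`, `V` hold the encodings of every previous position. Note that nothing in the parameterization depends on how many previous positions there are.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(q, K, V):
    # One compatibility score per previous position, normalized
    # into weights, then used for a weighted sum of the values.
    scores = K @ q / np.sqrt(len(q))
    weights = softmax(scores)
    return weights @ V

d = 4
rng = np.random.default_rng(0)
q = rng.normal(size=d)          # current position's encoding
K = rng.normal(size=(10, d))    # 10 previous positions...
V = rng.normal(size=(10, d))
out = attend(q, K, V)           # shape (d,), same for 10 or 10,000 positions
```

In a real transformer `q`, `K`, `V` come from learned projections, but those projection matrices are shaped by the encoding dimension `d`, not by the sequence length, which is the point being made above.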


Are Transformers based on convolutions?


"Convolutional neural networks", whose connection to other means of convolution is a bit tenuous.

https://en.wikipedia.org/wiki/Convolutional_neural_network



