
If I understood correctly, in practice they show an actual speed improvement on high-end cards because autoregressive LLMs are memory-bandwidth limited, not compute bound, so switching to an approach that is more compute-expensive but less bandwidth-heavy works well on current hardware.
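A back-of-envelope sketch of why that is: during autoregressive decoding every weight is streamed from memory once per generated token, so bandwidth, not FLOPs, caps tokens per second. The model size and bandwidth figures below are illustrative assumptions, not measurements.

```python
def max_tokens_per_sec(params_billion, bytes_per_param, bandwidth_gb_s):
    """Upper bound on decode speed if every weight is read once per token."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical 7B-parameter model in fp16 on a card with ~1000 GB/s:
print(max_tokens_per_sec(7, 2, 1000))  # ~71 tokens/sec ceiling
```

The ceiling is independent of how fast the card can multiply matrices, which is why trading extra compute for fewer weight-streaming passes can pay off.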


The SEDD architecture [1] probably allows for parallel sampling of all tokens in a block at once, which may be faster in wall-clock time but not necessarily less computationally demanding in terms of total work (runtime times the computational resources used).
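To make the wall-clock vs. total-work trade-off concrete, here is a toy accounting sketch (my own assumption about how the costs scale, not taken from the paper): wall-clock time tracks the number of sequential forward passes, while total compute tracks the number of token positions processed across all passes.

```python
def autoregressive_cost(block_len):
    # One forward pass per new token (KV cache reuses past positions).
    passes = block_len       # sequential passes -> wall-clock time
    positions = block_len    # token positions computed -> total FLOPs
    return passes, positions

def parallel_block_cost(block_len, denoise_steps):
    # Each denoising step recomputes every position in the block in parallel.
    passes = denoise_steps
    positions = denoise_steps * block_len
    return passes, positions

# Hypothetical block of 64 tokens, 16 denoising steps:
print(autoregressive_cost(64))      # (64, 64)
print(parallel_block_cost(64, 16))  # (16, 1024)
```

With fewer denoising steps than tokens, the parallel sampler needs fewer sequential passes (faster on bandwidth-limited hardware) while processing many more token positions overall (more total compute).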

[1] Which Inception Labs's new models may be based on; one of the cofounders is a co-author. See equations 18-20 in https://arxiv.org/abs/2310.16834



