In natural language processing tasks you see a lot of non-CNN architectures. These are usually designed to handle sequential data, so some kind of "memory" is needed.
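As a minimal sketch of what that "memory" means, here is a toy scalar recurrence (weights `w_h` and `w_x` are hypothetical values, not from any real model): the hidden state `h` carries information about earlier tokens forward through the sequence.

```python
import math

def rnn_scan(inputs, w_h=0.5, w_x=1.0):
    """Toy recurrent cell: new state depends on the previous state."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)  # mixes current input with past state
        states.append(h)
    return states

states = rnn_scan([1.0, 0.0, 0.0])
# Even though the later inputs are zero, h stays nonzero:
# the state retains a trace ("memory") of the first token.
```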
Sometimes you see this combined with a CNN. There have been a few question-answering systems that use one or more CNN layers. I don't entirely understand these designs, but presumably the convolutional layers are an attempt to capture the ordering of words.
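One way to see why a convolution is order-sensitive: each filter slides over a window of adjacent tokens, and swapping the words inside a window changes the output. A toy 1-D convolution (scalar token values and the kernel `[1, -1]` are just illustrative, not from any of those systems):

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution: one output per window of adjacent tokens."""
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(len(seq) - k + 1)]

# The kernel [1, -1] responds differently to (a, b) than to (b, a),
# so the features it produces depend on word order within each window.
forward = conv1d([3, 1], [1, -1])   # -> [2]
reversed_ = conv1d([1, 3], [1, -1])  # -> [-2]
```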
There are lots of techniques that people use to try to make deep networks train well. Mostly these are about making errors backpropagate better. One of the most successful recent innovations is the ResNet architecture (https://arxiv.org/abs/1512.03385), and the related highway networks.
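The core idea of a residual connection can be sketched in a few lines (the residual function `f` here is a hypothetical stand-in for a learned layer): the block computes `x + f(x)`, so the identity path gives gradients a direct route through the addition, which is what helps very deep stacks train.

```python
def residual_block(x, f):
    """ResNet-style skip connection: output = input + residual f(input)."""
    return [xi + fi for xi, fi in zip(x, f(x))]

# If f outputs all zeros, the block reduces to the identity, so stacking
# many such blocks cannot destroy the signal the way plain layers can.
out = residual_block([1.0, 2.0], lambda v: [0.0] * len(v))  # -> [1.0, 2.0]
```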