They do mention that the network is partially learning the words themselves:
> "On the other hand, the network is not merely classifying sentences, since performance is improved by augmenting the training set even with sentences not contained in the testing set (Fig. 3a,b). This result is critical: it implies that the network has learned to identify words, not just sentences, from ECoG data, and therefore that generalization to decoding of novel sentences is possible."
That would be more than enough to control window focus on my computer and such. I'd be happy to have a system that responded to a few hundred thoughts: "Left desktop", "Right desktop", "Last focused window", "Lock screen", "What time is it?"
The translation is restricted to a set of 30 to 50 unique sentences.