I actually made a little thing for Emacs that does something similar, but it tags on the parsed parts of speech instead of just tokens. It requires you to select the text first instead of highlighting automatically, though. It uses CoreNLP, which was pretty lovely to work with.
https://github.com/cosmicexplorer/speech-tagger