Those are Unlabelled Attachment Score (UAS) results, but Redshift is quietly computing the dependency labels as well, while parser.py does not. Running Redshift in unlabelled mode gives very fast parse times, but costs about 1% accuracy. The labels are really useful, both as features and as a way to divide the problem space.
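To make the labelled/unlabelled distinction concrete, here is a minimal sketch of how the two scores could be computed for one sentence. The function name `attachment_scores` and the toy labels are made up for illustration, and the sketch counts every token; real evaluations often exclude punctuation, which this does not do.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for parallel lists of (head, label) pairs, one per token.

    UAS counts a token as correct when its head matches; LAS also requires
    the dependency label to match. Hypothetical helper, not taken from
    parser.py or Redshift.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to score")
    head_hits = 0
    labelled_hits = 0
    for (g_head, g_label), (p_head, p_label) in zip(gold, predicted):
        if g_head == p_head:
            head_hits += 1
            if g_label == p_label:
                labelled_hits += 1
    n = len(gold)
    return head_hits / n, labelled_hits / n


# Toy sentence "She ate pizza"; head index 0 stands for the root.
gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
# The parser attaches every word correctly but mislabels the last arc.
predicted = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = attachment_scores(gold, predicted)
print(f"UAS={uas:.3f} LAS={las:.3f}")
```

In this toy case every head is right, so the unlabelled score is perfect, while the one wrong label only shows up in the labelled score.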
The data sets use the _Stanford_ labels, whereas the main results in Zhang and Nivre use MALT labels. Z&N do report a single Stanford accuracy figure in their results: 93.5% UAS.
Sentences per second should be just over 100. I use a beam width of k=8 with some extra features referring to words further down the stack, where Z&N use k=64. Right at the bottom of the post, you can find the commit SHA and the commands for the experiment.
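As a rough sanity check on the sentences-per-second figure, the words-per-second number for Redshift in the table can be divided by an average sentence length. The 24 words per sentence used here is an assumption for illustration, not a number measured in this experiment, and `sentences_per_second` is a hypothetical helper.

```python
def sentences_per_second(words_per_second, avg_words_per_sentence):
    """Convert a words/sec throughput into an approximate sentences/sec rate."""
    if avg_words_per_sentence <= 0:
        raise ValueError("average sentence length must be positive")
    return words_per_second / avg_words_per_sentence


REDSHIFT_WORDS_PER_SECOND = 2580     # from the results table
ASSUMED_AVG_SENTENCE_LENGTH = 24.0   # assumption, not measured here

rate = sentences_per_second(REDSHIFT_WORDS_PER_SECOND, ASSUMED_AVG_SENTENCE_LENGTH)
print(f"about {rate:.1f} sentences/sec")
```

Under that assumed sentence length, the result is consistent with "just over 100"; a different average length would shift it proportionally.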
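| Parser    | Accuracy | Speed (w/s) | Language | LOC         |
|-----------|----------|-------------|----------|-------------|
| Stanford  | 89.6%    | 19          | Java     | > 50,000[1] |
| parser.py | 89.8%    | 2,020       | Python   | ~500        |
| Redshift  | 93.6%    | 2,580       | Cython   | ~4,000      |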
Are these the labeled parsing results you are referring to? How many sents/sec? Using same PTB data sets as Zhang and Nivre '11?