
First things first: hats off for a concise, accurate, and efficient implementation in Python.

But the Cython parser is not doing labeled parsing, and a significant number of the feature templates are relevant only to labeled parsing.

To reach Zhang and Nivre's 2011 results, I think you'll find that labeled parsing degrades performance significantly and requires more complex code:

1. You have more than an order of magnitude more transitions (80 vs. 4).

2. You need to add beam search, with early update for training (beam size = 64 to be consistent with ZPar).

3. You need an order of magnitude more feature templates, with more complex features.
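To make point 1 concrete: the arc-eager system has four unlabeled transitions, but once LEFT-ARC and RIGHT-ARC each carry a dependency label, the count multiplies out. A rough sketch of the arithmetic (the label-set size of ~39 for Stanford dependencies is an approximation, and the names here are illustrative, not code from either parser):

```python
# Rough arithmetic for the transition-count blow-up under labeled parsing.
# Assumes an arc-eager system; the label count (~39 for Stanford
# dependencies) is an approximation.

UNLABELED = ["SHIFT", "REDUCE", "LEFT-ARC", "RIGHT-ARC"]
N_LABELS = 39  # approximate size of the Stanford dependency label set

def labeled_transitions(n_labels):
    """LEFT-ARC and RIGHT-ARC split into one transition per label."""
    moves = ["SHIFT", "REDUCE"]
    for label in range(n_labels):
        moves.append(("LEFT-ARC", label))
        moves.append(("RIGHT-ARC", label))
    return moves

print(len(UNLABELED))                    # 4
print(len(labeled_transitions(N_LABELS)))  # 80
```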

So now you'll be evaluating 20x more transitions, on ~64x more candidates, with ~10x more feature templates. You can't multithread properly in Python; you can maybe get some more juice from some form of SIMD (with or without a GPU). Your model is much larger in memory, so any luck you had with L2/L3 caching is probably gone. Python has limits, and this is the point where you start hitting them.
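The beam search with early update mentioned in point 2 can be sketched as follows. This is a toy: the "model" scores only (previous move, next move) bigrams, states are just move histories, and every name here is an illustrative assumption, not code from either parser:

```python
# Toy sketch of beam search with early update (Collins & Roark style),
# as used to train transition-based parsers. A state is just the move
# sequence so far; the bigram scoring model is a stand-in.

MOVES = ["SHIFT", "REDUCE", "LEFT-ARC", "RIGHT-ARC"]

def score(weights, history, move):
    prev = history[-1] if history else "<start>"
    return weights.get((prev, move), 0.0)

def train_one(weights, gold_moves, beam_size=4, lr=1.0):
    """Beam-search the move sequence; if the gold prefix falls out of
    the beam, update immediately (early update) and stop."""
    beam = [(0.0, [])]  # (score, move history)
    for step in range(len(gold_moves)):
        candidates = [
            (s + score(weights, hist, m), hist + [m])
            for s, hist in beam
            for m in MOVES
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
        gold_prefix = gold_moves[: step + 1]
        if all(hist != gold_prefix for _, hist in beam):
            # Early update: reward the gold prefix, penalise the violator.
            _, pred = beam[0]
            for delta, seq in ((lr, gold_prefix), (-lr, pred)):
                for i, m in enumerate(seq):
                    prev = seq[i - 1] if i else "<start>"
                    weights[(prev, m)] = weights.get((prev, m), 0.0) + delta
            return step  # step at which the gold item was pruned
    return None  # gold survived the whole search

weights = {}
gold = ["SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "REDUCE"]
print(train_one(weights, gold))  # 2: gold pruned at step 2 on the first pass
```

The point of early update is that once the gold item has been pruned, the rest of the search can only be wrong, so you update against the current violation instead of wasting work on the remaining steps.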

Edit: I see you've updated your comment; I guess we're on the same page. It's nice to see a concise implementation. I wrote one in Go for my master's thesis. I achieved perfect parity with ZPar, but it took two orders of magnitude more LOC.



Sorry for the lack of clarity. The Cython parser results that I quoted use labelled parsing with beam search. There are various versions in the repo, but there's definitely a configuration that runs the Zhang and Nivre (2011) experiment verbatim, with exactly the same feature templates, and the same results that they get. Replicating those results was essential to implementing the parser correctly.


Apologies, could you add yet more clarity? According to the blog post, your results are:

  Parser     Accuracy  Speed (w/s)  Language  LOC
  Stanford   89.6%     19           Java      > 50,000 [1]
  parser.py  89.8%     2,020        Python    ~500
  Redshift   93.6%     2,580        Cython    ~4,000

Are these the labeled parsing results you are referring to? How many sents/sec? Are you using the same PTB data sets as Zhang and Nivre '11?


Those are Unlabelled Attachment Score (UAS) results, but Redshift is quietly computing the dependency labels, while parser.py does not. Running Redshift in unlabelled mode gives very fast parse times, but about 1% less accuracy. The labels are really good, both as features and as a way to divide the problem space.
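For anyone following along, the two standard metrics differ only in whether the label must match: UAS counts a token correct if its head is right, LAS also requires the right label. A minimal scoring sketch (the (head, label) data format and the toy example are assumptions for illustration):

```python
# Minimal UAS/LAS scoring sketch. Each parse is a list of (head, label)
# pairs, one per token; the toy data below is illustrative only.

def attachment_scores(gold, pred):
    """Return (UAS, LAS): head-only vs. head-plus-label accuracy."""
    assert len(gold) == len(pred)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (2, "dobj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "nmod")]
print(attachment_scores(gold, pred))  # (0.75, 0.5)
```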

The data sets use the _Stanford_ labels, whereas the main results in Zhang and Nivre refer to the MALT labels. Z&N do report a single Stanford-label accuracy in their results: 93.5% UAS.

Sentences per second should be just over 100. I use k=8 with some extra features referring to words further down the stack, where Z&N use k=64. Right at the bottom of the post, you can find the commit SHA and the commands for the experiment.


Hi, how does this parser compare to ClearNLP? It's supposed to be super fast and very accurate too.



