
It’s funny how there is continuous reinvention of parsing approaches.

Why isn't there already a parser generator with vector instructions, PGO, and low stack usage? Instead there are just endless rewrites of recursive descent, with caching optimizations sprinkled in where needed.
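
For a sense of what "vector instructions" could mean here, a minimal sketch (my own illustration, not from any particular library; the function name is made up) of skipping a run of ASCII spaces 16 bytes at a time with SSE2, which is baseline on x86_64:

    #[cfg(target_arch = "x86_64")]
    fn skip_spaces_simd(input: &[u8], mut pos: usize) -> usize {
        use core::arch::x86_64::*;
        // SSE2 is guaranteed on x86_64; the unsafe block is for the
        // intrinsics and the raw pointer load.
        unsafe {
            let spaces = _mm_set1_epi8(b' ' as i8);
            while pos + 16 <= input.len() {
                let chunk = _mm_loadu_si128(input.as_ptr().add(pos) as *const __m128i);
                let mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, spaces)) as u32;
                if mask != 0xFFFF {
                    // First non-space byte inside this 16-byte chunk.
                    return pos + (!mask).trailing_zeros() as usize;
                }
                pos += 16;
            }
        }
        // Scalar tail for the last few bytes.
        while pos < input.len() && input[pos] == b' ' {
            pos += 1;
        }
        pos
    }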



Hardware also changes over time. Something that was initially fast gets tried by people with new hardware, who find it's not so fast for them anymore, so they create their own "fast X". Fast forward 10 more years, someone with newer hardware looks at that and asks "huh, why isn't it using extension Y?", and now we have three libraries all called "Fast X".


Because you have to learn how to use any given parser generator, naive code is easy to write, and there are tons of applications for parsing that aren't really performance critical.
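
To make "naive code is easy to write" concrete, here is a hypothetical minimal recursive descent evaluator for expressions like "2+3*4" (a real parser would also track spans and report errors, but the shape is the same):

    struct Parser<'a> {
        input: &'a [u8],
        pos: usize,
    }

    impl<'a> Parser<'a> {
        // expr := term ('+' term)*
        fn expr(&mut self) -> Option<i64> {
            let mut value = self.term()?;
            while self.eat(b'+') {
                value += self.term()?;
            }
            Some(value)
        }

        // term := number ('*' number)*
        fn term(&mut self) -> Option<i64> {
            let mut value = self.number()?;
            while self.eat(b'*') {
                value *= self.number()?;
            }
            Some(value)
        }

        fn number(&mut self) -> Option<i64> {
            let start = self.pos;
            while self.pos < self.input.len() && self.input[self.pos].is_ascii_digit() {
                self.pos += 1;
            }
            std::str::from_utf8(&self.input[start..self.pos]).ok()?.parse().ok()
        }

        fn eat(&mut self, byte: u8) -> bool {
            if self.input.get(self.pos) == Some(&byte) {
                self.pos += 1;
                true
            } else {
                false
            }
        }
    }

    // Parser { input: b"2+3*4", pos: 0 }.expr() == Some(14)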


I'd say it's because parsing is a very specific kind of work, heavily dependent on the grammar you're dealing with.


A parser spends time:

1. Consuming tokens.

2. Recognizing the grammar.

3. Producing AST nodes.

Steps 1 and 3 are heavily dependent on the data types that make the most sense for the previous (lexing) and next (semantic analysis) phases of the compiler. There is no one Token type that works for every language, nor one AST type.

Recognizing the grammar (step 2) is relatively easy, but since so much of the code consumes and produces data types unique to a given implementation, it's hard to have very high-performance reusable libraries. (See the sketch below.)
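
As a hypothetical illustration of that coupling (a toy language, not any real compiler's types): the Token and Expr types below are shaped entirely by one specific language, and the grammar-recognition step in the middle consumes and produces exactly those implementation-specific types.

    #[allow(dead_code)]
    enum Token {
        Ident(String), // likely interned symbols in a real compiler
        Int(i64),
        Plus,
        LParen,
        RParen,
    }

    #[allow(dead_code)]
    enum Expr {
        Var(String),
        Lit(i64),
        Add(Box<Expr>, Box<Expr>),
    }

    // Step 2 looks like the reusable middle, but its input and output
    // types are these implementation-specific ones.
    fn parse_add(tokens: &[Token]) -> Option<Expr> {
        match tokens {
            [Token::Int(a), Token::Plus, Token::Int(b)] => {
                Some(Expr::Add(Box::new(Expr::Lit(*a)), Box::new(Expr::Lit(*b))))
            }
            _ => None,
        }
    }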


There are good parser generators, but potentially not as Rust libraries.



Meanwhile, C++ has more than a hundred, with a focus on production-ready rather than innovative design patterns.



