It’s funny how there is continuous reinvention of parsing approaches.
Why isn’t there already some parser generator with vector instructions, PGO, and low stack usage? Instead, just endless rewrites of recursive descent with caching optimizations sprinkled in where needed.
Hardware also changes over time, so while something was initially fast, people with new hardware try it, find it's not so fast for them, and then create their own "fast X". Fast forward 10 more years, someone with newer hardware finds that, "huh, why isn't it using extension Y?", and now we have three libraries all called "Fast X".
Because you have to learn how to use any given parser generator, naive code is easy to write, and there are tons of applications for parsing that aren't really performance critical.
Steps 1 and 3 are heavily dependent on the data types that make the most sense for the previous (lexing) and next (semantic analysis) phases of the compiler. There is no one Token type that works for every language, nor one AST type.
Recognizing the grammar is the relatively easy part, but since so much of the code is consuming and producing data types that are unique to a given implementation, it's hard to have very high-performance reusable libraries.
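To make that concrete, here's a minimal sketch in Rust (the types are hypothetical, not from any real library) of why the reusable part is so thin: a cursor over tokens can be generic, but the Token and AST types on either side of it are specific to each language, so the bulk of the parser can't be shared.

```rust
#![allow(dead_code)]

// Tokens for a small expression language.
#[derive(Debug, Clone, PartialEq)]
enum ExprToken {
    Int(i64),
    Ident(String),
    Plus,
    LParen,
    RParen,
}

// Tokens for a line-based config format: different payloads, different concerns
// (raw values, maybe line/column spans, maybe preserved comments).
#[derive(Debug, Clone, PartialEq)]
enum ConfigToken {
    Key(String),
    Equals,
    Value(String),
    Newline,
}

// The ASTs diverge even more: a recursive expression tree vs. a flat entry list.
#[derive(Debug)]
enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug)]
struct ConfigEntry {
    key: String,
    value: String,
}

// About the only genuinely generic piece: a cursor over a slice of tokens.
// Everything that consumes or produces the types above is per-implementation.
struct Cursor<'a, T> {
    tokens: &'a [T],
    pos: usize,
}

impl<'a, T> Cursor<'a, T> {
    fn peek(&self) -> Option<&T> {
        self.tokens.get(self.pos)
    }

    fn bump(&mut self) -> Option<&T> {
        let t = self.tokens.get(self.pos);
        self.pos += 1;
        t
    }
}

fn main() {
    // The cursor works for either token type, but nothing downstream of it does.
    let tokens = vec![ExprToken::Int(1), ExprToken::Plus, ExprToken::Int(2)];
    let mut cur = Cursor { tokens: &tokens, pos: 0 };
    while let Some(t) = cur.bump() {
        println!("{:?}", t);
    }
}
```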