Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> One way to deals with this is to only recognise in one rule all identifiers including keywords (something like "[a-z_][a-zA-Z0-9_]+" and to use a hash table of keywords to check whether a match is a keyword or an identifier (and which one).

A lot of production parsers I've seen do this with two state machines. They'll have a top-level state machine (usually hand-implemented using a giant switch statement) that handles all of the main different token types—numbers, strings, punctuators, etc. That has a single type for all identifiers, including keywords.

Then, after it determines a token is a identifier, it runs another little tiny optimized state machine to see if it's a reserved word or not.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: