[For all the bad things I hear about PHP, the code is very readable without any previous experience. Nice.]

Here are some things a lexer for a programming language might have to deal with:

1. Comments (some languages even allow nested comments, which rules out regular expressions for that part).

2. Continuation lines.

3. Includes (if done at the lexical level).

4. Filename/line/column numbers for nice error messages (keeping these up to date can really hurt performance through branch mispredictions).

5. Evaluation of literals: decimal/hex/octal/binary integers, floats, strings (with escapes), etc.

6. Identifiers.
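As a rough illustration of items 1 and 4, here is a minimal C sketch of skipping a nested block comment while keeping the line and column counts current. It assumes `/* ... */` delimiters that nest (as Rust's and Swift's block comments do); the `Cursor` type and the function names are hypothetical. The depth counter is exactly what a regular expression can't express, since nesting depth is unbounded, and the newline test inside `advance` runs on every character, which is the kind of data-dependent branch that tends to get mispredicted on real input.

```c
// Hypothetical lexer cursor: current position plus line/column bookkeeping.
typedef struct {
    const char *p;  // current position in a NUL-terminated source buffer
    int line;       // 1-based line number
    int col;        // 1-based column number
} Cursor;

// Consume one character, keeping line/column up to date for error messages.
static void advance(Cursor *c) {
    if (*c->p == '\n') {   // evaluated for every character consumed
        c->line++;
        c->col = 1;
    } else {
        c->col++;
    }
    c->p++;
}

// Skip a nested block comment. Precondition: c->p points at the opening
// delimiter. Returns 0 on success, -1 if the input ends inside the comment.
static int skip_nested_comment(Cursor *c) {
    int depth = 0;
    do {
        if (c->p[0] == '\0')
            return -1;                      // unterminated comment
        if (c->p[0] == '/' && c->p[1] == '*') {
            depth++;                        // an opener nests one level deeper
            advance(c);
            advance(c);
        } else if (c->p[0] == '*' && c->p[1] == '/') {
            depth--;                        // a closer pops one level
            advance(c);
            advance(c);
        } else {
            advance(c);
        }
    } while (depth > 0);
    return 0;
}
```

A lexer would start with something like `Cursor c = { src, 1, 1 };` and call `skip_nested_comment(&c)` whenever it sees the opening delimiter; afterwards `c.line` and `c.col` describe the first character after the comment.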

So matching keywords is mostly the straightforward part. That said, I have found that matching many keywords is the perfect (and, in my experience so far, the only) use case for a perfect-hash generator like gperf: it is normally much faster than any pointer-chasing trie. gperf has mostly eliminated keyword matching from the profile of every lexer I've written.
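To show why a perfect hash beats a trie here, below is a hand-written sketch of the idea for eight C-like keywords. It is not gperf's output: gperf searches for a suitable hash function automatically, whereas the function here, `(len + s[0] + s[len-1]) & 31`, was simply checked by hand to be collision-free for this particular keyword set. The `Token` enum, `kw_table` and `keyword_lookup` are hypothetical names. A lookup costs one hash, one table load and at most one string comparison, with no pointer chasing.

```c
#include <string.h>

// Hypothetical token codes for this sketch.
typedef enum {
    TOK_IDENT = 0, TOK_IF, TOK_ELSE, TOK_WHILE, TOK_FOR,
    TOK_RETURN, TOK_INT, TOK_DO, TOK_BREAK
} Token;

typedef struct { const char *name; Token tok; } Keyword;

// hash(s) = (len + s[0] + s[len-1]) & 31 is collision-free for these eight
// keywords, so each slot holds at most one candidate; unused slots are NULL.
static const Keyword kw_table[32] = {
    [0]  = {"int", TOK_INT},
    [1]  = {"while", TOK_WHILE},
    [6]  = {"return", TOK_RETURN},
    [14] = {"else", TOK_ELSE},
    [17] = {"if", TOK_IF},
    [18] = {"break", TOK_BREAK},
    [21] = {"do", TOK_DO},
    [27] = {"for", TOK_FOR},
};

// Classify an identifier-like lexeme s[0..len): return its keyword token,
// or TOK_IDENT. s need not be NUL-terminated.
static Token keyword_lookup(const char *s, size_t len) {
    if (len < 2 || len > 6)          // outside the keyword length range
        return TOK_IDENT;
    unsigned h = ((unsigned)len + (unsigned char)s[0]
                  + (unsigned char)s[len - 1]) & 31u;
    const Keyword *k = &kw_table[h];
    // At most one candidate per slot, so one comparison settles it.
    if (k->name != NULL && strlen(k->name) == len
        && memcmp(k->name, s, len) == 0)
        return k->tok;
    return TOK_IDENT;
}
```

Adding or removing a keyword means finding a new collision-free function, which is exactly the tedious part a tool like gperf automates.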