Hacker News

A piece of advice for those who, like the OP, are encountering these topics for the first time: try to gain the historical perspective, and you'll understand everything much better. Find the sources of the original C compilers and marvel that they are probably smaller than the sources of Yacc and Lex (at least that's how I remember them). Then realize that neither Yacc nor Lex was ever needed for those C compilers. You learn about Yacc and Lex in school more for their "educational value" than because they are the easiest tools for writing a C compiler or parser.

When the original authors wrote the compiler, the only consequence of C not having a context-free grammar was a few more lines of code. Their goal was certainly not academic "parser" purity. Parsing was one of the less interesting parts of making UNIX and C some 40 years ago.
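The best-known spot where parsing C needs context is typedef names: `T * x;` declares `x` as a pointer if `T` was introduced by `typedef`, and is a multiplication otherwise. A hand-written front end handles this with a handful of extra lines, typically by letting the tokenizer consult the symbol table (often called the "lexer hack"). A minimal Python sketch of the idea, assuming a toy one-statement-at-a-time language; `TYPEDEFS`, `tokenize`, and `process` are hypothetical names, not code from any real C compiler:

```python
import re

# Hypothetical symbol table: names introduced by earlier `typedef` statements.
TYPEDEFS: set[str] = set()

def tokenize(stmt: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs. Identifiers naming a known typedef are
    tagged TYPE_NAME instead of IDENT, so the lexer depends on parser state."""
    tokens = []
    for text in re.findall(r"[A-Za-z_]\w*|\S", stmt):
        if text[0].isalpha() or text[0] == "_":
            kind = "TYPE_NAME" if text in TYPEDEFS else "IDENT"
        else:
            kind = "OTHER"
        tokens.append((kind, text))
    return tokens

def process(stmt: str) -> str:
    """Classify one toy statement, updating TYPEDEFS as a side effect."""
    tokens = tokenize(stmt)
    if tokens and tokens[0][1] == "typedef":
        # `typedef int T;`: the name just before ';' becomes a type name
        TYPEDEFS.add(tokens[-2][1])
        return "typedef"
    if tokens and tokens[0][0] == "TYPE_NAME":
        return "declaration"  # e.g. `T * x;` declares x as a pointer to T
    return "expression"       # e.g. `a * b;` multiplies a by b
```

The same characters `T * x;` classify as `"expression"` before `process("typedef int T;")` and as `"declaration"` after it, which is why the tokenizer needs that feedback from the symbol table.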



Amateurs use parser-generators.

Professionals write recursive-descent parsers.

Real men code finite state machines by hand: http://galileo.phys.virginia.edu/classes/551.jvn.fall01/fsm....
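For the curious, a hand-coded finite state machine is just an explicit state variable plus one transition per input character. A minimal Python sketch (not taken from the linked page; `State` and `is_decimal` are illustrative names) that recognizes signed decimals such as `-3` or `+0.25`:

```python
from enum import Enum, auto

DIGITS = "0123456789"

class State(Enum):
    START = auto()  # nothing consumed yet
    SIGN = auto()   # consumed a leading '+' or '-'
    INT = auto()    # inside the integer digits
    DOT = auto()    # consumed '.', at least one fraction digit must follow
    FRAC = auto()   # inside the fraction digits

ACCEPTING = {State.INT, State.FRAC}

def is_decimal(s: str) -> bool:
    """Hand-coded FSM recognizing strings like '42', '-3' or '+0.25'."""
    state = State.START
    for ch in s:
        if state is State.START and ch in "+-":
            state = State.SIGN
        elif state in (State.START, State.SIGN, State.INT) and ch in DIGITS:
            state = State.INT
        elif state is State.INT and ch == ".":
            state = State.DOT
        elif state in (State.DOT, State.FRAC) and ch in DIGITS:
            state = State.FRAC
        else:
            return False  # no transition defined: reject immediately
    return state in ACCEPTING
```

Writing every transition out by hand gets tedious for a whole language, which is the joke; for small, token-level jobs it stays short and dependency-free.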

But seriously, the mystery and ignorance that continue to surround Yacc to this day need to be fought. If you're using Yacc, you're doing it wrong. As far as parser-generators go, it is totally obsolete, and even when it wasn't, it was the wrong choice for almost all parsing problems. I learned this one day when someone recoded a Yacc-based parser I wrote in recursive-descent style, and it took fewer lines of code than my Yacc specification alone.
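To make the comparison concrete, here is a minimal recursive-descent sketch in Python, assuming a toy grammar of integer arithmetic (not the parser from the anecdote; `Parser` and `evaluate` are illustrative names). Each grammar rule becomes one method, and repetition in a rule becomes a `while` loop:

```python
import re
from typing import Optional

class Parser:
    """Recursive-descent evaluator for the toy grammar
        expr   := term (('+' | '-') term)*
        term   := factor ('*' factor)*
        factor := NUMBER | '(' expr ')' | '-' factor
    One method per rule; the Python call stack mirrors the parse tree."""

    def __init__(self, text: str) -> None:
        # Tokens are runs of ASCII digits or single non-space characters.
        self.tokens = re.findall(r"[0-9]+|\S", text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self) -> int:
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> int:
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self) -> int:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok[0] in "0123456789":
            self.advance()
            return int(tok)
        if tok == "(":
            self.advance()
            value = self.expr()
            self.expect(")")
            return value
        if tok == "-":
            self.advance()
            return -self.factor()  # unary minus
        raise SyntaxError(f"unexpected {tok!r}")

def evaluate(text: str) -> int:
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r} after expression")
    return value
```

The loops in `expr` and `term` give left associativity for free, so `evaluate("10 - 4 - 3")` is `3`, and precedence falls out of which method calls which.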

If you need to write a really simple, or a really complicated, parser, recursive-descent is usually the way to go. You might try Henry Baker's META (http://home.pipeline.com/~hbaker1/Prag-Parse.html) as a first alternative. If you really, really feel the need for a parser-generator, take a look at parsing expression grammars (PEGs) and packrat parsers: http://pdos.csail.mit.edu/~baford/packrat/
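A packrat parser is essentially a recursive-descent parser for a PEG whose rule results are memoized by input position, so the backtracking of ordered choice never re-parses the same span. A minimal Python sketch, assuming a toy arithmetic PEG chosen for illustration (`PackratParser` and `parse` are hypothetical names, not code from the linked page):

```python
from typing import Callable, Optional

# A parse result: (semantic value, position after the match), or None on failure.
Result = Optional[tuple[int, int]]

DIGITS = "0123456789"

class PackratParser:
    """Packrat parser for the toy PEG
        Sum     <- Product '+' Sum / Product
        Product <- Value '*' Product / Value
        Value   <- [0-9]+ / '(' Sum ')'
    '/' is ordered choice: try the first alternative, fall back to the second.
    Every rule result is memoized per (rule, position), so falling back never
    re-parses text that was already parsed at that position."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.memo: dict[tuple[str, int], Result] = {}

    def _memo(self, rule: str, pos: int, body: Callable[[int], Result]) -> Result:
        key = (rule, pos)
        if key not in self.memo:
            self.memo[key] = body(pos)
        return self.memo[key]

    def _char(self, pos: int, ch: str) -> Optional[int]:
        """Match a literal character; return the next position or None."""
        if pos < len(self.text) and self.text[pos] == ch:
            return pos + 1
        return None

    def sum(self, pos: int) -> Result:
        return self._memo("Sum", pos, self._sum)

    def product(self, pos: int) -> Result:
        return self._memo("Product", pos, self._product)

    def value(self, pos: int) -> Result:
        return self._memo("Value", pos, self._value)

    def _sum(self, pos: int) -> Result:
        left = self.product(pos)
        if left is not None:
            after_plus = self._char(left[1], "+")
            if after_plus is not None:
                right = self.sum(after_plus)
                if right is not None:
                    return (left[0] + right[0], right[1])
        return self.product(pos)  # second alternative: a memo hit, not a re-parse

    def _product(self, pos: int) -> Result:
        left = self.value(pos)
        if left is not None:
            after_star = self._char(left[1], "*")
            if after_star is not None:
                right = self.product(after_star)
                if right is not None:
                    return (left[0] * right[0], right[1])
        return self.value(pos)  # second alternative: a memo hit, not a re-parse

    def _value(self, pos: int) -> Result:
        end = pos
        while end < len(self.text) and self.text[end] in DIGITS:
            end += 1
        if end > pos:
            return (int(self.text[pos:end]), end)
        after_open = self._char(pos, "(")
        if after_open is not None:
            inner = self.sum(after_open)
            if inner is not None:
                after_close = self._char(inner[1], ")")
                if after_close is not None:
                    return (inner[0], after_close)
        return None

def parse(text: str) -> int:
    """Parse and evaluate a whole expression (no whitespace allowed)."""
    result = PackratParser(text).sum(0)
    if result is None or result[1] != len(text):
        raise SyntaxError(f"cannot parse {text!r}")
    return result[0]
```

The rules are right-recursive because plain packrat parsing does not support left recursion; that is harmless here since `+` and `*` are associative.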

But the real lesson here is to design your syntax to be easy to parse and manipulate in the first place. C's grammar is the clearest testament to how little thought Dennis Ritchie put into the design of C.


I have an interest in making parsers, but all mine so far are either hacks or made with ANTLR. Does this fall under "doing it wrong"? (I presume the answer is yes) How would you recommend I go about learning how to do it right?


ANTLR is one of the parser generators that makes Yacc obsolete. So you're doing something right. (I personally think ANTLR is too complicated.)

I've tried working with other parser generators, but I always keep coming back to hand-coded recursive descent. The biggest advantages, IMO, are that such parsers are easier to debug and can be made to report errors in the input much better than generated ones.
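As an illustration of the error-reporting point, here is a minimal Python sketch of a hand-written parser for a toy input format (bracketed integer lists such as `[1, 2, 3]`; the format and all names, including `ListParser` and `parse_int_list`, are assumptions for illustration). Because each parsing function knows exactly what it was trying to match, it can say so, with a line, a column, and a caret:

```python
from dataclasses import dataclass

DIGITS = "0123456789"

class ListSyntaxError(Exception):
    """Parse error whose message includes the location and a caret line."""

@dataclass
class ListParser:
    text: str
    pos: int = 0

    def location(self) -> tuple[int, int]:
        """1-based (line, column) of the current position."""
        line = self.text.count("\n", 0, self.pos) + 1
        column = self.pos - (self.text.rfind("\n", 0, self.pos) + 1) + 1
        return line, column

    def error(self, expected: str) -> ListSyntaxError:
        line, column = self.location()
        if self.pos < len(self.text):
            found = repr(self.text[self.pos])
        else:
            found = "end of input"
        source_line = self.text.split("\n")[line - 1]
        caret = " " * (column - 1) + "^"
        return ListSyntaxError(
            f"line {line}, column {column}: expected {expected}, found {found}\n"
            f"  {source_line}\n  {caret}"
        )

    def skip_ws(self) -> None:
        while self.pos < len(self.text) and self.text[self.pos] in " \t\n":
            self.pos += 1

    def peek(self) -> str:
        """Current character, or '' at end of input."""
        return self.text[self.pos:self.pos + 1]

    def parse_list(self) -> list[int]:
        self.skip_ws()
        if self.peek() != "[":
            raise self.error("'[' to start a list")
        self.pos += 1
        items: list[int] = []
        self.skip_ws()
        if self.peek() == "]":
            self.pos += 1
            return items
        while True:
            items.append(self.parse_int())
            self.skip_ws()
            if self.peek() == ",":
                self.pos += 1
            elif self.peek() == "]":
                self.pos += 1
                return items
            else:
                raise self.error("',' or ']' after list element")

    def parse_int(self) -> int:
        self.skip_ws()
        start = self.pos
        while self.peek() != "" and self.peek() in DIGITS:
            self.pos += 1
        if self.pos == start:
            raise self.error("an integer")
        return int(self.text[start:self.pos])

def parse_int_list(text: str) -> list[int]:
    parser = ListParser(text)
    items = parser.parse_list()
    parser.skip_ws()
    if parser.pos != len(text):
        raise parser.error("end of input after ']'")
    return items
```

For example, `parse_int_list("[1,\n 2,\n ]")` fails with `line 3, column 2: expected an integer, found ']'`, followed by the offending line and a caret under the `]`.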


Just don't "make parsers." Make something that does something better than some other software does. A parser is just one piece of what you make. Once you have the goal, use whatever you like to make the parser piece, but produce the results you wanted. That's how you really learn something.


Thanks for the condescending advice... one of the last parsers I wrote was for a website with 125,000 active users; parsers are far from the be-all and end-all for me.


I agree completely about Lex & Yacc, and this is the conclusion I came to at the end of the article. Yacc (leaving Lex out for a moment, since it's a different story) is being taught for its interesting educational value, but once in the wild, you just find that most real-world compilers don't use it.

Regarding historical perspective, unfortunately it isn't so easy to gain. I actually consulted a lot of comp.compilers discussions from the early 1990s, but most links in them point to various non-existent FTP sites :-/


Do yourself a favor and search the net for the oldest preserved sources of C compilers and UNIX. Then take a look at them, especially at how concise they are. I'd enjoy reading your post about that experience. :)


Since you appear to be knowledgeable in these matters, perhaps you could point me to such a source :)

FWIW, I did mention 'tcc' in the article: the "tiny C compiler", with rather compact and clean source code, probably smaller than the code for Bison.


The full Unix V source is here as a tarball:

http://unixarchive.tliquest.net/PDP-11/Distributions/researc...

In a 500 KB gzipped tarball you get at least the whole of Unix, the whole standard library, and the whole C compiler. For the book that explains the sources, see:

Book: Lions' Commentary on the Sixth Edition Unix

http://news.ycombinator.com/item?id=2506176


Primeval C: two very early compilers

http://news.ycombinator.com/item?id=2506032


Great, thanks!



