
Yes, the parser needs to know about this information. Example:

    (A) * B
This is either A multiplied by B, or type A casting the dereferenced value of B: http://en.wikipedia.org/wiki/The_lexer_hack
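
To make the two readings concrete, here is a minimal C sketch. The function names and values are made up for illustration; the only thing that differs between the two functions is whether A names a variable or a type.

```c
/* Reading 1: A is a variable, so "(A) * B" is a multiplication. */
int as_multiplication(void) {
    int A = 6;
    int B = 7;
    return (A) * B;
}

/* Reading 2: A is a typedef name, so "(A) * B" dereferences B
   and casts the result to type A. */
int as_cast(void) {
    typedef int A;
    short value = 7;
    short *B = &value;
    return (A) * B;
}
```

The token sequence in the two return statements is identical, so the parser can only pick the right tree if it already knows what A was declared as.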


Fair enough. Would the problem be fixed if the C standard said that *x (without spaces) had to be dereferencing and that x * y (with spaces) had to be multiplication?


Yeah, if you could purely lexically distinguish dereferencing from multiplication, either because they used a different symbol, or had different mandatory whitespace rules, then there'd be no ambiguity, at least in this example.

That's basically what C++ did by requiring the space in:

   Foo<Bar<Baz> > var;
to make it easy to lexically distinguish the '>>' right-shift operator from the '> >' sequence of two successive template-parameter closing symbols.
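
The reason the space matters is that lexers usually take the longest token they can match. A rough sketch of that rule in C, restricted to strings of '>' and spaces (the function name is hypothetical, and a real lexer handles far more than this):

```c
#include <stddef.h>

/* Counts tokens in a string, greedily preferring ">>" over ">".
   Spaces separate tokens; any other character counts as a
   one-character token. */
size_t count_angle_tokens(const char *s) {
    size_t count = 0;
    while (*s != '\0') {
        if (*s == ' ') {
            s++;                /* whitespace only separates tokens */
            continue;
        }
        if (s[0] == '>' && s[1] == '>')
            s += 2;             /* longest match: one ">>" token */
        else
            s += 1;             /* a single-character token */
        count++;
    }
    return count;
}
```

Under this rule ">>" comes out as one token and "> >" as two, which is why the space let the old grammar tell the cases apart without any feedback from the parser.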

C++0x is changing that though, due to the unpopularity of making programmers accommodate what looks like a parser-implementation hack.


It was as if a million C programmers cried out and were suddenly silenced.

Significant whitespace is evil, at least in the context of C where it is not significant anywhere else.


Whitespace is significant in C. Consider the difference between "f o o" and "foo" :)


Idon'tknowwhatyou'retalkingabout! :)


You like your Commodore 64 BASIC (or Ye Olde Fortran)?


No, consider int x; sizeof(x); and typedef int x; sizeof(x);.


That's not a syntactic ambiguity. Typically (when you're not trying to compress multiple passes, be clever and fast etc.), identifiers are not resolved during parsing, so it doesn't matter whether x denotes a type or a variable when the sizeof operator is applied.


You can write "A * B" and, depending on the nature of A, that is either a declaration of B as a pointer to type A or a multiplication of A and B. You could say that this multiplication would be 'void' (as in 'not assigned to anything'), and I could come up with an even more complicated example, like "A * B();", where this is either a function declaration or a multiplication of A and the result of a function named B() with, let's say, a side effect. But that's not the point: if the parser has to do heuristics like that to parse the language properly, it is already context-dependent, or at least not LALR(0) or LALR(1).
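
A small C sketch of the statement-level case, with made-up function names. Both functions contain the exact statement "A * B;", and both compile.

```c
/* A is a typedef name: "A * B;" declares B as a pointer to A. */
int star_as_declaration(void) {
    typedef int A;
    A value = 5;
    A * B;              /* declaration: B has type int* */
    B = &value;
    return *B;
}

/* A is a variable: "A * B;" is an expression statement. */
int star_as_expression(void) {
    int A = 5;
    int B = 3;
    A * B;              /* multiplication, result discarded
                           (compilers typically warn about this) */
    return A + B;
}
```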


I was talking about sizeof.


I think it does matter, although my example fails to make clear why. Type identifiers have different syntactic requirements to regular identifiers (the parens in sizeof(typename) are required, but the parens in sizeof(varname) are optional).
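
Here is a short C sketch of that difference (the function names are just for illustration):

```c
#include <stddef.h>

/* x is a variable: the operand is an expression, so the
   parentheses are optional. */
size_t size_of_variable(void) {
    int x = 0;
    return sizeof x;
}

/* x is a typedef name: the operand is a type name, so the
   parentheses are required. */
size_t size_of_type(void) {
    typedef int x;
    return sizeof(x);   /* "sizeof x" would not compile here */
}
```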


Yes, but now you're getting deeper into the requirements for a context-sensitive grammar; go far enough, and you might as well go all Van Wijngaarden. A context-free parse is free to create an AST like (sizeof (parens (ident "x"))) for one, and (sizeof (ident "x")) for the other, and disambiguate based on the symbol table lookup of "x" later.
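
A rough sketch of what that later pass could look like in C. Everything here (the node layout, the symbol table as a flat array, all the names) is a hypothetical simplification, and it assumes the sizeof operand is just an identifier, possibly wrapped in parentheses.

```c
#include <stddef.h>
#include <string.h>

/* Untyped AST produced by the context-free parse. */
typedef enum { NODE_IDENT, NODE_PARENS } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;           /* set for NODE_IDENT */
    const struct Node *inner;   /* set for NODE_PARENS */
} Node;

typedef enum { SYM_UNKNOWN, SYM_VARIABLE, SYM_TYPE } SymbolKind;

typedef struct {
    const char *name;
    SymbolKind kind;
} Symbol;

typedef enum { SIZEOF_EXPR, SIZEOF_TYPE, SIZEOF_ERROR } SizeofMeaning;

/* Linear search of the symbol table; SYM_UNKNOWN if not found. */
static SymbolKind lookup(const Symbol *table, size_t count, const char *name) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].kind;
    }
    return SYM_UNKNOWN;
}

/* Decides after parsing what a sizeof operand means. */
SizeofMeaning resolve_sizeof(const Node *operand,
                             const Symbol *table, size_t count) {
    int parenthesized = 0;
    while (operand->kind == NODE_PARENS) {
        parenthesized = 1;      /* remember that parens were present */
        operand = operand->inner;
    }
    switch (lookup(table, count, operand->name)) {
    case SYM_VARIABLE:
        return SIZEOF_EXPR;     /* parens optional for expressions */
    case SYM_TYPE:
        /* a type name is only valid inside parentheses */
        return parenthesized ? SIZEOF_TYPE : SIZEOF_ERROR;
    default:
        return SIZEOF_ERROR;    /* undeclared identifier */
    }
}
```

The parser builds the same two tree shapes regardless of declarations; only resolve_sizeof consults the symbol table, so the missing-parens case becomes a semantic error rather than a syntax error.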



