One of the most often neglected areas of study is regular expressions. Many developers, even senior ones, look at them as a kind of black art. The reality, though, is that regular expressions can save volumes of code when applied to certain pattern-matching problems. I cannot stress enough how powerful they are. If you master regular expressions, you will be ahead of 90% of developers in your ability to process text for patterns, a need that comes up often in web development. The Mastering series has a book on regular expressions with high ratings, but I will stop short of recommending it because I personally have not read it. Someone else may chime in on whether the ratings are deserved.
Yes, and a basic understanding of how regex engines work under the hood is really helpful in cementing the concept (and in understanding things like why you can't properly parse HTML with regular expressions alone).
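To make the HTML point concrete, here is a minimal Python sketch (the input strings are made up for illustration). A lazy pattern for the contents of a `<div>` stops at the first closing tag it sees, so nested elements produce the wrong capture; a greedy pattern fails differently on sibling elements. Handling arbitrarily deep nesting requires counting, which a finite automaton cannot do.

```python
import re

nested = "<div>outer <div>inner</div> tail</div>"

# Lazy match: stops at the FIRST </div>, which closes the inner element,
# so the capture is cut off in the middle of the outer element.
lazy = re.search(r"<div>(.*?)</div>", nested)
print(lazy.group(1))

# Greedy match: runs to the LAST </div>, which swallows sibling elements.
siblings = "<div>a</div><div>b</div>"
greedy = re.search(r"<div>(.*)</div>", siblings)
print(greedy.group(1))
```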
So I'm talking about things like NFAs (nondeterministic finite state automata) and DFAs (deterministic finite state automata). These concepts are nowhere near as complicated as their names make them sound; they are in fact generally quite intuitive, and a reasonable grounding in basic set theory will help you here.
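As a rough illustration of how intuitive a DFA is, here is a minimal Python sketch of a hand-built, table-driven DFA that accepts digit groups separated by single dashes, the same language as the regex `[0-9]+(-[0-9]+)*`. The state names and the pattern are my own choices for illustration; a real regex engine derives its automaton from the pattern rather than having it written by hand.

```python
import re

# Transition table: (state, character class) -> next state.
# Any pair missing from the table means "reject".
TRANSITIONS = {
    ("start", "digit"): "in_number",
    ("in_number", "digit"): "in_number",
    ("in_number", "dash"): "after_dash",
    ("after_dash", "digit"): "in_number",
}
ACCEPTING = {"in_number"}


def char_class(ch: str) -> str:
    """Map a character onto the small alphabet this DFA understands."""
    if "0" <= ch <= "9":
        return "digit"
    if ch == "-":
        return "dash"
    return "other"


def dfa_accepts(text: str) -> bool:
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition: reject immediately
            return False
    return state in ACCEPTING


# Cross-check the hand-built DFA against Python's own regex engine.
for sample in ["516-123-4567", "1234567", "516--123", "-123", "123-", ""]:
    expected = re.fullmatch(r"[0-9]+(-[0-9]+)*", sample) is not None
    assert dfa_accepts(sample) == expected, sample
```

Each input character costs exactly one table lookup and the DFA never backtracks, which is why DFA-based matching runs in time linear in the input length.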
It will also give you a much better understanding of how compilers and interpreters process code 'under the hood'. In fact, one of the first things nearly every compiler or interpreter does to your code (lexical analysis) is essentially to run it through a fancy regex engine that splits it into tokens.
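To show what that first stage looks like, here is a minimal Python sketch of a lexer for a made-up toy language with identifiers, integers, `=` and `+`. The token names are hypothetical; production lexers are often generated by tools such as lex/flex or written by hand, but the idea of "one regular pattern per token kind" is the same.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# One named group per token kind; earlier entries win when patterns overlap.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("EQUALS", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),  # whitespace is matched but not emitted
    ("ERROR", r"."),   # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    for match in MASTER.finditer(source):
        kind = match.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {match.group()!r} at {match.start()}")
        yield Token(kind, match.group())


print(list(tokenize("total = x1 + 42")))
```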
This leads you to realizations not only about the performance implications of pattern matching in text, but also about how these same ideas are the key to solving a variety of problems simply.
As for a specific text, I learned the concept from the early chapters of 'the dragon book', but I'm sure gentler introductions are available.
He did mention the dragon book, which is actually titled Compilers: Principles, Techniques, and Tools but is better known by its nickname. It is a seminal work and still highly regarded today, even though it is older. But it is by no means a light read: it will definitely give you a deep understanding of computing, but it's a little heavy if you just want to develop web apps.
Well, I think REs are horribly overused. I think this comes from an unjustified fear of parsers, or complete ignorance that they even exist, among most programmers. For example, let's write an RE to recognize and extract a phone number. Here is your spec: http://en.wikipedia.org/wiki/Local_conventions_for_writing_t...
Have you screamed yet? For example, in the US the exact same phone number can be written in many ways. Here are some of the variants I have seen:
(516)-123-4567
1(516)-123-4567
1-516-123-4567
123-4567 (old-fashioned, but valid)
1.516.123.4567
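For comparison, here is roughly what the single-regex approach looks like in Python when it only has to cover the five US variants listed above. The pattern is a sketch of my own, not derived from the linked spec; it already needs optional groups and alternation, and it still accepts inconsistent separators such as `1-516.123-4567`.

```python
import re

# Hypothetical pattern covering only the five US variants listed above.
US_PHONE = re.compile(
    r"(?:1[-.]?)?"                  # optional country code 1, maybe followed by - or .
    r"(?:\(\d{3}\)-?|\d{3}[-.])?"   # optional area code: (516), (516)-, 516- or 516.
    r"\d{3}[-.]\d{4}"               # exchange and line number: 123-4567 or 123.4567
)

variants = ["(516)-123-4567", "1(516)-123-4567", "1-516-123-4567", "123-4567", "1.516.123.4567"]
for v in variants:
    assert US_PHONE.fullmatch(v), v

# The weakness: each separator is checked on its own, so mixed styles slip through.
assert US_PHONE.fullmatch("1-516.123-4567")
```

Every additional convention (spaces, a leading `+1`, extensions, other countries) makes this one pattern longer and harder to read.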
Now, with a parser, I can use REs for the simple bits that turn a text stream into pieces of data, and then hand those off to the parser to figure out whether they form a good phone number. The lexer, where the simple REs live, turns a stream of text into a stream of tokens that the parser can reason about. For my US example above, here is the list:
1: a string of digits, called NUMBER, along with the digits themselves
2: a ( called LP
3: a ) called RP
4: a - called DASH
5: a . called DOT
Then I can parse a stream of those five token types to figure out whether it is a valid phone number, in nice, readable, maintainable code.
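Here is one possible Python sketch of that lexer/parser split, assuming the five token types listed above and only the US variants shown earlier. The lexer is one trivial regex with an alternative per token; the parser (`parse_us_phone`, a hypothetical name) is ordinary code over the token list, so rules like "don't mix dashes and dots" or "a leading 1 needs an area code" are each a couple of readable lines. Those two rules are my own illustrative assumptions, not part of any spec.

```python
import re
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    kind: str  # one of NUMBER, LP, RP, DASH, DOT
    text: str


# Lexer: the only regular expression in the program; every alternative is trivial.
TOKEN_RE = re.compile(r"(?P<NUMBER>[0-9]+)|(?P<LP>\()|(?P<RP>\))|(?P<DASH>-)|(?P<DOT>\.)|(?P<BAD>.)")


def lex(text: str) -> List[Token]:
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup == "BAD":
            raise ValueError(f"unexpected character {m.group()!r}")
        tokens.append(Token(m.lastgroup, m.group()))
    return tokens


def parse_us_phone(text: str) -> Optional[str]:
    """Return the bare digits (area code + 7 digits, or just 7), or None if invalid."""
    try:
        tokens = lex(text)
    except ValueError:
        return None
    pos = 0
    separators = set()  # which of DASH / DOT have been used

    def take(kind: str, length: Optional[int] = None) -> Optional[str]:
        """Consume the next token if it has the given kind (and digit count)."""
        nonlocal pos
        if pos < len(tokens) and tokens[pos].kind == kind:
            if length is None or len(tokens[pos].text) == length:
                pos += 1
                return tokens[pos - 1].text
        return None

    def take_sep() -> bool:
        for kind in ("DASH", "DOT"):
            if take(kind):
                separators.add(kind)
                return True
        return False

    has_country_code = pos < len(tokens) and tokens[pos] == Token("NUMBER", "1")
    if has_country_code:
        pos += 1
        take_sep()  # "1-516...", "1.516..." or "1(516)..."

    area = ""
    if take("LP"):  # "(516)" form
        area = take("NUMBER", 3)
        if area is None or not take("RP"):
            return None
        take_sep()  # optional separator after ")"
        exchange = take("NUMBER", 3)
        if exchange is None or not take_sep():
            return None
    else:  # "516-123-4567" or "123-4567"
        first = take("NUMBER", 3)
        if first is None or not take_sep():
            return None
        second = take("NUMBER", 3)
        if second is not None:  # the first group was the area code
            if not take_sep():
                return None
            area, exchange = first, second
        else:
            exchange = first

    line = take("NUMBER", 4)
    if line is None or pos != len(tokens):
        return None
    if len(separators) > 1:  # assumed rule: don't mix "-" and "."
        return None
    if has_country_code and not area:  # assumed rule: a leading 1 needs an area code
        return None
    return area + exchange + line
```

Supporting a new convention such as spaces would mean one new token kind in `TOKEN_RE` and one entry in `take_sep`, rather than reworking a single large pattern.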
Doing the above also makes you much less vulnerable to changes in the RE library's behavior. I have been bitten before by greediness changes in Perl in bug-fix releases because I did not follow my own advice. Another benefit is that in six months you can still figure out what you did, and so can the next guy.
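Greediness matters because the same pattern can match very different text depending on whether a quantifier is greedy or lazy. A minimal Python sketch (the input string is made up for illustration):

```python
import re

text = 'name="Ada" role="admin"'

# Greedy .+ grabs as much as possible, then backtracks to the LAST quote.
greedy = re.search(r'"(.+)"', text).group(1)

# Lazy .+? grabs as little as possible, stopping at the FIRST closing quote.
lazy = re.search(r'"(.+?)"', text).group(1)

print(greedy)
print(lazy)
```

A pattern whose correctness depends on this kind of subtlety is exactly the code that breaks quietly when engine behavior shifts.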
You raise a good point. I tend to like them for parsing, but they are not a silver bullet for sure; they do have their weaknesses. That being said, when I use a regular expression, I always document it for myself and other developers. Even if the information is redundant, I document it, because regular expressions are not very readable and I tend not to like magic code.
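One way to keep that documentation inside the pattern itself is Python's `re.VERBOSE` flag, which ignores unescaped whitespace in the pattern and treats `#` as the start of a comment. A minimal sketch using an ISO-style date pattern chosen purely for illustration:

```python
import re

# With re.VERBOSE, each piece of the pattern can carry its own comment.
ISO_DATE = re.compile(
    r"""
    (?P<year>[0-9]{4})                  # four-digit year
    -
    (?P<month>0[1-9]|1[0-2])            # month 01-12
    -
    (?P<day>0[1-9]|[12][0-9]|3[01])     # day 01-31 (does not check month lengths)
    """,
    re.VERBOSE,
)

m = ISO_DATE.fullmatch("2024-02-29")
print(m.group("year"), m.group("month"), m.group("day"))
```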
Don't get me wrong, I like REs; it's just that I like them the way I like salt in my food. It is very easy to ruin the food by putting in too much salt, but put in the right amount and it's delicious.