Wow, incredibly cool. I did the same thing with a collection of curl / grep / sed / awk, and it was awful. I later redid it with some Python library, then with hpricot, and most recently with scrubyt. Each step was a little bit better, but I really should have been working toward a more generalized solution like this.
Well done!