
AFAICT, YQL can only handle scraping individual pages that way.

Upton can scrape a whole set of pages, as long as you have a page that lists the ones you're interested in. Suppose you want HN commenters on front page posts: you specify the front page URL and a selector for the links to comment pages, and Upton automatically scrapes those pages and returns them to you.

Upton could even write the commenter names to a CSV for you with just a filename and a CSS selector/XPath expression.
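To make the index-page pattern concrete, here's a rough sketch in plain Python (standard library only). This is not Upton's API, just the boilerplate it saves you from rewriting. The class names `comments` and `user`, the example.com URLs, and the `fetch` callable are all made up for illustration, and matching on a class name stands in for a real CSS selector or XPath expression.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect (href, text) for every <a> tag carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.matches = []
        self._inside = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._inside = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self.matches.append((self._href, "".join(self._text).strip()))
            self._inside = False


def find_anchors(html, css_class):
    parser = AnchorCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.matches


def scrape(index_url, link_class, item_class, fetch):
    """Follow every matching link on the index page, then pull the
    matching anchor text out of each linked page."""
    rows = []
    for href, _ in find_anchors(fetch(index_url), link_class):
        if href is None:
            continue
        page_url = urljoin(index_url, href)  # resolve relative links
        for _, text in find_anchors(fetch(page_url), item_class):
            rows.append((page_url, text))
    return rows


def write_csv(rows, out):
    writer = csv.writer(out)
    writer.writerow(["page", "commenter"])
    writer.writerows(rows)


# Canned pages stand in for the network; swap in a real HTTP fetch as needed.
PAGES = {
    "https://example.com/": (
        '<a class="comments" href="item?id=1">3 comments</a>'
        '<a class="comments" href="item?id=2">1 comment</a>'
    ),
    "https://example.com/item?id=1": (
        '<a class="user" href="user?id=alice">alice</a> '
        '<a class="user" href="user?id=bob">bob</a>'
    ),
    "https://example.com/item?id=2": '<a class="user" href="user?id=carol">carol</a>',
}

rows = scrape("https://example.com/", "comments", "user", PAGES.__getitem__)
buffer = io.StringIO()
write_csv(rows, buffer)
```

Passing `fetch` in as a parameter keeps the crawl logic testable offline; in a real scraper it would be whatever function downloads a URL and returns the HTML as a string.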

It's not stuff you couldn't do with YQL or Python/BeautifulSoup, but it's stuff that I didn't want to write over and over each time I wrote a new scraper.



Makes sense! Thanks for clarifying that.



