Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It’s a fair and totally reasonable question but clashes with reality. Many hosts have data that others want/like to scrape (eBay, Amazon, Google, airlines, etc.) and they setup anti-scraping mechanisms to try and prevent scraping. Whether or not to respect those desires is a bigger question but not one for the scraping library - it’s one for those doing the scraping and their lawyers.

The fact is - many many people want to scrape these sites and there is massive demand for tools to help them do that, so if APIFY/Crawlee decide to take the moral ground and not offer a way around bot detection, someone else will.



Ah yes, the old 'if I don't build the bombs for them, someone else will'. I don't think this is taking the moral high ground, this is saying we don't care whether it's moral, there's demand and we'll build it.


There are many legitimate and legal use cases where one might want to circumvent blocking of bots. We believe that everyone has the moral right to access and fairly use non-personal publicly available data on the web the way they want, not just the way the publishers want them to. This is the core founding principle of the open web, which allowed the web to become what it is today.

BTW we continuously update this exhaustive post covering all legal aspects of web scraping: https://blog.apify.com/is-web-scraping-legal/



It’s an “old” law that did not consider many intricacies of internet and the platforms that exist on it and it’s mostly made obsolete by EU case law, which has shrunk the definition of a protected database under this law so much that it’s practically inapplicable to web scraping.

(Not my opinion. I visited a major global law firm’s seminar on this topic a month ago and this is what they said.)


I'm not gonna feel bad if a corporation gets its data scraped (whenever it's legal to do so, and this is another kind of question I'm not knowledgeable enough to face) when they themselves try to scrape other companies' data


You seem to have a massive category error here. To my understanding, this is not only going to circumvent the scraping protection of companies that scrape other people's data.


Google and Amazon where built on scrapped data, who are you kidding?


There's a bidirectional benefit to Google at least. That's why SEO exists. People want to appear in search results.


I make sure to enroll in projects which scrape Google/Amazon en-masse just for the satisfaction.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: