
I'm less convinced. Are you saying it's unethical to automate browsing a site?

Because if you save the pages you browse on some site, they're yours (authors don't own your cache).

Perhaps you're arguing that if you wrote a lightweight script/browser (which is just your user agent) to save some website for offline use, that'd be unethical and GDPR violating? Again, I don't think so but maybe I'm missing something. But perhaps this turns on what defines a "user agent".

Perhaps this becomes a "depth of pre-fetch" question. If your browser prefetches linked pages, that's "automated" downloading, akin to the script approach above: downloading, to your cache, which you own (and where I struggle to see an ethical violation).
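To make the "lightweight script as user agent" point concrete, here is a minimal sketch of saving a browsed page to a local cache. All names (`offline-saver/0.1`, the cache directory, the helper functions) are hypothetical, not from any tool discussed here; only the Python standard library is used.

```python
import hashlib
import pathlib
import urllib.request

CACHE_DIR = pathlib.Path("offline-cache")  # hypothetical local cache directory

def cache_path(url: str) -> pathlib.Path:
    # Map a URL to a stable filename in the local cache.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return CACHE_DIR / f"{digest}.html"

def save_page(url: str) -> pathlib.Path:
    # Fetch one page with an explicit User-Agent, so the request is
    # identifiable as coming from this script rather than a full browser.
    req = urllib.request.Request(url, headers={"User-Agent": "offline-saver/0.1"})
    CACHE_DIR.mkdir(exist_ok=True)
    path = cache_path(url)
    with urllib.request.urlopen(req) as resp:
        path.write_bytes(resp.read())
    return path
```

Whether you call this a "browser", a "user agent", or a "scraper", the mechanics are identical to what a browser's prefetcher does.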

Genuinely curious where the line is, or what exactly here raises ethical, GDPR, or practical-standards concerns?



Maybe a good illustration would be Clearview AI. They scrape websites, extract information (images), and train ML models to learn embeddings (distances between faces). They indiscriminately collect personal data without opt-in, with only a limited opt-out mechanism.

In this case, if this tool is used to scrape a website, there are two direct issues: 1/ there is no immediate way for the website owner to exclude this particular scraper (what is its user agent?); 2/ there is no way for data subjects (whose data appears on the website) to check whether the scraper has learned their personal data in its embeddings. Data being publicly available doesn't mean it can be freely used [at least outside the US, where we have much stricter rules on scraping].
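The first issue is what the Robots Exclusion Protocol is for: a scraper that declares a stable user agent can be blocked by name in robots.txt. A minimal sketch with Python's stdlib `urllib.robotparser` (the agent name `ExampleScraper` is an assumption for illustration, not the tool's actual user agent):

```python
import urllib.robotparser

# Hypothetical user-agent string a well-behaved scraper would declare.
USER_AGENT = "ExampleScraper/1.0"

def allowed(robots_txt: str, url: str) -> bool:
    # Parse a robots.txt body and ask whether this agent may fetch the URL.
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)

# A site owner can then exclude this particular agent:
robots = """\
User-agent: ExampleScraper
Disallow: /
"""
print(allowed(robots, "https://example.com/profile/alice"))  # False: excluded
```

Without a published user agent, site owners have nothing to match against, which is exactly the exclusion gap described above. The second issue (auditing what ended up in the embeddings) has no comparably simple technical fix.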



