
> This isn't a vacuum charged with crawling the web, it's an adhoc GET request.

Doesn't matter. The Robots Exclusion Standard is not just about web crawlers. A `robots.txt` can list arbitrary user agents.
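
For illustration, a minimal `robots.txt` of this kind might look like the following ("ExampleAssistant" is a hypothetical user-agent string, not any real product):

```
# Exclude one specific (non-crawler) agent from part of the site
User-agent: ExampleAssistant
Disallow: /private/

# Everyone else may fetch anything
User-agent: *
Disallow:
```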

Of course, an AI with automated web search could ignore that, just as web crawlers can.

If they choose to do that, then at some point some server admins might (again, just as with non-compliant web crawlers) resort to more drastic measures to reduce the load, such as simply blocking these accesses.
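
Such a block can be trivial to set up. A sketch, assuming nginx and the same hypothetical user-agent string as above:

```nginx
# Inside a server block: refuse requests whose User-Agent matches a known string
if ($http_user_agent ~* "ExampleAssistant") {
    return 403;
}
```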

For that reason alone, it will pay off to comply with established standards in the long run.
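
Complying is cheap even for a one-off GET. A minimal sketch in Python, using the standard library's robots.txt parser (the URL and user-agent string are placeholders):

```python
import urllib.request
import urllib.robotparser

USER_AGENT = "ExampleAssistant/1.0"  # hypothetical user-agent string
url = "https://example.com/some/page"

# Fetch and parse the site's robots.txt before issuing the ad-hoc GET
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

if rp.can_fetch(USER_AGENT, url):
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
else:
    body = None  # the page is excluded for this agent; don't fetch it
```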



In the limit of the arms race, it's sufficient for the robot to use the user's local environment to do the browsing. At that point you can't distinguish the human from the robot.


That's not how many of these services work, though. The web search and the subsequent analysis of the results by an LLM are done from the servers of whoever supplies the solution.



