
Then you’ve fundamentally misunderstood what a robots.txt file does, or is even intended to do, and should reevaluate whether you should be the one in charge of how access to such systems is or is not prevented.

Absolutely nothing has to obey robots.txt. It’s a politeness guideline for crawlers, not a rule, and anyone expecting bots to universally respect it is misunderstanding its purpose.



> Absolutely nothing has to obey robots.txt

And absolutely no one needs to reply to every random request from an unknown source.

robots.txt is the POLITE way of telling a crawler, or other automated system, to get lost. And as is so often the case, there is a much less polite way to do that, which is to block them.

So, the way I see it, crawlers and other automated systems have two options: they can honor the polite way of doing things, or they can get their packets dropped by the firewall.
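For what it's worth, the polite path is trivial to implement on the crawler side. Here's a minimal sketch using Python's standard urllib.robotparser; the bot name "PoliteBot" and the example.com URLs are just placeholders:

    # Check robots.txt before fetching a page, as a well-behaved crawler would.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetch and parse robots.txt

    user_agent = "PoliteBot"
    url = "https://example.com/private/page.html"

    if rp.can_fetch(user_agent, url):
        print("allowed:", url)
    else:
        print("robots.txt asks us to stay away from:", url)

And the impolite path on the operator's side is barely more work, e.g. something along the lines of "iptables -A INPUT -s <crawler's IP range> -j DROP".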



