
If a human triggers a browser by pressing a button, should that browser ignore robots.txt?


Are you arguing that these are equivalent actions?

The entire web was built on the understanding that humans generally operate browsers, and robots.txt is specifically for scenarios in which they do not.

To pretend that the automated reading of websites by AI agents is not something different…is quite a stretch.
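
For context, robots.txt is purely advisory: a polite automated client is expected to check it before fetching anything, while a human's browser never does. A minimal sketch using Python's standard library (the URLs and agent token are placeholders):

    # How a polite automated client consults robots.txt before fetching.
    # The URLs and the user-agent token here are placeholders.
    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt

    # A human clicking in a browser never runs this check; a crawler should.
    if rp.can_fetch("ExampleAgent/1.0", "https://example.com/some/page"):
        print("allowed to fetch")
    else:
        print("disallowed by robots.txt")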


I see it very differently. I, the human, want the data from that request. I am using a tool to get it for me.

Should I not be able to execute curl to download a webpage because of the "understanding that humans generally operate browsers"?


> I, the human, want the data from that request. I am using a tool to get it for me.

Isn't this a bit of an oversimplification, though? Especially when the tool you're using completely alters the relationship between the content author and the reader?

I hear this argument often: "it's just another tool and we've always used tools". But would you acknowledge that some tools change the dynamics entirely?

> Should I not be able to execute curl to download a webpage because of the "understanding that humans generally operate browsers"?

Executing curl to download a webpage is nothing new, and compared to a traditional browser, it has about the same impact. That is still drastically different from asking an AI agent to gather information, where one of the pages it happens to "read" is the one you were previously navigating to with a browser or downloading with curl.

If you're a content creator who built a site/business based on a pre-LLM understanding of the dynamics of the ecosystem, doesn't it seem reasonable to see these types of "readers" differently?


No. Whether I curl it, use a browser, or use an LLM, it is essentially ALL the same, unless of course the LLM crawls it by itself, without human interaction.

If the scale bothers you, block it, just like you would block any other crawler.
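
Concretely, assuming the crawler identifies itself honestly, you can refuse its User-Agent at the application layer. A toy WSGI sketch; the token list is illustrative, not exhaustive:

    # Toy WSGI middleware that refuses requests from self-identified AI
    # crawlers. Check each vendor's published user-agent tokens; note
    # that any client can spoof these headers.
    BLOCKED_UA_TOKENS = ("GPTBot", "CCBot", "ClaudeBot")

    def block_ai_crawlers(app):
        def middleware(environ, start_response):
            ua = environ.get("HTTP_USER_AGENT", "")
            if any(token in ua for token in BLOCKED_UA_TOKENS):
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"Automated crawling is not permitted.\n"]
            return app(environ, start_response)
        return middleware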

Other than that, we all wanted "ease of access" (well, not me), and now we have it. It does not change anything.


What if the crawlers are faking their identity (as they are doing right now)?


Well, how do we deal with this in the case of DDoS? The same approach works here.
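
Same playbook, in other words: since the claimed identity can be faked, throttle on observed behavior per source instead. A toy sliding-window limiter in Python; the window and limit values are arbitrary:

    import time
    from collections import defaultdict, deque

    # Toy per-IP sliding-window rate limiter: the User-Agent header can
    # lie, but request volume per source cannot. Limits are arbitrary.
    WINDOW_SECONDS = 10.0
    MAX_REQUESTS = 20
    _hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(ip: str) -> bool:
        now = time.monotonic()
        q = _hits[ip]
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()  # forget requests that left the window
        if len(q) >= MAX_REQUESTS:
            return False  # over budget: throttle, captcha, or block
        q.append(now)
        return True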


It's reasonable for the content creator to see it differently, but I don't think it's reasonable to expect everyone around the content creator to contort any new approach to the needs of the pre-existing business model.


I agree. This came up with copyright as well: who is pressing the shutter, and who owns the copyright to the photo taken? I personally think the copyright belongs to me, because I, a human, wrote the detailed prompt; the tool just generated the result. Do I not own the copyright if I make something using Photoshop? As far as I know, I do. So how is AI, which also needs human action (i.e. to be prompted), any different? Because it is better than Photoshop? That is not a good argument, IMO.



