

cjf101 on Oct 2, 2024 | on: Why I'm leaving Medium: AI policy

These policies are much clearer than they were when I last looked, which is good. On the other hand, Perplexity appeared to ignore robots.txt as part of a search-enhanced retrieval scheme, at least as recently as June of this year. The article title is pretty unkind, but the test they used pretty clearly shows what was going on. https://www.wired.com/story/perplexity-is-a-bullshit-machine...

It takes this sort of critical scrutiny; otherwise mechanisms like robots.txt do get ignored, whether willfully or mistakenly.

Ukv on Oct 4, 2024

> The article title is pretty unkind, but the test they used pretty clearly shows what was going on.

I believe this article revolves around the same misunderstanding: it doesn't appear to show any evidence of their crawler, or of web scraping used for training, accessing pages prohibited by robots.txt.
