These policies are much clearer than they were when I last looked, which is good. On the other hand, Perplexity appeared to ignore robots.txt as part of a search-enhanced retrieval scheme, at least as recently as June of this year. The article title is pretty unkind, but the test they used pretty clearly shows what was going on.
> The article title is pretty unkind, but the test they used pretty clearly shows what was going on.
I believe this article rests on the same misunderstanding: it doesn't appear to show any evidence of their crawler, or of web scraping used for training, accessing pages prohibited by robots.txt.
https://www.wired.com/story/perplexity-is-a-bullshit-machine...
It takes this sort of critical scrutiny; otherwise, mechanisms like robots.txt do get ignored, whether willfully or mistakenly.