>Unlike many, I don’t sit on the fence. I hate it with a passion. I think it’s an affront to humanity and will ultimately cause our downfall. If an article uses AI, even just for header images, I don’t read it.
This seems just as silly a take as that of the fervent supporters of sticking AI everywhere.
AI helps in some areas (e.g. assisting medical diagnoses) and is shitty in others (e.g. spam content).
I wonder if the author would refuse to receive a diagnosis from a doctor if the diagnosis was AI-assisted?
I'm somewhere between the author and a true-neutral stance. Something can be "an affront to humanity [that] will ultimately cause our downfall" and also "help in some areas". Those aren't mutually exclusive, logically, despite sounding that way.
Indeed the latter is kind of a precursor to the former: AI has to be helpful enough for key use cases in order to become used in whatever critical ways cause it to serve as humanity's downfall.
Setting aside the apocalyptic prediction in particular, this sort of thing is already the case with many things that have become irritating or pernicious at scale, like automated phone systems and SEO. The latter of those examples already includes AI more and more. If AI weren't a helpful device for reducing writing costs, it wouldn't be playing a role in filling the web with worthless SEO-spam garbage.
> AI has to be helpful enough for key use cases in order to become used in whatever critical ways cause it to serve as humanity's downfall.
I'm sure the strong proponents of AI in all the places have never griped about false positive account permabans from cloud services or app stores.
What could possibly go wrong with turning a function over to an automated system that's <100% correct, realizing cost savings by downsizing the human support backing it, and then eventually just engineering the process so all appeals are piped to /dev/null?
PS: What we need are mandatory rights to appeal to a human, when automated decision systems are used, with legally defined response time SLAs and arbitration options.
This seems misplaced in a discussion about generative AI. Automated systems for handling appeals are already a thing and don't necessarily rely on generative AI like LLMs.
I agree that bans and appeal handling need a human in the loop to work as intended, to have some way of handling the false positives. Without a human in the loop the appeal process can just as well be removed entirely, because it won't fulfill its purpose.
Whether it's misplaced or not kinda depends on how big of a deal you think recent and upcoming generative AIs are. If they're really incredible, their recognition as something that makes moderation automation cheaper and/or better is likely to inject new energy into the preexisting drive to automate tasks like moderation and policy appeals. In the same way, if they make it possible to automate things that strictly required humans before (which seems to be the case for LLMs already), new classes of tasks will be automated that weren't before. Those things make things like a general right to appeals before competent, human reviewers more urgent, even if it's been a good idea for a long time.
On the other hand, if you don't see recent and upcoming generative AIs as that world-changing, they don't really change the picture as to whether or not the enshrinement of such a right into policy is warranted.
The question isn't whether or not there are benefits to be had from the technology. The question is whether or not the cost/benefit ratio is favorable. It's not clear to me what the answer to that actually is.
It's coherent (just) for something to be an affront to humanity and helpful for some tasks. It's emphatically not coherent for you to consider something an affront to humanity and to condone using it.
I disagree. Generically speaking that's true, but specifically speaking, AI has a wide variety of use cases, and they do not all, from the perspective of a normal user, appear to help spread what they perceive to be the bad parts.
Obviously if we could somehow separate the good from the bad, we should, but it seems like an intractable problem.
This dichotomy (polychotomy?) is also aggravated by the fact that when people talk about AI, they are probably 99% referring to generation of prose, art, and code (as opposed to e.g. medical diagnosis).
> they are probably 99% referring to generation of prose, art, and code
That was what I took TFA to be speaking about. It's ambiguous, probably because the author didn't expect an audience of HN readers, but most of the enumerated concerns and use cases are specific to "AI as replacement for human creativity, in artistic endeavors."
> It's emphatically not coherent for you to consider something an affront to humanity and to condone using it.
A lot of people (particularly Americans) seem to think of nuclear bombs in exactly this way. They typically see the 'some task' to be of extremely outsized necessity and importance, and I think they'd say that makes the two ideas cohere.
Then those people do not seriously think nuclear weapons are an affront to humanity. It doesn't just mean "very bad". It's against human dignity. You shouldn't do it, ever. Killing isn't an affront to humanity, but torture is, for example.
If you think using nuclear weapons can be justified, then you don't think they're evil, you just think they're dangerous.
I absolutely agree about nukes. The US' use of nukes was barbaric imperialism and when USians say it was justified I think that's a transparent post-hoc rationalization. And I agree totally that the position I outlined is deeply hypocritical.
I'm less sure that it's an outright logical contradiction, but I'm inclined to agree with you there as well.
GenAI will just be a way to reduce labor costs and capture/monetize customer data even further. IMO, everyone should be against non-local GenAI in any product and they should be against even local GenAI when its downstream effects clash with human interests.
Does GenAI monetarily disincentivize the self-expression of humanity through art and other creative endeavors, such endeavors that were already difficult to make a living on pre-GenAI? I think it does.
I thought so too. But then I changed my mind [0]. Especially after asking some other people what they thought. In a nutshell it's about association. People see an AI image and start thinking... hey maybe the prose and video had a little 'help' too. Still haven't got around to replacing all the generative thumbnails. Once AI stuff gets into your content it's like pollution and a royal PITA to sieve out.
> I wonder if the author would refuse to receive a diagnosis from a doctor if the diagnosis was AI-assisted?
I can't speak for the author, obviously, but personally my answer would be "it depends". If the diagnosis came from a doctor who happened to use AI as one of their tools, I'm OK with that (as long as it was a locally-hosted AI, but that's a different issue). If the diagnosis came from AI without a substantial amount of analysis from a doctor, then I'd absolutely reject that.
I'm 100% in agreement with the author, and to answer your question: if I found out my doctor had based their diagnosis on output from an AI, I'd find another doctor.
How do you reconcile that you interact with AI when you may not even realise it? This could be in the form of a recommendation feed, a newsreader delivering a story on the 6 o'clock news, or software that was built with it.
There are two different meta-uses of AI that we're talking about here.
#1 AI as final step
#2 AI as upstream step, with subsequent human review/adjustment
In the case of medical AI, we're almost always talking about the latter: models preclassify or review results, which are then subsequently reviewed by a human.
The cases people seem to have problems with are where no human eyes/hands provide oversight to probabilistically-correct model output*.
It's an important enough distinction that I'd be in favor of mandating companies declare which they're using (for a specific use case).
* As distinct from previous deterministically-correct expert systems / rules-engines
> AI helps in some areas (e.g. assisting medical diagnoses)
Increasingly (and I think certainly in the above case) AI is used as shorthand for genAI (which is unsurprising, as up until recently most AI-ish things got called ML anyway). I certainly hope no-one's using LLMs for medical diagnoses...
> I wonder if the author would refuse to receive a diagnosis from a doctor if the diagnosis was AI-assisted?
It really depends on what you mean by 'AI-assisted', IMO. If you mean that the doctor had asked a chatbot, I'd very much be looking for a second opinion. What sort of AI assistance did you have in mind?
Not sure if you mean me but I wasn't interested in clicks when writing this article. In fact, I was getting thousands of clicks and comments on Medium and now I get almost none because I deleted all my articles there. I wasn't making money either. So, not about clicks. I'm just genuinely concerned about the future of life on this planet.
> AI helps in some areas (e.g. assisting medical diagnoses)
Uhhh, I don't know about that one. Have we seen studies that show real predictive capabilities? Anecdotal evidence is not helpful, and it seems rather risky to depend on something that has not been thoroughly vetted when it comes to people's lives.
>In 2020, Zhang’s team developed an AI imaging-assisted diagnosis system for COVID-19 pneumonia and published in Cell. Based on the 500,000 copies of CT images that the team studied, the system was able to distinguish COVID-19 from other viral pneumonias within 20 seconds, with an accuracy rate of more than 90%.
>AI improves the lives of patients, physicians, and hospital managers by doing activities usually performed by people but in a fraction of the time and the expense. [...] Not only that, AI assists physicians in detecting diseases by utilizing complicated algorithms, hundreds of biomarkers, imaging findings from millions of patients, aggregated published clinical studies, and thousands of physicians’ notes to improve the accuracy of diagnosis.
I mean if I was given a COVID diagnosis or non-diagnosis based on the above, I'd be reporting the doctor, because it's far worse accuracy than generally available methods (even antigen methods)... Even if visual methods were the only means of diagnosing COVID (as was the case for the first few weeks after the virus emerged), sorry, but I'd be asking for an actual radiologist. Radiologists were able to do this with better accuracy.
Really, the only way this could possibly be useful were in a hypothetical case where only visual methods were available and there weren't enough radiologists, and even then it wouldn't be _very_ useful, given the high failure rate.
> Not only that, AI assists physicians in detecting diseases by utilizing complicated algorithms, hundreds of biomarkers, imaging findings from millions of patients, aggregated published clinical studies, and thousands of physicians’ notes to improve the accuracy of diagnosis.
This sounds very fuzzy. Can you point to specific approved applications, or is this all hypothetical?
>Can you point to specific approved applications, or is this all hypothetical?
One of those links is a systematic literature review of a bunch of other existing literature.
If that is not enough for you, that is totally understandable! But if I'm honest here, I don't really care enough to go digging even further and trying to find additional studies or approved applications from your preferred country, with an ultimate goal of convincing you.
I would be fine if my current doctor used AI to assist in their diagnosis of me. You don't have to be, it's completely your right to choose that.
In diagnosis there is no prediction. The AI can find a pattern somewhere and that’s it. It’s actually very efficient in some fields (like recognizing stuff in a scanned body) but doctors always have the final say. It’s like a hint, and it should stay like that.
I've been thinking a lot lately about why I don't like AI and ultimately I think it's because of its tone. I don't know why OpenAI made ChatGPT so wordy and almost unctuous.
I've realised I don't actually care about people using it for programming or brainstorming or whatever. It's just that I feel so insulted when I read something that is in the default AI voice.
So I don't know that I agree entirely with the writer of the piece but I get where he's coming from. AI writing is unpleasant to read. And I hope Medium reverses their decision.
The reason people "dislike AI because of its tone" is that they only recognize AI-written text when it's poorly composed. It's likely that you're already interacting with AI-generated text without realising it.
I disagree. The "tone" of the vast majority of AI to me has a clear identity, and is not related to poor composition. It is in fact "high quality", technically speaking, and is a separate problem from that of bad/strange genAI writing. In fact, some real people do write in a similar tone, which makes it hard to tell apart sometimes, which is the problem you describe, but that doesn't obscure the fact that it is the "AI tone".
It takes too long to get anything said and it has a very weak sense of topicality that it papers over with repetitious use of literal words and phrases from the things it's asked.
Afaict, it's 'high quality' only in the sense of not containing many grammatical mistakes or spelling errors, which of course is not 'high quality' so much as 'basic literacy'.
I think I know what you mean about other elements of the tone (staid and obsequious) not necessarily being related to style, but they don't quite form a clear identity for me. The defects of the writing are still more marked than its personality.
The possibility of well-written AI prose is a big reason why I dislike AI text generation. Writing for an audience refines the ideas being communicated. If the author doesn't do that refinement themselves, then I'm not reading what the author is thinking, I'm reading what the LLM could patch together from their ideas. If I want the LLM's opinion on how to make a concept work, I can ask it directly. If I'm reading something by an author, I want to know what the author is actually thinking!
If you wanted a chair, you could build it yourself. So why do most people buy chairs?
To be direct:
Your post assumes that all users are equally good at producing content with AI/LLMs — but if they're not, then people who are better at it providing that service to people who are worse will become a market.
Usually I expect an instructional article/blog post/etc. to have actually been tried and tested. If it has the format of "I had x problem, ChatGPT suggested solution y, it actually worked, here it is and a bit about why it works" then of course that's fine and good. But you'll only have to google a few times before you see a 100% synthetic, SEO-optimised, useless article that does nothing but waste your time.
This assumption would often be false, which is the problem. I already have access to LLMs (that I leverage often), I’m using a search engine because I’m looking for high-quality, detailed info or real-world examples from someone who knows what they are talking about.
This blind cheerleading of LLM-generated content filling the web is what pushes people to hate them.
While I'm not about to defend AI blog spam, what you just said is completely applicable to human-generated blogs as well.
You should not just assume something is trustworthy or will solve your problem just because it was written by a human. And I have read plenty of cruft-filled blogs/articles/etc. from humans.
If a human has written it I can be somewhat confident the code was at least compiled and tested a bit.
The same is not true for AI generated prose, in which the author often can't even be bothered to remove AI boilerplate. In such a case it's obvious the code has never been tested, and it might not even compile.
> While I'm not about to defend AI blog spam, what you just said is completely applicable to human-generated blogs as well.
It kind of depends on the subject. If the blog is about politics, say, then you may assume that the human author is likely to be willing to lie. If it's about a non-contentious subject, though, then it is highly unlikely the human author will lie, and if they don't know, then, well, okay, _some_ people will write articles about stuff that they don't understand, but it's not common. Most non-psychopaths, if they don't know the answer, won't just make something up.
Whereas the magic robot will very happily spew nonsense on any subject.
This seems like too much of an absolutist stance regarding AI. AI is a tool and as such has both good and bad uses. For example, maybe I have something I feel important to share with the world but I am not a very good writer. If AI can help me express my own ideas more clearly and clean up my grammar, that to me is a great use of AI. On the contrary, letting AI just churn out articles wholesale to me would be an abuse rather than good use. Correct me if I’m wrong, but I think that also is consistent with Medium’s policy.
Medium's policy does allow accounts where AI can churn out articles wholesale, they just can't be monetized. And the author does make the point that he would rather read imperfect human writing than AI-assisted or otherwise: 'the flaws make the personality'.
I think it's a personal stance so the degree of absoluteness doesn't matter. It's what he prefers.
We can tell him it's too harsh when he feels the pinch of cutting off all writers using LLMs. As far as the use of LLMs to clean up one's language is concerned, there's a difference between editing the content it generates, and learning grammar patterns and word usages from it and applying them while writing on your own. When the latter is done, no one can tell.
I don't really think AI actually helps people express their ideas. I think it makes them less human and so it's not even them expressing ideas any more.
The quality of Medium articles (and comments) has really gone down over the past few years. Lots of attention grabbing headlines “5 ways to X” “Stop doing X” and less well-written content overall. I’m not sure if most writers hopped over to Substack but it feels like a cheaper place than it used to be.
There was a brief period where seeing a Medium article in a search result made me excited. Now, I avoid them, because of too many experiences with shallow, incorrect, or LLM-generated articles.
>> AI assistance empowers an author to level up — to make their ideas clearer, for example, or help them express themselves in a second language
> And also, I vehemently disagree with this statement. Flaws express personality.
So for the same reasons, does he wish to not read content that has been assisted by spelling or grammar checking? Or an editor, proof reader or fact checker? Or a thesaurus or dictionary? Or is he only concerned when AI is applied to those roles?
Well, I think that a spell-checker is a lot different than AI because AI can change the entire tone of an article whereas a spell and grammar checker generally does not.
I sort of love that he takes a strong stance here. I think even if you think there are some applications for AI, you should be able to strongly state where it is not useful. Having AI churn out bloat text in the form of terrible blogposts and misleading listicles that makes it harder for genuine information to be found online, is not a good use case for AI. If you want to ask ChatGPT to summarise a topic for you, you can literally just ask ChatGPT to do that. There is zero benefit to having a third party pumping that into websites that we go to to find real human opinions and hopefully a few genuinely great, expert articles - neither of which ChatGPT can produce.
> I am absolutely against artificial intelligence.
I'm curious where someone like him draws the line. AI is an ambiguous term. He's an author and photographer, so perhaps AI has just come to mean LLMs and image generators?
I might even agree with the thrust of his concerns, but this kind of diatribe always come off a bit rage-blind, and maybe brings the strength of the discussion down a little.
The industry only has itself to blame for this, because, for a decade or so, anything 'AI-ish' has almost always been branded as ML (presumably due to the previous AI winter, where the term 'AI' became poisonous to VCs). If people equate AI with generative AI, it is only because, well, _so does the industry_.
I avoid AI whenever I can, including machine-learning assisted noise reduction. But my argument against AI is not against specific technologies, but the conglomerate of technologies that represents an ideal that pushes efficiency beyond what I consider useful.
Contrary to your accusation of being rage-blind, I have studied this topic in detail and have carefully thought about it and read many articles in philosophy about it. I am still trying to articulate my ideas but my general feelings go way beyond emotions.
I see where the author is coming from - like any new technology, AI is getting abused significantly: Better scams, stealing creator IP, developing thought crutches, emotional dependencies on artificial companions, vast energy usage, and the collapse of personality and knowledge across the societal mean.
Some people say that LLMs make you more productive, but guess what? The value captured by your productivity will only enrich the companies that employ you. Companies will hire fewer people and eliminate positions all for "shareholder value" and to give a small raise to a C-suite executive.
But at the same time, we could definitely envision a world where these models accelerate human creativity - smaller high performance LLMs working on the edge, responsible usage for learning and skill development, helping people build their own voice.
The key is to identify the subtle line between dependence and empowerment, and to know when one is accruing AI-debt at the expense of their own abilities.
> But at the same time, we could definitely envision a world where these models accelerate human creativity
I feel like the scarcity and difficulty of creating something is part of the charm of art, instead of it being a matter of mass production. And for what? To show more ads?
I agree with what you're saying, and actually that is another strike against Medium for me, but I was willing to tolerate the platform because at least there was a small subset of writers trying to go beyond hype headlines.
There's Google's blogspot, but that's not trendy. There's Wordpress, but that's not trendy. There are a host of blogging tools out there, but they're not trendy.
The question isn't what other simple platform is out there: the question is what blogging software do you want to be seen using. People don't want blogspot.com at the end of their domain names, and they don't want to look like they're using blogspot.com. They didn't mind Medium, and that's the difference. It's not a question of technology, which has existed for decades; it's a question of marketing and brands.
Several years ago static site generators were all the hotness. Around then I switched to Hugo [1] from Wordpress and it's been a good experience. I do all editing locally with the CLI then chuck it to Git to be built and hosted by Netlify.
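For anyone curious what that workflow looks like concretely, here's a minimal sketch (the site name, post path, and commit message are illustrative; the netlify.toml keys are Netlify's standard build settings, and in practice you'd also add a theme):

    hugo new site myblog && cd myblog    # scaffold the site
    hugo new posts/first-post.md         # create a draft post under content/
    hugo server -D                       # preview locally, drafts included
    git add . && git commit -m "Add first post" && git push

    # netlify.toml at the repo root tells Netlify how to build on each push
    [build]
      command = "hugo"
      publish = "public"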
I mean... It was bound to happen. The dead internet theory became a reality and I don't blame people for being disappointed. Twitter is full of AI bots (verified at that, mind you), so is Reddit, and I reckon most other social media platforms. Medium is no different in that regard - they want two things: content and engagement. AI gives them an ungodly amount of content. Crap content, no doubt about it, but currently they are still banking on quantity and not quality, which will bite them back sooner or later - selling 10 burgers for 50 bucks each is a much better financial decision and an easier execution than selling 100 burgers at 5 bucks a pop. A lot of people (Medium included) don't get that.
The sad part is that LLMs can only juggle concepts around by predicting the most likely following token, which would work in a static and unchanging world. I get where the author is coming from and I feel for him. The reality is that large platforms are becoming bottomless pits of AI-generated content and it would never pay off in any meaningful way to try and compete with a regular, consumer-grade GPU that can spit out 20 articles an hour, while you spend a week writing a single one.
The way I see it, the only true solution(for both writer or reader) is to go back to the good old-fashioned self-hosted blogs, expose an RSS and rely on RSS readers(yes, those still exist). You will never be able to compete with the exposure that you'd get on Medium or any other platform but at the end of the day, I'd rather pay 50 bucks for a good burger than 5 bucks for a cheap bun from Lidl with some very questionable substances inside.
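For what it's worth, the RSS part is tiny; a hand-rolled feed is just a small XML file, and most static site generators emit one automatically (the titles and URLs below are purely illustrative):

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title>My Self-Hosted Blog</title>
        <link>https://example.com/</link>
        <description>Articles written by a human</description>
        <item>
          <title>Why I left Medium</title>
          <link>https://example.com/posts/why-i-left-medium.html</link>
          <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
          <description>A short summary readers see in their RSS reader</description>
        </item>
      </channel>
    </rss>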
HTML-only websites that you can host yourself trivially still work.
These platforms are all the same: put your middling content here and you’ll be a star
In my experience most people have valueless content and target an affinity-group audience. They produce enough content to have a viral hit, and that viral hit then becomes your first ratchet. You then adjust your content type based on the virality and then you just continue the ratchet. Iirc this is the playbook of e.g. MrBeast etc… and there's a deterministic outcome: create enough attention that marketing income consumes your lifestyle and drives all future ratchet iterations.
If you have anything of value people will figure out how to get it but it won’t be fast and it will be mostly ignored
Why this writer ever thought Medium ever cared about specifically human-written stories is beyond me. Medium is a bad idea for many reasons but AI doesn't make my list at all. Also the staunch anti-AI stance is incredibly off-putting, saying things like they won't even read an article with an AI-generated image or AI-assisted text (as if they can actually tell with certainty). AI (LLMs) are a tool, nothing more, no different than technology itself is a tool. I find all these anti-AI takes extremely tiring and repetitive.
My eyes about rolled out of my head reading through this.
My money is on either this writer fading into obscurity or using AI tools in the future when they realize their stance was silly.
> My money is on either this writer fading into obscurity or using AI tools in the future when they realize their stance was silly.
I will never use AI tools, never. And I am more than happy to fade into obscurity. I just want to make enough money (without AI) so I can take care of my family in peace, and also articulate my ideas against AI to whomever will listen. I am absolutely not concerned about becoming famous.
In all seriousness though, there isn’t a competitive platform out there that won’t adopt AI or any other technology if it meant making money. If they chose not to, they would be giving ground to someone who would. That’s capitalism in a nutshell.
You aren’t their customer. It’s the eyeballs they’re after and, any particular authors word isn’t really precious. Enough authors/writers out there who are willing to replace you.
Remember kids: “Content” is the word for things _between_ the ads.
What I tried to express is how we will look back on this in the future. Demanding a ban of AI will look just as insane as it would be to demand a ban of the wheel, metal, electricity, machines, computers or the internet.
Or the nuclear bomb? Yet we do have a sort of "ban" on it. Some things are just too powerful, and not all technological inventions are good. What about nerve gas? The world has effectively banned that too.
We have been through this exact scenario hundreds of times with different technologies. As usual, someone's loss is another's gain. For example, people who have ideas to express but are not native English speakers, or those who struggle with expressing themselves in writing, will absolutely benefit from this technology.
> We have been through this exact scenario hundreds of times with different technologies
Yes, and we left people behind repeatedly. You'll notice people who may have intellectual or learning disabilities can no longer find a job with a decent wage in the US. Previously they could, because manual labor jobs were plentiful. Now those are automated.
When we further automate and automate, we raise the barrier to entry, because robots or AI can perform the bottom most tasks. So, people who are not capable are left behind, and eventually there will be nobody to work jobs.
If the average job requires reading and writing, most people can do it. If the average job requires a high school diploma, fewer can do it. If the average job requires a degree, fewer still can do it, and so on and so on.
Leisure time? You mean “leisure time” digging for food in dumpsters and begging for scraps of change? Or is there some magical money machine that the main benefactors of AI will provide for the “fortunate” people removed from the economy and provided with so much “leisure” time?
> people having more and more leisure time is a bad thing?
What could possibly make you think this is the end result?
In the past 50 years, productivity across the whole economy has gone up over an order of magnitude. People in the 70s were working 40 hours a week - are you now working 4 hours?
We don't have precedent for this, is my point. The solution to lost work has always been to have people work on something else. My point is: what happens when there is no something else for those people to work on?
Will we have leisure? Who will pay for incapable people to have leisure? This sounds like UBI.
> all work being done by technology and all people having all of their time to spend on fun activities
Maybe in the end, but what happens before that? Like, say, 25% of people are "smart" enough to still work a job, what happens to the 75%? Or even 90% of people work but 10% can't, do we just like... kill those 10% or?
I'll add another analogy of my own: "guy harvests his own ice, is mad at others for buying it from the store"
He is mad at others using AI on his favorite blogging website. I think it is an insult to himself, because most AI produced articles etc. are unreadable and full of useless filler information - I think the guy can do better.
The analogy compares a writer complaining about generative AI to an ice delivery person demanding a ban on refrigerators. Here's the breakdown:
Ice delivery guy = Traditional writer
Going on strike = Complaining or protesting
Demanding ban of fridges = Calling for restrictions on generative AI
The analogy suggests that the writer's complaint about generative AI is as futile and outdated as an ice delivery person trying to ban refrigerators. Just as refrigerators made ice delivery largely obsolete, the implication is that generative AI might be seen as a technological advancement that could potentially impact traditional writing roles.
Leave and go where? Seems like a losing battle; eventually all platforms will be co-opted to feed AI. Even if you have a private web site (blog), it can be scraped, and how are the little guys going to fight it?
There was a bunch of reporting on how AI companies and researchers were using tools that ignored robots.txt. It's a "polite request" that these companies had a strong incentive to ignore, so they did. That incentive is still there, so it is likely that some of them will continue to do so.
CommonCrawl[0] and the companies training models I'm aware of[1][2][3] all respect robots.txt for their crawling.
If we're thinking of the same reporting, it was based on a claim by TollBit (a content licensing startup) which was in turn based on the fact that "Perplexity had a feature where a user could prompt a specific URL within the answer engine to summarize it". Actions performed by tools acting as a user agent (like archive.today, a webpage-to-PDF site, or a translation site) aren't crawlers and aren't what robots.txt is designed for, but either way the feature is disabled now.
These policies are much clearer than they were when last I looked, which is good. On the other hand, Perplexity appeared to ignore robots.txt as part of a search-enhanced retrieval scheme, at least as recently as June of this year. The article title is pretty unkind, but the test they used pretty clearly shows what was going on.
> The article title is pretty unkind, but the test they used pretty clearly shows what was going on.
I believe this article is around the same misunderstanding - it doesn't appear to show any evidence of their crawler, or web scraping used for training, accessing pages prohibited by robots.txt.
The EU's AI act points to the DSM directive's text and data mining exemption, allowing for commercial data mining so long as machine-readable opt-outs are respected - robots.txt is typically taken as the established standard for this.
In the US it is a suggestion (so long as Fair Use holds up) but all I've seen suggests that the major players are respecting it, and minor players tend to just use CommonCrawl which also does. Definitely possible that some slip through the cracks, but I don't think it's as useless as is being suggested.
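For reference, the opt-out being discussed is just a few lines in a site's robots.txt. The user-agent tokens below are the ones the major crawlers have documented (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Google's model training), but treat them as illustrative since the exact list changes over time:

    # Ask AI-training crawlers to stay away
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    # Everyone else may crawl normally (empty Disallow allows everything)
    User-agent: *
    Disallow:

As noted above, this is a request rather than an enforcement mechanism; it only helps to the extent crawlers honor it or, in the EU, to the extent it counts as a machine-readable opt-out.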
Funny. If I can browse to it, it is public right? That is how some people's logic goes. And how OpenAI argued 2 years ago when GPT3.5/ChatGPT first started getting traction.
> Technically, robots.txt isn't enforcing anything, so it is just trust.
There's legal backing to it in the EU, as mentioned. With CommonCrawl you can just download it yourself to check. In other cases it wouldn't necessarily be as immediately obvious, but through monitoring IPs/behavior in access logs (or even prompting the LLM to see what information it has) it would be possible to catch them out if they were lying - like Perplexity were "caught out" in the mentioned case.
> Funny. If I can browse to it, it is public right? That is how some people's logic goes. And how OpenAI argued 2 years ago when GPT3.5/ChatGPT first started getting traction.
If you mean public as in the opposite of private, I think that's pretty much true by definition. Information's no longer private when you're putting it on the public Internet.
If you mean public as in public domain, I don't think that has been argued to be the case. The argument is that it's fair use (that is, the content is still under copyright, but fitting statistical models is substantially transformative/etc.)