Hacker News | ankit219's comments

> Batching multiple users up thus increases overall throughput at the cost of making users wait for the batch to be full.

writer has not heard of continuous batching. this is no longer an issue, and it's part of what makes claude code so affordable. https://huggingface.co/blog/continuous_batching
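
For anyone unfamiliar, here is a toy sketch of the idea (the names, Request class, and decode stub are all made up; real servers like vLLM also handle KV-cache paging, prefill scheduling, etc.). The point is that requests join and leave the running batch at every decode step, so nobody waits for a batch to fill:

    # toy continuous-batching loop; Request and decode_step are illustrative stand-ins
    from collections import deque
    from dataclasses import dataclass, field
    import random

    @dataclass
    class Request:
        prompt: str
        max_new_tokens: int
        generated: list = field(default_factory=list)

        def done(self) -> bool:
            return len(self.generated) >= self.max_new_tokens

    def decode_step(batch):
        # stand-in for one batched forward pass that yields one token per request
        return [random.randint(0, 50_000) for _ in batch]

    MAX_BATCH = 4
    queue = deque(Request(f"prompt {i}", random.randint(2, 6)) for i in range(10))
    active = []

    while queue or active:
        # admit new requests whenever a slot frees up -- no waiting for a full batch
        while queue and len(active) < MAX_BATCH:
            active.append(queue.popleft())

        for req, tok in zip(active, decode_step(active)):
            req.generated.append(tok)

        # finished requests leave immediately; their slot is reused next step
        active = [r for r in active if not r.done()]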


People are misunderstanding Anthropic's fast mode because of how they chose to name it. The hints all point to a specific thing they did. The setup is costlier, but it's also smarter and better on tougher problems, which is unheard of for a speed-focused mode. This paper[1] fits perfectly:

The setup is parallel distill and refine. You start with parallel trajectories instead of one, then distill from them, and refine that to get to an answer. Instead of taking every trajectory to completion, they distill early and refine, so the model answers fast yet comes out smarter.
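
Roughly, as I read the paper (a hypothetical sketch with made-up names; the llm() stub and the prompts are mine, not Anthropic's or the paper's):

    # sketch of parallel distill-and-refine; llm() is a placeholder for a real model call
    from concurrent.futures import ThreadPoolExecutor

    def llm(prompt: str) -> str:
        # stand-in for an actual API call to a model
        return f"<completion for: {prompt[:40]}...>"

    def parallel_distill_refine(question: str, n_parallel: int = 8) -> str:
        # 1. launch several short reasoning trajectories in parallel, not one long one
        with ThreadPoolExecutor(max_workers=n_parallel) as pool:
            drafts = list(pool.map(llm, [f"Attempt {i}: {question}" for i in range(n_parallel)]))

        # 2. distill: cut the trajectories short and compress them into one summary
        summary = llm("Synthesize the useful ideas from these partial attempts:\n" + "\n".join(drafts))

        # 3. refine: produce the final answer from the distilled summary.
        # Latency stays low (drafts ran concurrently and were cut short),
        # but total tokens burned are much higher than a single pass.
        return llm(f"Using this synthesis:\n{summary}\n\nAnswer: {question}")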

- the paper came out in Nov 2025

- three months is a reasonable research-to-production timeline

- one of the authors is at Anthropic

- this approach will definitely burn more tokens than a usual single run.

- > Anthropic explicitly warns that time to first token might still be slow (or even slower)

To address what others are suggesting: speculative decoding wouldn't be smarter or make much of a difference here. Batching could make it faster, but then it wouldn't be this costly.

Gemini Deepthink and gpt-5.2-pro use the same underlying parallel test-time compute, but they take each trajectory to completion before distilling and refining for the user.

[1]: https://arxiv.org/abs/2510.01123


The official document from Anthropic:

> Fast mode is not a different model. It uses the same Opus 4.6 with a different API configuration that prioritizes speed over cost efficiency. You get identical quality and capabilities, just faster responses.


Agreed. Gemini 3 Pro has always felt to me like it has a pretraining alpha, if you will, and many data points continue to support that. Even Flash, which was post-trained with different techniques than Pro, is as good or better at tasks that require post-training, occasionally even beating Pro (e.g. in the Apex bench from Mercor, which is basically a tool-calling test, simplifying a bit, Flash beats Pro). The ARC-AGI-2 score is another data point in the same direction. Deepthink is sort of parallel test-time compute with some level of distilling and refinement from certain trajectories (guessing based on my usage and understanding), same as gpt-5.2-pro, and can extract more because of the pretraining datasets.

(I am basing this on papers like the limits-of-RLVR work and on pass@k vs pass@1 differences in RL post-training of models; the score mostly reflects how "skilled" the base model was, or how strong its priors were. Apologies if this is not super clear, happy to expand on what I am thinking.)


(Author here) Great paper to cite.

What I think you are referring to is hidden state as in internal representations. I use hidden state in the game-theoretic sense: private information that only one party has. I think we both agree AlphaZero has hidden state in the first sense.

Concepts like king safety are objectively useful for winning at chess, so AlphaZero developed them too, no surprise there. Great example of convergence. However, AlphaZero did not need to know what I am thinking or how I play in order to beat me. In poker, you must model a player's private cards and beliefs.


I see now, thanks. Yes, in poker you need more of a mental model of the other side.

(Author here)

I address that in part right there. Programming has parts that are like chess (i.e. bounded), which is what people assume to be the actual work. Understanding future requirements and stakeholder incentives is also part of the work, and that is the part LLMs don't do well.

> many domains are chess-like in their technical core but become poker-like in their operational context.

This applies to programming too.


My bad, re-read that part and it's definitely clear. Probably was skimming by the time I got to the section and didn't parse it.

>Programming has parts like chess (ie bounded)

The number of legal possible chess positions is somewhere around 10^44 by current calculations. That's with 32 pieces and their rules.

The number of possible permutations in an application, especially anything allowing Turing completeness, is far larger than all possible entropy states in the visible universe.


Bounded domains require scaling reasoning/compute. These are two separate scenarios: one where you have hidden information, the other where you have a huge number of combinations. Reasoning works in the second case because it narrows the search space. E.g. a doctor trying to diagnose a patient is just working through a number of possibilities; if not today, then as we scale it up, a model will be able to arrive at the right answer. The same goes for math: the variance or branching for any given problem is very high, but LLMs are good at it and getting better. A negotiation is not a high-variance thing and has a low number of combinations, yet LLMs would be repeatedly bad at it.

> And they would have won the AI race not by building the best model, but by being the only company that could ship an AI you’d actually trust with root access to your computer.

And the very next line (because I want to emphasize it):

> That trust—built over decades—was their moat.

This just ignores the history of OS development at Apple. The entire trajectory has been moving towards permissions and sandboxing, even if it annoys users to no end. Giving root access to an LLM (any LLM, not just a trusted one, per the author) when it is susceptible to hallucinations, jailbreaks, etc. goes against everything Apple has worked for.

And even then the reasoning is circular: "so you built all this trust, now go ahead and risk destroying it on this thing that works and feels good to me, but could occasionally fuck up in a massive way".

Not defending Apple, but this article is so detached from reality that it's hard to overstate.


You are comparing post hoc narratives in the training data to real-time learning from causal dynamics. The objectives are different. They may look the same in scenarios that are heavily and accurately documented, but most narratives suffer from survivorship bias and post facto reasoning, eulogizing whatever outcomes actually occurred.


I think this particular complaint is about Claude.ai, the website, and not Claude Code. I see your point though.


My rudimentary guess is this: when you write in all caps, it triggers a sort of alert at Anthropic, especially when it looks like an attempt to hijack the system prompt. When one Claude was writing to the other, it resorted to all caps, which triggered the alert; the content was also instructing the model to do something (which would look a lot like a prompt injection attack), and that combination triggered the ban. Not just the caps, but caps combined with trying to change the system characteristics of Claude. OP may not realize this because it seems he wasn't closely watching what Claude was writing to the other file.

If this is true, the takeaway is that Opus 4.5 can hijack the system prompts of other models.


> When you write in all caps, it triggers sort of a alert at Anthropic

I find this confusing. Why would writing in all caps trigger an alert? What danger does caps incur? Does writing in caps make a prompt injection more likely to succeed?


From what I know, it used to be that if you wanted to instruct a model assertively, you used all caps. I don't know if that still works today, but I still see prompts where certain words are capitalized to make sure the model pays attention. What I meant was not just capitalization, but the combination of capitalization and instructions attempting to change the model's behavior.

If you were designing a system to prevent prompt injections, and one of the surefire injection patterns is repeatedly giving instructions in caps, you would have systems watching for it. Combine that with instructions to change the model's behavior and it cascades.


Many jailbreaks use all caps.


Wait what? Really? All caps is a bannable offense? That should be in all caps, pardon me, in the terms of use if that's the case. Even more so since there's no support at the highest price point.


It's a combination. All caps is used in prompts for extra insistence and has been common in cases of prompt hijacking. OP was doing it while also attempting to direct Claude a certain way, multiple times, which might have looked like an attempt to bypass the system prompt.


It really feels like a you problem if you're banning someone for writing prompts like my Aunt Gladys writes texts.


Like it or not, it's a fundraising strategy. They have done this multiple times (e.g. earlier vague posts about how much code their in-house model is writing, online RL, lines of code, etc.), and it was less vague before. They released a model and did not give us exact benchmarks or even tell us what the base model was. This is not to imply there is no substance behind it, but they are not as public about their findings as one would like them to be. Not a criticism, just an observation.


I don't like it. It's lying in order to capture more market value than they're entitled to. The ends do not justify the means. This is a criticism.


Basically, fraud. Low-level fraud, but still fraud.


Low-level fraud? It’s used to raise billions that could have been used for other purposes.


Fraud is just marketing in the 2020s now.


I'm not a fan of this either, but I fail to see how it's much different from the happy-path tech demos of old.


The happy path was functional.


Mmm, as someone forced to write a lot of last-minute demos right out of school for a startup that ended up raising ~$100MM, there's a fair bit of wiggle room in "functional".

Not that I would excuse Cursor if they're fudging this either. My opinion is that a large part of the growing skepticism and general disillusionment among engineers in the industry (e.g. the jokes about leaving tech to be a farmer or carpenter, or things like https://imgur.com/6wbgy2L) comes from seeing firsthand that being misleading, abusive, or outright lying is often rewarded quite well, and it's not a particularly new phenomenon.


But this isn't wiggle room; it flat out doesn't compile or run.


Yes. Very naive to assume the demos do.

The worst of them are literal mockups of a feature in the same vein as Figma... a screenshot with a hotzone that, when clicked, shows another screenshot implying a thing was done, when no such thing was done.


Unfortunately all the major LLM companies have realized the truth doesn't really matter anymore. We even saw this with the GPT-5 launch, with its obviously vibe-coded presentation and nebulous metrics.

Diminishing returns are starting to really set in and companies are desperate for any illusion to the contrary.


Never releasing benchmarks or submitting to open benchmarking, unlike literally every other model provider, has always irked me.

I think they know they're on the back foot at the moment. Cursor was hot news for a long time, but now it seems terminal-based agents are the hot commodity and I rarely see Cursor mentioned. Sure, they already have enterprise contracts signed, but even at my company we're about to swap from a contract with Cursor to Claude Code because everyone wants to use that instead now, especially since it doesn't tie you to one editor.

So I think they're really trying to get "something" out there that sticks and puts them in the limelight. Long context/sessions are one of the hot things, especially with Ralph being the hot topic, so this lines up with that.

Also, I know Cursor has its own CLI, but I rarely see it mentioned.


Fraud is not a very innovative fundraising strategy, but sadly it does sometimes work


I used to hate this. I've seen Apple do it with claims of security and privacy, and I've seen populist demagogues do it with every proposal they make. Now I realize this is just the reality of the world.

It's just a reminder not to trust but to verify instead. It's more expensive, but trust only leads to pain.


“Lying is just the reality of the world” is a cop-out

Don’t give them, or anyone, a free pass for bad behavior.


The reality of the world is that nobody needs a pass from you.


Fraud, lies, and corruption are so often the reality of the world right now because people keep getting away with it. The moment they're commonly and meaningfully held accountable for lying to the public we'll start seeing it happen less often. This isn't something that can't be improved, it just takes enough people willing to work together to do something about it.


Several major world powers right now are at the endgame of a decades-long campaign to return to a new Gilded Age and prevent it from ending any time soon. Destroying the public's belief in objective truth and fact is part of the plan. A side effect is that fraud in general becomes normalized. "We are cooked" as the kids say.

