Thank you for the post, it's a good read. I'm working on governance/validation layers for n-LLMs and making them observable, so your comments on runaway AIs resonated with me. My research points to reputation- and stake-based consensus mechanisms as the validation layer, either pre-inference or pre-execution, and with enough "decision liquidity" from reputation alone (i.e. decision precedence), the verification step can be skipped entirely.
I’ve been running OpenClaw Docker agents in Slack in a similar setup, using Gemini 2.5 Flash Lite through OpenRouter for most tasks, then Opus 4.6 and Codex 5.3 for heavier lifts. They share context via embeddings right now, but I’m going to try parameterizing them like you suggested, because they can drift pretty hard once a hallucinated idea takes off. I’m trying to get to a point where I don’t have to babysit them. I’ve also been thinking about giving them some “democracy” under the hood with a consensus policy engine. I’ve started tinkering with an open-source version of that, called consensus-tools, that I can swap between agentic frameworks. I’m also checking whether it can work with openswarm.
Go Momo go! If you want to hook up multiple dogs and have them reach consensus, I'm down. I have a 15 lb havapoo I can volunteer (he needs to help with rent).
This doesn’t look like a reasoning ceiling. It looks like a decision reliability problem.
The unstable tier is the key result. Models that get it right 70–80% of the time are not “almost correct.” They are nondeterministic decision functions. In production that’s worse than being consistently wrong.
A single sampled output is just a proposal. If you treat it as a final decision, you inherit its variance. If you treat it as one vote inside a simple consensus mechanism, the variance becomes observable and bounded.
For something this trivial you could:
- run N independent samples at low temperature
- extract the goal state (“wash the car”)
- assert the constraint (“car must be at wash location”)
- reject outputs that violate the constraint
- RL against the "decision open ledger"
No model change required. Just structure.
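The sample-vote-reject loop above can be sketched in a few lines. Everything here is illustrative: the sample list stands in for N low-temperature model calls, and `consensus_decision` is a hypothetical helper, not any real API.

```python
from collections import Counter

def consensus_decision(samples, constraint):
    """Majority-vote over extracted goal states that satisfy the constraint."""
    votes = Counter(goal for goal in samples if constraint(goal))  # reject violators
    return votes.most_common(1)[0][0] if votes else None           # None = no valid proposal

# Toy stand-ins for 9 independent low-temperature samples from an
# "unstable tier" model; a real run would call the model API here.
samples = [
    "wash the car", "wash the car", "wash the car at home",
    "wash the car", "wash the car at home", "wash the car",
    "wash the car", "wash the car", "wash the car at home",
]

# Constraint: the car must be at the wash location, so "at home" is rejected.
at_wash_location = lambda goal: goal == "wash the car"

print(consensus_decision(samples, at_wash_location))  # → wash the car
```

The variance stops being hidden: the vote counts are the observable, and the constraint makes the rejection rate measurable per prompt.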
The takeaway isn’t that only a few frontier models can reason. It’s that raw inference is stochastic and we’re pretending it’s authoritative.
Reliability will likely come from open, composable consensus layers around models, not from betting everything on a single forward pass.
Fair point. I cleaned up the wording with ChatGPT using my review prompt. The substance matters more than the style: if a model flips 3/10 times on a trivial constraint, that’s a reliability issue, not a reasoning ceiling.
> If a model flips 3/10 times on a trivial constraint, that’s a reliability issue, not a reasoning ceiling.
I have reviewed your previous comments, and you have consistently written that's (straight apostrophe) instead of that’s (curly). So what I read is still some LLM output, even though I think there is some kind of human behind the LLM.
Great read. The bilingual shadow reasoning example is especially concerning. Subtle policy shifts reshaping downstream decisions is exactly the kind of failure mode that won’t show up in a benchmark leaderboard.
My wife is trilingual, so now I’m tempted to use her as a manual red team for my own guardrail prompts.
I’m working on LLM guardrails as well, and what worries me is orchestration becoming its own failure layer. We keep assuming a single model or policy can “catch” errors, but even a 1% miss rate, when composed across multi-agent systems, cascades quickly in high-stakes domains.
I suspect we’ll see more K-LLM architectures where models are deliberately specialized, cross-checked, and policy-scored rather than assuming one frontier model can do everything. Guardrails probably need to move from static policy filters to composable decision layers with observability across languages and roles.
Appreciate you publishing the methodology and tooling openly. That’s the kind of work this space needs.
The cascading failure point is critical. A 1% miss rate per layer in a 5-layer pipeline gives you roughly 5% end-to-end failure, and that's assuming independence. In practice the failures correlate because multilingual edge cases that bypass one guardrail tend to bypass adjacent ones too.
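Both halves of that claim are easy to check numerically. A minimal sketch, with invented toy probabilities for the correlated case (the 0.8/0.001 numbers are made up for illustration):

```python
import random

def pipeline_failure(p_miss: float, k: int) -> float:
    """P(at least one of k independent stages fails) = end-to-end failure."""
    return 1 - (1 - p_miss) ** k

print(round(pipeline_failure(0.01, 5), 4))  # 0.049, i.e. roughly 5%

# Correlation makes redundant guardrails worse than the independent math says:
# a pathological multilingual input tends to slip past every layer at once.
random.seed(1)
trials, misses = 100_000, 0
for _ in range(trials):
    hard = random.random() < 0.01            # 1% of inputs are pathological
    p_layer_miss = 0.8 if hard else 0.001    # hard inputs bypass most layers
    if all(random.random() < p_layer_miss for _ in range(5)):
        misses += 1                          # every guardrail missed at once
# Independence would predict ~0.01**5 (1e-10); correlation gives orders of
# magnitude more.
print(misses / trials)
```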
The observation that guardrails need to move from static policy filters to composable decision layers is exactly right. But I'd push further: the layer that matters most isn't the one checking outputs. It's the one checking authority before the action happens.
A policy filter that misses a Persian prompt injection still blocks the action if the agent doesn't hold a valid authorization token for that scope. The authorization check doesn't need to understand the content at all. It just needs to verify: does this agent have a cryptographically valid, non-exhausted capability token for this specific action?
That separates the content safety problem (hard, language-dependent, probabilistic) from the authority control problem (solvable with crypto, language-independent, deterministic). You still need both, but the structural layer catches what the probabilistic layer misses.
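One way to make the deterministic side concrete is an HMAC-signed capability token. The sketch below is hypothetical (the key handling, claim names, and `mint_token`/`authorize` helpers are all invented for illustration); a real design would add expiry, nonces, and atomic use-count updates.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # illustration only; a real issuer would use a KMS key

def mint_token(agent: str, scope: str, uses: int) -> str:
    """Issue a capability token bound to one agent and one action scope."""
    body = json.dumps({"agent": agent, "scope": scope, "uses": uses}).encode()
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(body).decode() + "." + sig

def authorize(token: str, agent: str, scope: str, used: int) -> bool:
    """Deterministic, content-blind check: valid signature, right scope, not exhausted."""
    raw, sig = token.rsplit(".", 1)
    body = base64.urlsafe_b64decode(raw)
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False                       # forged or tampered token
    claims = json.loads(body)
    return (claims["agent"] == agent
            and claims["scope"] == scope
            and used < claims["uses"])     # non-exhausted capability

t = mint_token("agent-7", "payments:refund", uses=1)
print(authorize(t, "agent-7", "payments:refund", used=0))  # True
print(authorize(t, "agent-7", "payments:refund", used=1))  # False: exhausted
```

Nothing in `authorize` looks at the prompt or its language, which is the point: a Persian injection that fools the content filter still has to produce a valid signature for the exact scope it wants.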
This is why I’m using the open source consensus-tools engine and CLI under the hood. I run ~100 maintainer-style agents against changes, but inference is gated at the final decision layer.
Agents compete and review, then the best proposal gets promoted to me as a PR. I stay in control and sync back to the fork.
It’s not auto-merge. It’s structured pressure before human merge.
I’m working on an open source project that treats this as a consensus problem instead of a single model accuracy problem.
You define a policy (majority, weighted vote, quorum), set the confidence level you want, and run enough independent inferences to reach it. Cost is visible because reliability just becomes a function of compute.
The question shifts from “is this output correct?” to “how much certainty do we need, and what are we willing to pay for it?”
Still early, but the goal is to make accuracy and cost explicit and tunable.
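That tradeoff can be sketched for the simplest policy, a majority vote over independent samples with fixed per-sample accuracy. Function names here are illustrative, not the project's actual API:

```python
from math import comb

def majority_confidence(p: float, n: int) -> float:
    """P(majority of n independent samples is correct), per-sample accuracy p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

def samples_needed(p: float, target: float, max_n: int = 201) -> int:
    """Smallest odd n (odd avoids ties) whose majority vote reaches the target."""
    for n in range(1, max_n + 1, 2):
        if majority_confidence(p, n) >= target:
            return n
    raise ValueError("target not reachable within max_n samples")

# A 75%-accurate "unstable tier" model pushed to 99% decision confidence:
n = samples_needed(0.75, 0.99)
print(n, round(majority_confidence(0.75, n), 4))
```

Cost is then just n times the per-inference price, which is what makes reliability a function of compute rather than a property you hope the model has.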
With consensus.tools we split things intentionally. The OSS CLI solves the single user case. You can run local "consensus boards" and experiment with policies and agent coordination without asking anyone for permission.
Anything involving teams, staking, hosted infra, or governance sits outside that core.
Open source for us is the entry point and trust layer, not the whole business. Still early, but the federation vs stadium framing is useful.