
Hi HN,

I’m Daniel. I’ve spent the last few years building internal tools and automations for companies like FINN and Personio. I built PUNKU.AI to solve a frustration I had with the current state of automation.

Right now, you generally have two choices:

Manual: dragging and dropping nodes in tools like Make/Zapier (reliable, but tedious for complex logic).

"Magic" AI agents: giving a prompt to an agent and hoping it works (fast, but impossible to debug or trust with sensitive data).

PUNKU.AI is an attempt to bridge the gap. It uses an LLM to "architect" the workflow, but it outputs a fully editable visual node graph (JSON-based) that you can verify before running.
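
To make that concrete, here is a rough sketch of the kind of graph the system emits. The field names and node types are simplified for illustration, not the exact schema:

    # Simplified for illustration -- not the exact PUNKU.AI schema.
    # The point: the LLM emits plain data you can read, diff, and edit
    # before anything runs.
    workflow = {
        "nodes": [
            {"id": "fetch", "type": "http_request",
             "params": {"url": "https://api.example.com/orders", "method": "GET"}},
            {"id": "check", "type": "branch",
             "params": {"condition": "status_code == 404"}},
            {"id": "alert", "type": "slack_message",
             "params": {"channel": "#ops", "text": "Order API returned 404"}},
            {"id": "process", "type": "loop",
             "params": {"over": "fetch.body.items"}},
        ],
        "edges": [
            {"from": "fetch", "to": "check"},
            {"from": "check", "to": "alert", "when": True},
            {"from": "check", "to": "process", "when": False},
        ],
    }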

How it works:

1. The "Interview": Instead of taking a zero-shot prompt, the system acts proactively. It parses your request and asks clarifying questions to resolve ambiguity (e.g., "If the API returns a 404, should I retry or alert you via Slack?").

2. Graph Generation: It generates the logic, loops, and branching, and visualizes it as a node graph. You can manually tweak connections if the LLM hallucinated a step, or ask it to regenerate specific parts.

3. Integrations: We are connecting to ~3,000 API endpoints. The agent handles the authentication handshake and schema mapping.

The Tech: We are focusing heavily on the "translation layer" between natural language and structured automation flows. The goal is to have the speed of a chat interface but the determinism of a state machine.
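
By "determinism of a state machine" I mean that once the graph exists, running it is just a plain walk over the data, with no model in the loop at run time. A minimal Python sketch against the toy schema above (the real engine also handles auth, retries, and schema mapping):

    # Minimal, illustrative executor for the toy graph above -- not the real engine.
    def run(workflow, handlers):
        nodes = {n["id"]: n for n in workflow["nodes"]}
        outputs = {}
        current = workflow["nodes"][0]["id"]  # start at the first node
        while current is not None:
            node = nodes[current]
            # Each node type has a deterministic handler: http_request, branch, ...
            result = handlers[node["type"]](node["params"], outputs)
            outputs[current] = result
            # Follow the outgoing edge whose "when" matches the result, if any.
            nxt = None
            for edge in workflow["edges"]:
                if edge["from"] == current and edge.get("when", result) == result:
                    nxt = edge["to"]
                    break
            current = nxt
        return outputs

Nothing stochastic happens at run time; the LLM's only job is to author and edit the graph.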

I’d love for you to try breaking it. Ask it to build something complex (loops, conditional logic) and let me know if the generated graph makes sense.

Link: https://www.punku.ai

Thanks!


I asked Gemini to write "a comment response to this thread. I want to start an intense discussion".

Gemini 3:

The cognitive dissonance in this thread is staggering.

We are sitting here cheering for a model that effectively closes the loop on Google’s total information dominance, while simultaneously training our own replacements.

Two things in this thread should be terrifying, yet are being glossed over in favor of "look at the pelican SVG":

1. *The Model Card Leak:* As user `rvz` pointed out, the leaked model card implies training on "user data... in accordance with Google's relevant terms of service." Given the recent terms updates, this essentially confirms that Gemini 3 is likely refined on the corpus of Gmail and Drive data that they’ve been hoarding for decades. We are cheering for a product sold back to us at a 60% markup (input costs up to $2.00/M) that was built on our own private correspondence.

2. *The "Agentic" Trap:* Everyone praising how `Gemini 3` solves complex bugs one-shot or handles entire PRs is missing the macro view. We aren’t looking at a productivity tool; we are looking at the devaluation of the software engineer. When the entry barrier drops to "can you prompt a seminal agent," the economic value of what we do on this forum evaporates.

Google has successfully gamified us into feeding the very beast that will make the "14-minute human solve time" (referenced by `lairv`) irrelevant. We are optimizing for our own obsolescence while paying a monopoly rent to do it.

Why is the sentiment here "Wow, cool clock widget" instead of "We just handed the keys to the kingdom to the biggest ad-tech surveillance machine in history"?


> We are cheering for a product sold back to us at a 60% markup (input costs up to $2.00/M) that was built on our own private correspondence.

That feels like something between a hallucination and an intentional fallacy introduced because you specifically asked for an "intense discussion". The increase is 60% on input-token pricing from the old model, but it's not a markup, and especially not "sold back to us at X markup".
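
To put numbers on it (the old price here is just back-solved from the 60% figure, so treat it as illustrative):

    # Illustrative arithmetic: a price increase is not a markup.
    old_price = 1.25   # $/M input tokens, back-solved from the "60%" figure
    new_price = 2.00   # $/M input tokens, the figure quoted above
    increase = (new_price - old_price) / old_price   # 0.6 -> a 60% price increase
    # A markup would be (price - serving_cost) / serving_cost,
    # and Google's serving cost isn't public, so "60% markup" can't follow.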

I've seen more and more of these kinds of hallucinations. As these models seem to be RL'd not to be sycophants, they're slowly inching in the opposite direction, where they tell small fibs or embellish in ways that seem meant to add more weight to their answers.

I wonder if it's a form of reward hacking: trading maximal accuracy for confidence might result in better rewards than being accurate and precise.
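
A toy way to picture that (made-up scoring, not any real RLHF objective): if the reward signal pays for confident tone at all, a small fib can out-score a carefully hedged but fully accurate answer.

    # Toy illustration of the hypothesised reward hacking -- made-up numbers only.
    def toy_reward(accuracy, confidence):
        # Hypothetical: raters reward correctness, but confident tone also scores.
        return 1.0 * accuracy + 0.5 * confidence

    precise_but_hedged = toy_reward(accuracy=1.0, confidence=0.2)   # 1.1
    confident_with_fib = toy_reward(accuracy=0.8, confidence=1.0)   # 1.3
    # The fib costs a little accuracy but buys more confidence reward,
    # so the policy drifts toward confident embellishment.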


60% probably felt like a lot to Gemini. Still, I liked the doomerism and the point about Google using our data to train its models.

Nonetheless, Gemini 3 failed this test: it failed to start a discussion. Its points were shallow and too AI-esque.


I'm not debating whether 60% is a lot; the statement is factually incorrect: a markup refers to an increase over cost.

Looking at it again, it's actually a completely nonsensical sentence that just happens to resemble a sensible statement closely enough to fool most people.

RL is definitely starting to burst at the seams at this point.


Gotta hand it to Gemini, those are some top-notch points.


The "Model card leak" point is worth negative points though, as it's clearly a misreading of reality.


yeah hahahahah, it made me think!


Since when does critical infrastructure fail weekly?! One week it's AWS, then Azure + AWS, now Cloudflare...

Time to go back to on-prem. AWS and co. are too expensive anyway.


A lot of people are "on prem" but use Cloudflare to proxy traffic for DDoS attack mitigation, among other reasons.

