Hacker News | pietz's comments

Hold up, so d0 is just Claude with access to the terminal?

Sounds like they just discovered that they don't have a product.


Weirdly, I find a higher signal to noise in this analogy than looking at benchmarks these days.

If you let your inner fanboy rest for a moment, you realize Gemini 3, Claude Opus 4.5, and GPT 5.2 are all amazing. If two of them disappeared tomorrow, my AI-assisted productivity wouldn't change.

The 3% difference on benchmark X doesn't mean anything anymore. It's probably more helpful to compare them on character traits instead of numbers.

My one word to describe Claude would be "pleasant". It's just so nice to communicate with. GPT/Codex would be "thorough". It finds and thinks of stuff the others don't. For Gemini 3, the jury is still out. It might be the smart kid on the block that's still a bit rough around the edges, but given that it's a preview, things might change soon.


Mine definitely would. This sounds so clichéd, but Claude (Opus 4.5, but also the others) just "gets how I think" better. I've tried Gemini 3 and GPT 5.2 and didn't like them at all -- not when I know I can have Claude. I mostly code Python + Django, so that could also be a factor.

Gemini 3 has this extremely annoying habit of bleeding its reasoning process into comments that are hard to read and not very human-like (they're not "reasoning", they're questioning for the sake of questioning, which I get as part of the process, but not as a comment in the code!). I've seen it do things like this many times:

    # Because so and so and so and so we must do x(param1=True, param2=False)
    # Actually! No, wait! It is better if we do x(param1=True, param2=True)
    x(param1=True, param2=True, param3=False) # This one is even better!
Beyond that, it just does not produce what I consider good Python code. I daily-drove Gemini 2.5 before I realized how good Anthropic's models were (or perhaps before they punched back after 2.5?) and haven't been able to go back.

As for GPT 5.2, I just feel like it doesn't really follow my instructions or way of thinking. Like it's dead set on following whatever best practices it has learned, and if I disagree with them, well tough luck. Plus, and I have no better way of saying this, it's just rude and cold, and I hate it for it.


I recently discovered Claude, and it does much better than Codex or Gemini for python code.

Gemini seems to lean toward making everything a script, disconnected from the larger vision. Sure, it uses our existing libraries, but the files it writes and the functions it makes can't be integrated back in.

Codex is fast. Very fast. Which makes it great for a conversational UI, and for answering questions about the codebase or proposing alternatives, but when it writes code it's too clever. The code is valid but not pythonic. Like inventing one-line functions just to optimize a situation that could have been parameterized in three places.

Claude, on the other hand, makes code that is simple to understand and has enough architecture that you can lift it out and use it as-is without too much rewriting.


A bit of a missed opportunity not to use the JSON Resume schema for this.

https://jsonresume.org/schema


We deliberately chose not to use JSON Resume because we wanted greater flexibility. For example, in RenderCV, you can use any section title you want and place any of the 9 available entry types under any section. In contrast, JSON Resume has predefined section titles, and each section is restricted to a predefined entry type. For instance, you must use the experience entry schema under the experience section.
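To illustrate the flexibility described above, here is a hypothetical RenderCV-style input (section and field names here are illustrative, not taken from the real schema): section titles are free-form, and any entry type can sit under any section.

```yaml
# Hypothetical sketch of a RenderCV-style input file.
# Section titles below are invented by the author of this sketch.
cv:
  name: Jane Doe
  sections:
    open_source_work:          # custom title, not predefined anywhere
      - name: RenderCV
        highlights:
          - Maintainer since 2023
    conference_talks:          # another free-form section title
      - "PyCon 2024: Typesetting CVs with Python"
```

In JSON Resume, by contrast, a work history item must live under the predefined `work` section and follow that section's fixed entry schema.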

I hear you. This boils down to personal opinion. I would have preferred using an existing standard rather than introducing yet another one. The custom sections aren't something I've ever seen or needed anyway.

That's a fair point and yet I deeply believe Codex is better here. After finishing a big task, I used two fresh instances of Claude and Codex to review it. Codex finds more issues in ~9 out of 10 cases.

While I prefer the way Claude speaks and writes code, there is no doubt that whatever Codex does is more thorough.


Interesting. Flash suggests more power to me than Mini. I never use gpt-5-mini in the UI, whereas Flash appears to be just as good as Pro, just a lot faster.

I'm in between :)

Mini - small, incomplete, not good enough

Flash - good, not great, fast, might miss something.


Now do the same with Rust, build a Python wrapper, and we've come full circle :)

It seems to me like many developers are moving from native vector stores back to Postgres with pgvector. It's easier to integrate, cheaper to set up, familiar to work with, and oftentimes faster, while not hitting any limitations for most projects. My vector store recommendation boils down to: "Use pgvector unless you have a specific reason not to."
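For context, pgvector's `<=>` operator orders rows by cosine distance. A brute-force pure-Python sketch of that same ranking (table name and ids are made up for illustration; a real query would be `SELECT id FROM items ORDER BY embedding <=> $1 LIMIT 2`, which the index merely speeds up):

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity, as computed by pgvector's `<=>` operator
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, rows, limit=2):
    # rows: list of (id, embedding) pairs; returns the closest `limit` rows
    return sorted(rows, key=lambda r: cosine_distance(query, r[1]))[:limit]

rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.7, 0.7])]
print([rid for rid, _ in nearest([1.0, 0.1], rows)])  # → ['a', 'c']
```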

Does this still hold?


Well, it's apples and oranges. Why do people buy an F150 instead of fitting things into the trunk of a Corolla? Because they've got a lot of stuff.

For people who run thousands of QPS on billions of vectors, Milvus is a solid choice. For someone playing with a Twitter demo with a few thousand vectors, any vector DB can do the job well. In fact, there is a fun project, Milvus Lite, designed for that case :)

I've seen many builders migrate from pgvector to Milvus as their apps scale. But perhaps they wish they had considered scalability earlier.

(I'm from Milvus, so I could be biased.)


We regularly do tens of thousands of QPS on pgvector just fine, on massive data stores.

We dropped Milvus after they started trying to force their Zilliz garbage SaaS down our throats.


People buy F150s because they find them cool, not because they actually need the space. Your Corolla could make deliveries around town in roughly the same time, while being cheaper and easier than introducing a new, expensive car. In situations where you need more space (which most of us won't hit), you can add a trailer instead.

Interesting, I guess we're on the same page ;)


pgvector is great and so is FAISS, but those are just a subset of what you get from Milvus. If all you need to do is RAG over 50 MB of documents, then pick the right tool for the job. I use Chroma for a lot of projects.

But what if you want hybrid search, or different IVF variants, or disk-based search, or horizontal scaling, or something that leverages SIMD, or sparse vectors? Milvus is great.


You can do hybrid search in Postgres.

Shameless plug: https://github.com/jankovicsandras/plpgsql_bm25 BM25 search implemented in PL/pgSQL ( Unlicense / Public domain )

The repo includes plpgsql_bm25rrf.sql: a PL/pgSQL function for hybrid search (plpgsql_bm25 + pgvector) with Reciprocal Rank Fusion, plus Jupyter notebook examples.
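The idea behind Reciprocal Rank Fusion is simple enough to sketch in a few lines of pure Python. This is an illustration of the general RRF formula (score = sum of 1/(k + rank) across rankings), not code from that repo; the function name, document ids, and the conventional k=60 default are the author's choices here.

```python
def rrf_merge(rankings, k=60):
    """Merge several ranked lists of doc ids (best first) via RRF.

    Each document scores 1/(k + rank) per list it appears in;
    documents found by both keyword and vector search float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]     # keyword (BM25) ranking
vector_hits = ["doc1", "doc7", "doc2"]   # vector-similarity ranking
print(rrf_merge([bm25_hits, vector_hits]))  # → ['doc1', 'doc7', 'doc3', 'doc2']
```

doc1 and doc7 appear in both lists, so they outrank doc3 even though doc3 was the top BM25 hit.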


You start by underselling what can be done with Postgres and then follow up with the upper end of requirements that most projects won't need. My argument is exactly what you conveniently left out: the big bulk in between.

How does Milvus compare to OpenSearch/ElasticSearch for hybrid search?

I think they are aimed at different scales and scenario complexities.

They are definitely ahead in multimodality, and I'd argue they have been for a long time. Their image understanding was already great when their core LLM was still terrible.


It means half of the internet isn't (wasn't) working.


I imagine Anthropic employs a lot of talent from China. Beyond the politics, they'd need to be fairly certain before publishing these claims, to avoid an internal shitstorm.

