More

minimaxir · 2025-12-05T20:09:19 1764965359

The actual token calculations with input videos for Gemini 3 Pro is...confusing.

https://ai.google.dev/gemini-api/docs/media-resolution

pseudosavant · 2025-12-05T22:10:20 1764972620

That is because it isn't actually tokens that are fed into the model for non-text. For text, it is tokenized, and each token has a specific set of vectors. But with other media, they've trained encoders that analyze the media and produce a set of vectors that are the same "format" as the token's vectors, but it isn't actually ever a token.

Most companies have rules for how many tokens the media should "cost", but they aren't usually exact.

minimaxir · 2025-12-05T19:54:28 1764964468

Gemini 3 Pro is not Nano Banana Pro, and the image generation/model that decodes the generated image tokens may not be as robust.

The thinking step of Nano Banana Pro can refine some lateral steps (i.e. the errors in the homework correction and where they are spatially in the image) but it isn't perfect and can encounter some of the typical pitfalls. It's a lot better than Nano Banana base, though.

hodder · 2025-12-05T20:01:12 1764964872

As a consumer I typed this into "Gemini". The behind the scenes model selection just adds confusion.

If "AI" trust is the big barrier for widespread adoption to these products, Alphabet soup isn't the solution (pun intended).

iknowstuff · 2025-12-05T20:37:37 1764967057

Nano Banana generates images.

This article is about understanding images.

Your task is unrelated to the article.

JacobAsmuth · 2025-12-06T00:33:50 1764981230

It works fine for me. https://imgur.com/a/MKNufm1

minimaxir · 2025-12-05T19:52:19 1764964339

Nothing new, it's just highlighting practical vision use cases.

minimaxir · 2025-12-05T19:39:11 1764963551

Gemini 3 Pro has been playing Pokemon Crystal (which is significantly harder than Red) in a race against Gemini 2.5 Pro: https://www.twitch.tv/gemini_plays_pokemon

Gemini 3 Pro has been making steady progress (12/16 badges) while Gemini 2.5 Pro is stuck (3/16 badges) despite using double the turns and tokens.

theLiminator · 2025-12-05T22:09:09 1764972549

I think what would be interesting is if it could play the game with vision only inputs. That would represent a massive leap multimodal understanding.

minimaxir · 2025-12-05T19:36:29 1764963389

That's more of an issue with Nano Banana Pro than with Gemini 3 Pro.

siva7 · 2025-12-05T19:42:33 1764963753

What's the difference? I thought the vision ai component of gemini 3 is called nano banana?

IanCal · 2025-12-05T19:47:05 1764964025

That’s about generating images, the other side is about understanding images.

brokensegue · 2025-12-05T19:55:17 1764964517

i assumed nano banana was just a tool that gemini 3 used though i don't know

minimaxir · 2025-12-05T19:58:30 1764964710

Gemini 3 Pro's text encoder powers Nano Banana Pro, but it has its own image decoding model that decodes the generated image tokens into an actual image, which appears to be the more pertinent issue in this case.

minimaxir · 2025-12-04T17:29:33 1764869373

You're being reductive to the point that you're saying "LLMs are an algorithm like auto complete/search engine, therefore they're the same."

That's not how it works. They're different approaches to how they handle the same inputs.

edwin2 · 2025-12-04T17:35:08 1764869708

i would totally agree that they’re different approaches

i wouldn’t conclude “therefore they’re the same”. they’re clearly not the same

if it’s a different approach to search and scripting, does that not mean it is a kind of search and scripting?

minimaxir · 2025-12-03T20:00:44 1764792044

dang (the head moderator of Hacker News) has said multiple times that HN prefers human-only comments.

minimaxir · 2025-12-02T20:12:18 1764706338

> once most of its users realise that it offers them no actual practical advantages over Pandas

What? Speed and better nested data support (arrays/JSON) alone are extremely useful to every data scientist.

My produtivity skyrocketed after switching from pandas to polars.

minimaxir · 2025-12-02T17:33:08 1764696788

What can you do in more easily in pandas than polars?

minimaxir · 2025-12-02T17:30:49 1764696649

For noncoding tasks, Gemini atleast allows for easier grounding with Google Search.