
If the current Bard is really running on PaLM 2, it still hallucinates worse than GPT-3.5. Trying to get it to solve a variant of the classic wolf/goat/cabbage puzzle, I got this gem:

"The scientist is not present on Phobos on the first step. The Doom Slayer teleports himself and the bunny to Deimos, leaving the scientist on Phobos."

That wasn't a one-off thing, either - it contradicted itself several times, often in nearly adjacent sentences. You might wonder what this means for its ability to do chain-of-thought... so did I, but apparently the bigger problem is convincing it to do CoT in the first place. And if you do, yeah, it's as bad as you'd expect.

Here are two complete conversations, plus GPT-4 doing the same puzzle for comparison; judge for yourself: https://imgur.com/a/HWLgu3c



I don't think the current Bard runs on PaLM 2; otherwise, it's a complete failure.


In their official blog post today, Google says this:

"PaLM 2’s improved multilingual capabilities are allowing us to expand Bard to new languages, starting today. Plus, it’s powering our recently announced coding update."

and when I check the Updates tab in Bard UI, it has this entry for today:

"Expanding access to Bard in more countries and languages. You can now collaborate with Bard in Japanese and Korean, in addition to US English. We have also expanded access to Bard in all three languages to over 180 countries."

which seems to strongly imply that it is, indeed, PaLM 2. Just to be sure, I gave it the same puzzle in Korean, and got a similarly lackluster response.


In their presentation, they talked about multiple sizes for the PaLM 2 model, named Gecko, Otter, Bison and Unicorn, with Gecko being small enough to run offline on mobile devices. I can't seem to find any info on what size model is being used with Bard at the moment.


Indeed, it's likely that they're running a fairly small model. But this is in and of itself a strange choice, given how ChatGPT became the gateway drug for OpenAI. Why would Google set Bard up for failure like that? Surely they can afford to run a more competent model as a promo, if OpenAI can?


This is just one task it fails at, hardly enough to generalize from.


That's not the only task it fails at, though. It's just the one I found most interesting in terms of broader implications, because of how many self-contradictions there were in the output.

Broadly speaking, I haven't seen a single complex example yet where the output was comparable to GPT-4. How close it is to GPT-3.5 is debatable - the overall feeling that I get is that it's better on some tasks and worse on others; this might actually be down to fine-tuning.


Makes sense. Others also point out it is not as good as GPT-4 in several benchmarks.

https://news.ycombinator.com/item?id=35895404

They did in fact mostly avoid comparisons with GPT-4 in the report. It could of course also be that Bard isn't even running on the largest PaLM 2 model, Unicorn, though it seems they would have mentioned that.

But PaLM 2 seems to be just an intermediate step anyway, since their big new model is "Gemini" (i.e. twins, an allusion to the DeepMind/Brain merger?), which is currently in training, according to Pichai. They also mentioned Bard will switch to Gemini in the future.


It claims to run on LaMDA at the moment.


If you mean asking it what it's running on, it just hallucinates. As others have noted in the comments here, you can get it to say that it runs on PaLM 3 quite easily.


In the chat history, you can see which model generated each response; for me, it's always LaMDA.


It just says "Bard", even if I click on "Details". Are you, perhaps, using some kind of internal preview?


Strange, this is what I see: https://imgur.com/a/sgtVt2O

I'm based in the UK; I wonder if that makes any difference.


I don't see which model generated each response. Where exactly do you see this?



Mine says "Bard" where yours says LaMDA. https://i.imgur.com/p8wIPHj.png



