I am talking about the three larger models (PaLM 2-S, PaLM 2-M, and PaLM 2-L) described in the technical report.
At I/O, I think they were referring to the scaling-law experiments: there are four of them, the same as the number of PaLM 2 codenames cited there (Gecko, Otter, Bison, and Unicorn). The largest of those smaller-scale models is 14.7B, which is also too big for a phone. The smallest is 1B, which could fit in 512MB of RAM with GPTQ-style 4-bit quantization.
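Quick napkin math behind that last claim, as a sketch only: it counts 4-bit weights alone and ignores activations, KV cache, and the per-group scale/zero-point overhead that GPTQ adds, so the real footprint would be somewhat higher.

```python
# Back-of-the-envelope check: does a 1B-parameter model fit in 512 MB
# at 4-bit quantization? Weights only; activations, KV cache, and
# quantization metadata are ignored, so treat this as a lower bound.
params = 1e9           # 1B parameters (the smallest scaling-law model)
bits_per_weight = 4    # GPTQ-style 4-bit quantization
weight_bytes = params * bits_per_weight / 8
print(f"{weight_bytes / 1e6:.0f} MB of weights")  # ~500 MB, just under 512 MB
```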
Either that, or Gecko is the smallest of the scaling-law experiments and Otter is PaLM 2-S.
If the extrapolation is not too flawed, it looks like PaLM 2-S might be around 120B parameters, PaLM 2-M around 180B, and PaLM 2-L around 280B.
Still, I would expect GPT-4 to have been trained for far more tokens per parameter than the Chinchilla-optimal ratio, so it could be smaller than even PaLM 2-S.
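For intuition on why over-training buys you a smaller model at the same compute, here is a sketch using the usual C ≈ 6·N·D approximation (N = parameters, D = training tokens). The 120B starting point is just my extrapolated guess from above, and the 60B figure is purely illustrative, not a claim about GPT-4's actual size.

```python
# Sketch of the compute trade-off using the standard C ≈ 6 * N * D approximation.
# All concrete numbers are illustrative assumptions, not reported figures.

def tokens_for_budget(compute_flops: float, n_params: float) -> float:
    """Training tokens affordable at a fixed compute budget."""
    return compute_flops / (6 * n_params)

n_chinchilla = 120e9                      # hypothetical Chinchilla-optimal model
d_chinchilla = 20 * n_chinchilla          # ~20 tokens per parameter (Chinchilla ratio)
budget = 6 * n_chinchilla * d_chinchilla  # fix the total training compute

# Spend the same compute on a model half the size, trained well past the ratio:
n_overtrained = 60e9
d_overtrained = tokens_for_budget(budget, n_overtrained)
print(f"tokens per parameter: {d_overtrained / n_overtrained:.0f}")  # ~80
```

Halving the parameter count at fixed compute doubles the token budget, so the tokens-per-parameter ratio goes up 4x; that is the sense in which a model "trained for way longer than Chinchilla" can be smaller for the same training cost.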