
Well, to be fair, running an unquantized 70B model is going to take somewhere in the area of 160GB of VRAM (if my quick back-of-the-napkin math is right: 70B parameters at 2 bytes each in fp16 is 140GB before any overhead). I'm not quite sure of the state of GPUs these days, but a 2x A100 80GB (or 4x 40GB) setup is probably going to cost more than a Mac Studio with maxed-out RAM.
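A quick sanity check on that napkin math, as a sketch (the ~15% overhead factor for KV cache and activations is my guess, not a measured number):

    params = 70e9          # 70B parameters
    bytes_per_param = 2    # fp16/bf16 weights
    overhead = 1.15        # assumed allowance for KV cache + activations

    gib = params * bytes_per_param * overhead / 2**30
    print(f"{gib:.0f} GiB")  # ~150 GiB, i.e. the 160GB ballpark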

If we are talking quantized, I am currently running LLaMA v1 30B at 4 bits on a MacBook Air with 24GB of RAM, which costs only a little more than what a 24GB 4090 retails for. The 4090 would crush the MacBook Air in tokens/sec, I am sure. It is, however, completely usable on my MacBook (~4 tokens/second, IIRC, though I might be off on that).

A 4-bit 70B model should take about 36GB-40GB of RAM, so a 64GB Mac Studio might still be price-competitive with a dual-4090 or 4090/3090 split setup. The cheapest Studio with 64GB of RAM is $2,399 USD.
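Same napkin math for the 4-bit cases, sketched as a small helper (the 1.1 overhead factor is a guess, and real quantized formats spend a few extra bits per weight on scales, so treat these as lower bounds):

    def approx_mem_gib(params_billions, bits, overhead=1.1):
        """Rough memory estimate for quantized weights; overhead is a guess."""
        return params_billions * 1e9 * (bits / 8) * overhead / 2**30

    print(approx_mem_gib(30, 4))  # ~15 GiB -> fits on a 24GB MacBook Air
    print(approx_mem_gib(70, 4))  # ~36 GiB -> matches the 36GB-40GB figure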


