I've recently put together a setup that seemed reasonable for my limited budget. Mind you, most of the components were second-hand, open-box deals, or deeply discounted at the time.
This comfortably fits FP8-quantized 30B models, which seem to be the "top of the line for hobbyists" grade across the board.
Does it offer more performance than a MacBook Pro at a comparable price? Your build comes in under $3k; a used MBP M3 with 64 GB RAM goes for approximately $3.5k.
I'm not sure; I did not run any benchmarks. As a ballpark figure: with both cards power-limited to 250 W and running a Qwen-30B FP8 model (variant depending on task), I get upwards of 60 tok/sec. It feels on par with the premium models, tbh.
Of course, this is in a single-user environment, with vLLM keeping the model warm.
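For anyone wanting to try a similar configuration, here is a rough sketch of the commands involved. It is not my exact invocation. The model ID is a hypothetical placeholder (the variant changes per task), and the helper names are made up for illustration. It assumes the power cap is set with `nvidia-smi -pl` and that the model is split across both cards with vLLM's `--tensor-parallel-size`. The script only prints the commands; it does not run them.

```python
import shlex

# Hypothetical placeholder: substitute the FP8 Qwen 30B variant you actually use.
MODEL_ID = "your-org/qwen-30b-fp8"


def power_limit_cmd(gpu_index: int, watts: int) -> list[str]:
    """Build the nvidia-smi command that caps one GPU's power draw (needs root)."""
    return ["sudo", "nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def vllm_serve_cmd(model_id: str, num_gpus: int) -> list[str]:
    """Build a vLLM server command that shards the model across num_gpus cards.

    Tensor parallelism is an assumption here, not something stated in the post.
    """
    return ["vllm", "serve", model_id, "--tensor-parallel-size", str(num_gpus)]


def setup_commands(num_gpus: int = 2, watts: int = 250) -> list[list[str]]:
    """Power-cap every card first, then start the server."""
    cmds = [power_limit_cmd(i, watts) for i in range(num_gpus)]
    cmds.append(vllm_serve_cmd(MODEL_ID, num_gpus))
    return cmds


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in setup_commands():
        print(shlex.join(cmd))
```

The power limit set this way does not survive a reboot, so it would need to be reapplied at startup.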
No NVLink. It took me a long time to settle on the exact hardware specs because I wanted to optimize performance. Both cards sit on x8 PCIe lanes wired directly to the CPU, which is close to their max throughput anyway. It runs hot with the CPU engaged, but it runs fast.
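To put a number on the x8 links, here is a back-of-the-envelope calculation. It assumes the cards negotiate PCIe 4.0 (the RTX 3090 is a Gen4 x16 card) and counts only the 128b/130b line encoding, so real-world throughput will be somewhat lower. The function name is just for illustration.

```python
# PCIe 4.0 signalling rate per lane, in gigatransfers per second.
GEN4_GT_PER_S = 16.0
# 128b/130b line encoding: 128 payload bits for every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def pcie4_gb_per_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe 4.0 link in GB/s."""
    gbit_per_lane = GEN4_GT_PER_S * ENCODING_EFFICIENCY
    return gbit_per_lane / 8 * lanes  # 8 bits per byte


if __name__ == "__main__":
    for lanes in (8, 16):
        print(f"PCIe 4.0 x{lanes}: {pcie4_gb_per_s(lanes):.2f} GB/s per direction")
```

On paper, x8 gives roughly 15.75 GB/s per direction against about 31.5 GB/s for x16. How much that gap matters depends on how much traffic actually crosses the bus during inference, which I have not measured.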
- Ryzen 9 9950X
- MSI MPG X670E Carbon
- 96GB RAM
- 2x RTX 3090 (24GB VRAM each)
- 1600W PSU
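As a rough sanity check on why FP8 30B models fit on two 24 GB cards, here is the arithmetic as a small script. It assumes one byte per parameter for FP8 weights and an even split of the weights across both cards. It ignores activations, CUDA context, and KV cache, all of which come out of the remaining headroom, so treat the numbers as an upper bound on free memory.

```python
BYTES_PER_PARAM_FP8 = 1  # FP8 stores each weight in one byte


def weight_gb(params_billion: float) -> float:
    """Approximate weight footprint in GB for an FP8-quantized model."""
    return params_billion * BYTES_PER_PARAM_FP8


def headroom_gb(params_billion: float, num_gpus: int, vram_gb_each: float) -> float:
    """VRAM left over across all cards after loading the weights."""
    return num_gpus * vram_gb_each - weight_gb(params_billion)


if __name__ == "__main__":
    weights = weight_gb(30)
    total_left = headroom_gb(30, num_gpus=2, vram_gb_each=24)
    print(f"weights: ~{weights:.0f} GB total, ~{weights / 2:.0f} GB per card")
    print(f"headroom: ~{total_left:.0f} GB total, ~{total_left / 2:.0f} GB per card")
```

That leaves about 18 GB in total, or roughly 9 GB per card, for KV cache and runtime overhead. A single 24 GB card could not hold the roughly 30 GB of weights on its own.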