
> - Need moar GPUs..

Why is there not a greater focus on quantization to optimize model performance, given the evident need for more GPU resources?



We have highly efficient models for inference and a quantization team.

Need moar GPUs to do a video version of this model similar to Sora, now that they have proved Diffusion Transformers can scale with latent patches (see stablevideo.com and our work on that model, currently the best open video model).

We have 1/100th of the resources of OpenAI and 1/1000th of Google etc.

So we focus on great algorithms and community.

But now we need those GPUs.


Don't fall for it: OpenAI is Microsoft. They have as much as Google, if not more.


Google has cheap TPU chips, which means they circumvent the extremely expensive Nvidia corporate licenses. I can easily see them having 10x the resources of OpenAI for this.


Yes, they have deep pockets and could increase investment if needed. But the actual resources devoted today are public, and in line with what the parent said.


To be clear here, you think that Microsoft has more AI compute than Google?


This isn't OpenAI, which makes GPT-x.

It's Stability AI, which makes Stable Diffusion X.


Can someone explain why Nvidia doesn't just run their own AI, and literally devote 50% of their production to their own compute centers? In an age where even ancient companies like Cisco are getting into the AI race, why wouldn't the people with the keys to the kingdom get involved?


They've been very happy selling shovels at a steep margin to literally endless customers.

The reason is that they instantly get a risk-free, guaranteed, VERY healthy margin on every card they sell, and there's an endless line of customers waiting for them.

If they kept the cards for themselves, they would give up the opportunity to make those margins and instead take on the risk of developing a money-generating service (one that makes more money than selling the cards would).

This way there's no risk of a competitor out-competing them, of failing to develop a profitable product, of "the AI bubble popping", of stagnating development, etc.

There's also the advantage that this capital has allowed them to buy up most of TSMC's production capacity, which limits competitors like Google's TPUs.


Because history has shown that the money is in selling the picks and shovels, not operating the mine. (At least for now. There very well may come a point later on when operating the mine makes more sense, but not until it's clear where the most profitable spot will be.)


Don't stretch that analogy too far. It was applicable to gold rushes, which were low-hanging fruit where any idiot could dig a hole and find gold.

Historically, once the easy-to-find gold was all gone, it was the people who owned the deep gold mines and had the capital to exploit them who became wealthy.


"The people that made the most money in the gold rush were selling shovels, not digging gold".


1. The real keys to the kingdom are held by TSMC, whose fab capacity rules the advanced chips we all get, from NVIDIA to Apple to AMD to even Intel these days.

2. The old advice is to sell shovels during a gold rush.


Jensen was just talking about a new kind of data center: AI-generation factories.


> Why is there not a greater focus on quantization to optimize model performance, given the evident need for more GPU resources?

There is an inherent trade-off between model size and quality. Quantization reduces model size at the expense of quality. Sometimes it's a better way to do that than reducing the number of parameters, but it's still fundamentally the same trade-off. You can't make the highest-quality model use the smallest amount of memory. It's information theory, not sorcery.


Yes, quantization compresses float32 values to int8 by mapping the large range of floats onto a smaller integer range using a scale factor. That scale factor is key for converting back to floats (dequantization), the aim being to preserve as much information as possible within the int8 limits. While quantization reduces model size and speeds up computation, it trades away some accuracy due to the compression. It's a balance between efficiency and model quality, not a magic way to shrink models without losing some performance.
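For the curious, here's a minimal sketch of that scale-factor idea in NumPy, using simple symmetric (absmax) int8 quantization. It's a toy illustration of the round trip, not how any particular library (bitsandbytes, GGUF, etc.) actually implements it:

    import numpy as np

    # Pretend these are one layer's float32 weights.
    w = np.random.randn(4096).astype(np.float32)

    # Absmax (symmetric) quantization: map [-max|w|, +max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

    # Dequantization: multiply back by the stored scale factor.
    w_hat = q.astype(np.float32) * scale

    print("memory:", w.nbytes, "->", q.nbytes, "bytes")   # 4x smaller
    print("mean abs error:", np.abs(w - w_hat).mean())    # the quality cost

Real schemes usually keep a separate scale per channel or per small block of weights to shrink that error, but the core idea is the same: store small integers plus a scale, multiply back at inference time.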

Quantization is essential for me since a 7B model won't fit on my RTX 2060 with only 6GB of VRAM. It allows me to compress the model so it can run on my hardware.
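Rough weights-only arithmetic for that case (ignoring activations, KV cache, and per-block quantization overhead, so real usage runs somewhat higher):

    params = 7e9  # 7B model
    for name, bytes_per_weight in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
        print(f"{name}: ~{params * bytes_per_weight / 1e9:.1f} GB")
    # fp32: ~28.0 GB, fp16: ~14.0 GB, int8: ~7.0 GB, int4: ~3.5 GB

So on a 6GB card even int8 (~7 GB) doesn't fit; 4-bit quantization is what actually gets a 7B model onto the card with room left for activations.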


I believe he means for training



