Question about models and weights. When an organization says they release the we...

leodriesch · on March 29, 2023

OpenAI also released the weights for Whisper. Some model releases, like LLaMa from Meta, just contain the code for training and the data. You can train the weights yourself, but for LLaMa that takes multiple weeks on very expensive hardware.

(Meta also released the LLaMa weights to researchers, and they later leaked online, but the weights are not open source.)

When a company releases a model including the weights, you can download their pre-trained weights and run inference with them, without having to train the model yourself.
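To make "download the weights and run inference" concrete, here is a minimal sketch in plain Python. The tiny linear model, the `weights.json` file name, and the values `w = 2.0`, `b = 1.0` are all made up for illustration; real releases use formats and loaders specific to their framework.

```python
import json
import os
import tempfile

# Architecture: a tiny linear model. The code alone has no learned behavior.
def predict(weights, x):
    return weights["w"] * x + weights["b"]

# Publisher's side (stand-in): already-trained weights are written to a file.
# These values are hypothetical, chosen to represent y = 2x + 1.
released = {"w": 2.0, "b": 1.0}
path = os.path.join(tempfile.mkdtemp(), "weights.json")
with open(path, "w") as f:
    json.dump(released, f)

# User's side: load the released file and run inference. No training needed.
with open(path) as f:
    weights = json.load(f)

result = predict(weights, 3.0)
print(result)  # 7.0
```

The point is only that the file carries everything that was learned; the code that consumes it can stay small.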

superb-owl · on March 29, 2023

The model is just an untrained architecture. You'd need to spend a lot of money (a) gathering data, and (b) running GPUs to train it.

The weights are the fruition of training. They make the model actually useful.
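A toy sketch of that split, assuming a one-variable linear model and plain gradient descent (nothing here comes from a real release; the names `predict` and `train` are just for illustration). The same `predict` code is useless with untrained weights and useful once training has produced good ones.

```python
import random

# Architecture: fixed code that maps an input to an output, given weights.
def predict(weights, x):
    w, b = weights
    return w * x + b

# Training: the expensive step that turns data into weights.
def train(data, steps=2000, lr=0.05):
    rng = random.Random(0)  # fixed seed so the run is repeatable
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        # Gradient descent on the mean squared error.
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b

# Data: samples of the function y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(-3, 4)]

untrained = (0.0, 0.0)   # architecture only, nothing learned yet
trained = train(data)    # the result of training

print(predict(untrained, 3))  # 0.0, not useful
print(predict(trained, 3))    # close to 7
```

For a large language model the `train` step is the part that costs weeks of GPU time, which is why getting the resulting weights matters so much.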

rexreed · on March 29, 2023

When you download a trained model for use in Python, I'm assuming the model contains both the architecture (the neural net or even a boosted tree) and the weights / tree structure that make the model actually usable for inference. When organizations release a trained model, I'm assuming that the weights are necessary to make use of that model? If not, then are they not really releasing the model, but just the architecture and training data?

vdfs · on March 29, 2023

As I understand it:

- Model is the code

- Data is the text/images used for training

- Weights are the training results

Take Lucene as an example: the model is the Java library, the data is text like Wikipedia, and the weights are the Lucene index. If you have all three, you can start searching right away. If you have only model + data, you have to generate the index first, which can take a lot of time; training/indexing takes more time than searching or using the model. If you have just the model, you need to get your own data and run training on it.
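The analogy can be sketched with a toy inverted index in Python. This is not Lucene's API, just a stand-in with made-up function names and documents: `build_index` plays the role of training, the returned index plays the role of the weights, and `search` plays the role of inference.

```python
from collections import defaultdict

# "Model": the indexing and search code.
def build_index(docs):
    # The slow step, analogous to training: scan all the data once.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, word):
    # The fast step, analogous to inference: a lookup in the built index.
    return sorted(index.get(word.lower(), set()))

# "Data": the raw text (hypothetical documents).
docs = {
    1: "weights are the training results",
    2: "the model is the code",
    3: "training takes time",
}

# "Weights": the built index. Ship this and nobody has to rebuild it.
index = build_index(docs)

hits = search(index, "training")
print(hits)  # [1, 3]
```

Someone who receives only `build_index` and `search` has to find documents and build the index themselves; someone who also receives `index` can search immediately.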

freeone3000 · on March 29, 2023

Usually not the training data either!