This is the first open-source multimodal model I've heard of. Are there other alternatives already?


The Fuyu pre-trained model is not open source. At best, it is source-available. It's also not the only multimodal model you can run locally.

A few other examples include LLaVA[0], IDEFICS[1][2], and CogVLM[3]. MiniGPT-4[4] might be another one to look at. I'm pretty sure all of these have better licenses than Fuyu. Fuyu's architecture does sound really interesting, but the license on the pre-trained model is a complete non-starter for almost anything.

[0]: https://github.com/haotian-liu/LLaVA

[1]: https://huggingface.co/blog/idefics

[2]: https://huggingface.co/HuggingFaceM4/idefics-80b-instruct

[3]: https://github.com/THUDM/CogVLM

[4]: https://github.com/Vision-CAIR/MiniGPT-4



