Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Great, but how do you imagine multimodal with text, video. Just 2 for simplicity, what will be in the training set. With text model tries to predict next, then more steps were added. But what to do with multimodal?


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: