
Congrats on the launch!

In what way is "prefix-aware embedding models trained with contrastive loss" better than the standard embedding model provided by OpenAI?

"added in learning from feedback and time based decay" => Sounds interesting! Have you seen significant gains in precision and recall here?

It looks like you are using the Next.js app dir plus an external backend. Why did you decide against Next.js for both frontend and backend? Are you happy with your choice?



OpenAI's models may fit that description as well under the hood. Specifically, `prefix-aware` training is useful when you have short passages (e.g. Slack messages) that you are trying to match against short queries (e.g. user questions). Without being prefix-aware, the model can treat both as queries and match short passages far too strongly against short queries.
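To make the idea concrete, here is a minimal sketch of how prefix-aware inputs are typically formed. The `query:` / `passage:` prefixes follow the E5-style convention and are an assumption for illustration, not necessarily what this product uses:

```python
# Illustrative sketch of prefix-aware embedding inputs.
# The exact prefixes are an assumption (E5-style convention);
# the actual model and prefixes used here may differ.

def tag_query(text: str) -> str:
    # Mark short user questions as queries so the model does not
    # confuse them with equally short passages.
    return f"query: {text}"

def tag_passage(text: str) -> str:
    # Mark indexed documents (e.g. Slack messages) as passages.
    return f"passage: {text}"

queries = [tag_query("how do I reset my password?")]
passages = [tag_passage("To reset your password, open Settings > Security.")]

# With a prefix-aware model, each list would then be embedded with the
# role prefix intact, e.g.:
#   q_vecs = model.encode(queries)
#   p_vecs = model.encode(passages)
print(queries[0])
print(passages[0])
```

Because the role is baked into the input, the model can learn different representations for questions and documents even when both are only a few tokens long.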

For learning from feedback, for sure! No exact benchmarks, but we've heard from quite a few users about how useful this is for pushing high-quality docs up and reducing the prevalence of poor docs. This is all very hard to evaluate since there aren't readily available, real-world "corporate tool / knowledge base" datasets out there. We're actually building our own in-house right now, so we should have more concrete numbers around these things soon.

For the backend, we do a lot of stuff with local embedding models, cross-encoders, tokenization, stemming, stop-word removal, etc. Python has the most mature ecosystem for this kinda stuff (and the retrieval pipeline is the core of our product), so we don't regret it at all!
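For readers unfamiliar with those preprocessing steps, here is a toy sketch of tokenization, stop-word removal, and stemming in plain Python. The crude suffix-stripping stemmer and tiny stop-word list are placeholders; a real pipeline would use a mature library (e.g. NLTK's PorterStemmer):

```python
# Toy illustration of tokenization, stop-word removal, and stemming.
# The stop-word list and suffix rules are simplified assumptions,
# not a real Porter stemmer.
import re

STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and"}

def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token: str) -> str:
    # Naive suffix stripping; strips at most one common suffix.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> list[str]:
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("The searching of indexed documents"))
# → ['search', 'index', 'document']
```

Normalizing queries and passages this way lets lexical retrieval (e.g. BM25-style scoring) match "searching" against "searched", which complements the embedding-based side of a retrieval pipeline.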



