
For those who want to do something like this, the term (and the kind of library) you'll need to search for is "trigrams". Break all your relevant search terms into trigrams, then run a job that writes one JSON file per trigram to your S3/CDN, each holding the most relevant documents for that trigram in sorted order. You have a theoretical max of 26x26x26 = 17,576 files (about 17.5K), but in practice many possible trigrams never occur in English.
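A rough sketch of that build job in Python, in case it helps. Everything here is an assumption for illustration: the names `trigrams`, `build_index` and `to_files`, the `<trigram>.json` file naming, the one-score-per-document ranking, and the `top_n` cutoff are all made up. The actual upload to S3/CDN is left out.

```python
import json
from collections import defaultdict


def trigrams(term):
    """Return the set of 3-letter windows in a term (lowercased, a-z only)."""
    letters = "".join(ch for ch in term.lower() if "a" <= ch <= "z")
    return {letters[i:i + 3] for i in range(len(letters) - 2)}


def build_index(docs, top_n=50):
    """Map each trigram to its most relevant doc ids, best first.

    docs: {doc_id: (list_of_search_terms, relevance_score)}  (hypothetical shape)
    """
    buckets = defaultdict(dict)
    for doc_id, (terms, score) in docs.items():
        for term in terms:
            for tri in trigrams(term):
                buckets[tri][doc_id] = score

    index = {}
    for tri, scored in buckets.items():
        # Highest score first; ties broken by doc id so output is stable.
        ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
        index[tri] = [doc_id for doc_id, _ in ranked[:top_n]]
    return index


def to_files(index):
    """Serialize the index as {filename: json_body}, one file per trigram."""
    return {f"{tri}.json": json.dumps(doc_ids) for tri, doc_ids in index.items()}


if __name__ == "__main__":
    docs = {
        "a": (["search"], 1.0),
        "b": (["sea", "research"], 2.0),
    }
    for name, body in sorted(to_files(build_index(docs)).items()):
        print(name, body)
```

Each entry returned by `to_files` would then be uploaded as its own object, so a client only has to fetch the handful of files matching the trigrams in its query.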

You can keep modifying the lists over time to take into account recency, new data, etc. But this is a pretty scalable solution for the most part. Rebuilding your entire index on S3 should still cost you less than a dollar.


