
One thing you could do: FTS5 has the `fts5vocab` virtual table [1], which lists all the terms in the index. You could register a user-defined function that computes the Levenshtein distance between your query terms and the terms in that table, collect the lexically close ones as candidates, and then build one big query that searches for all of them.

[1] https://www.sqlite.org/fts5.html#the_fts5vocab_virtual_table...
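For what it's worth, here is a rough sketch of that idea in Python, assuming your `sqlite3` build has FTS5 compiled in. The table names (`docs`, `docs_vocab`), the `fuzzy_search` helper and the `max_distance=2` threshold are all made up for illustration; the Levenshtein function is a plain dynamic-programming version registered as a SQL function, not anything SQLite ships.

```python
import sqlite3


def levenshtein(a, b):
    """Edit distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


con = sqlite3.connect(":memory:")
# Expose the Python function to SQL as levenshtein(x, y).
con.create_function("levenshtein", 2, levenshtein)

# Hypothetical example index; "docs" and "docs_vocab" are illustrative names.
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
con.executemany(
    "INSERT INTO docs(body) VALUES (?)",
    [
        ("sqlite full text search",),
        ("fuzzy matching with edit distance",),
        ("an unrelated note about gardening",),
    ],
)
# One row per distinct term in the "docs" index.
con.execute("CREATE VIRTUAL TABLE docs_vocab USING fts5vocab(docs, row)")


def fuzzy_search(con, query, max_distance=2):
    """Expand each query word to nearby index terms, then run one MATCH."""
    or_groups = []
    for word in query.lower().split():
        rows = con.execute(
            "SELECT term FROM docs_vocab "
            "WHERE levenshtein(term, ?) <= ? ORDER BY term",
            (word, max_distance),
        ).fetchall()
        if not rows:
            return []  # no term is close enough to this word
        # Quote every term so it is treated as a literal FTS5 string.
        quoted = ['"' + r[0].replace('"', '""') + '"' for r in rows]
        or_groups.append("(" + " OR ".join(quoted) + ")")
    # Every query word must match at least one of its candidate terms.
    match_expr = " AND ".join(or_groups)
    return [
        r[0]
        for r in con.execute(
            "SELECT body FROM docs WHERE docs MATCH ? ORDER BY rank",
            (match_expr,),
        )
    ]


print(fuzzy_search(con, "sqlit serch"))
```

Note that this scans the whole vocabulary for every query word and calls back into Python for each term, so it is probably only reasonable for small to medium indexes. A fixed threshold of 2 will also match a lot of unrelated terms for very short query words, so you may want to scale it with word length.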



I can confirm an approach like this works in practice.

Although instead of Levenshtein I use spellfix (maybe it uses that under the covers? not sure). If the first search returns no matches, I use the SQLite spellfix extension [0] to find close terms, then feed those candidates back in as the search terms.

[0] https://www.sqlite.org/spellfix1.html
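The "search first, only correct on a miss" flow looks roughly like the sketch below. Big caveat: spellfix1 is a loadable extension that most Python builds can't load out of the box, so this uses `difflib.get_close_matches` from the standard library as a stand-in for the spellfix lookup. The table names, the `search_with_fallback` helper and the `cutoff=0.7` value are assumptions for illustration, not what I actually run.

```python
import difflib
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical example data; "notes" and "notes_vocab" are illustrative names.
con.execute("CREATE VIRTUAL TABLE notes USING fts5(body)")
con.executemany(
    "INSERT INTO notes(body) VALUES (?)",
    [
        ("the spellfix extension suggests corrections",),
        ("levenshtein distance between two terms",),
    ],
)
con.execute("CREATE VIRTUAL TABLE notes_vocab USING fts5vocab(notes, row)")


def run_match(con, terms):
    """Run a single FTS5 query that ORs together the given literal terms."""
    expr = " OR ".join('"' + t.replace('"', '""') + '"' for t in terms)
    return [
        r[0]
        for r in con.execute(
            "SELECT body FROM notes WHERE notes MATCH ? ORDER BY rank",
            (expr,),
        )
    ]


def search_with_fallback(con, word):
    """Return (hits, terms_used); only look for corrections on a miss."""
    hits = run_match(con, [word])
    if hits:
        return hits, [word]
    # Stand-in for the spellfix lookup: pick index terms similar to the word.
    vocab = [r[0] for r in con.execute("SELECT term FROM notes_vocab")]
    candidates = difflib.get_close_matches(word.lower(), vocab, n=5, cutoff=0.7)
    if not candidates:
        return [], []
    return run_match(con, candidates), candidates


print(search_with_fallback(con, "levenstein"))
```

The nice property of doing it this way round is that correctly spelled queries never pay for the candidate lookup at all.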


Ah, that's great, didn't know about that :-) It seems to use a similar edit-distance algorithm under the hood, and the docs explicitly mention integration with the FTS extension, so this is probably the way to go!



