This could of course be a bug, but it's worth noting that vector search in general is a semantic ("meaning based") search technique: it operates on the resulting embeddings, not on the actual text. So if what you actually need is to catch literal instances of the text, it may well make sense to include a (possibly fuzzy) full text search as well.
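A rough sketch of that hybrid lookup, purely for illustration (the documents, column names and random vectors below are stand-ins, and the FTS5 side assumes your SQLite build has it compiled in, which Python's usually does):

    import sqlite3
    import numpy as np

    docs = {1: "Installing Python on Linux", 2: "Caring for your cat", 3: "Intro to SQL joins"}
    emb = np.random.rand(len(docs), 384)              # stand-in for real document embeddings
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)

    # Keyword side: FTS5 catches literal occurrences of the query text.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE VIRTUAL TABLE docs USING fts5(id, body)")
    con.executemany("INSERT INTO docs VALUES (?, ?)", [(str(i), t) for i, t in docs.items()])
    keyword_hits = [int(r[0]) for r in con.execute("SELECT id FROM docs WHERE docs MATCH ?", ("cat",))]

    # Vector side: cosine similarity of the (here fake) query embedding against the corpus.
    query_vec = np.random.rand(384)                   # would come from your embedding model
    query_vec /= np.linalg.norm(query_vec)
    vector_hits = [list(docs)[i] for i in np.argsort(-(emb @ query_vec))[:2]]

    # Merge both result lists, keyword matches first, de-duplicated.
    print(list(dict.fromkeys(keyword_hits + vector_hits)))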
That said, the particular example you give may be either a configuration problem or just that "cat" is a particularly bad choice of word to use for a vector search in your corpus.
HNSW[1] is an approximate nearest neighbour search technique. So if you visualise the embedding as distilling your documents down to a set of numbers, this vector forms the coordinates of a point in a high-dimensional vector space. HNSW is going to take the word "cat" in your example, embed that, and find what are probably the closest other vectors (representing documents) in that space by your distance metric.[2]
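For concreteness, the index-and-query flow might look like this with the hnswlib package (the parent doesn't say which implementation is in use; hnswlib is just one common one, and the dimensionality and parameters here are arbitrary):

    import hnswlib
    import numpy as np

    dim = 384
    corpus_vecs = np.float32(np.random.rand(1000, dim))   # stand-in for your document embeddings

    # Build the approximate nearest-neighbour index over the corpus.
    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=1000, ef_construction=200, M=16)
    index.add_items(corpus_vecs, np.arange(1000))
    index.set_ef(50)

    # Embed the query ("cat") and ask for the 5 (probably) closest documents.
    query_vec = np.float32(np.random.rand(1, dim))         # would be the embedding of "cat"
    labels, distances = index.knn_query(query_vec, k=5)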
Now, it's easy to imagine the problem - if your document corpus is about something very unrelated to cats (say your corpus is a bunch of programming books), the results of this search are going to be basically just random: your corpus forms a big blob of embedded documents which are relatively close together, your search term embeds way off in empty space, and the distance from the search term to any given document is therefore extremely large. If you were to search a topic that is reflected in your corpus, your search term would embed right in the middle of that blob, the distances would carry much more meaning, and the results are likely to be more intuitive and useful.
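One way to sanity-check whether that's what is happening is to look at the raw similarities of an on-topic and an off-topic query against your corpus. This assumes a sentence-transformers model purely for illustration, since the parent doesn't say which embedding model is involved:

    from sentence_transformers import SentenceTransformer
    import numpy as np

    model = SentenceTransformer("all-MiniLM-L6-v2")    # placeholder model

    corpus = ["How to write a recursive descent parser",
              "Memory management in C++",
              "Unit testing best practices"]
    corpus_vecs = model.encode(corpus, normalize_embeddings=True)

    for q in ["cat", "parsing source code"]:
        q_vec = model.encode(q, normalize_embeddings=True)
        # An off-topic query tends to score uniformly low against everything,
        # so the ranking among those low scores carries little information.
        print(q, np.round(corpus_vecs @ q_vec, 3))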
Yeah, you usually get much better results by pulling a list of ~10 fuzzy keyword search results and ~10 semantic/embeddings results, then using something like Cohere rerank (or just a cheap GPT model) to pick the best 5-10 results from the pile.
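A rough sketch of that retrieve-then-rerank pattern, using a local cross-encoder from sentence-transformers as a stand-in for Cohere's rerank endpoint (the query and candidate documents here are made up):

    from sentence_transformers import CrossEncoder

    query = "how do I reset a user's password"

    # Assume these came back from the keyword and vector retrievers.
    keyword_results = ["Resetting passwords via the admin CLI", "Password policy settings"]
    vector_results = ["Account recovery flow overview", "Resetting passwords via the admin CLI"]

    # Pool and de-duplicate the candidates.
    candidates = list(dict.fromkeys(keyword_results + vector_results))

    # A cross-encoder scores each (query, document) pair jointly - slower than a
    # vector lookup, but much better at choosing the final handful.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, doc) for doc in candidates])
    top = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)[:5]]
    print(top)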
[1] https://arxiv.org/abs/1603.09320 is I think the original paper that introduced the technique
[2] The "probably"/"approximately" get-out is what allows the computational complexity of HNSW to stay reasonable even when the corpus is large