SingleStore has just released support for indexed approximate-nearest-neighbor (ANN) vector search. By combining ANN search with our full-text search and SQL capabilities, you can do powerful hybrid search.
Here's how:
- Create a subquery that uses vector search
- Create a subquery that uses keyword search for similar content
- Join the two with a full outer join
- Produce a final result that combines the vector search score and the full-text search score, then re-ranks on the combined score
This gives you the benefits of both full-text search (which finds only documents with keyword matches) and vector search (which finds semantic matches even when no keywords match) in one SQL query.
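To make the outline above concrete, here is a minimal Python sketch of the last two steps, outside the database. It assumes each search has already produced a map from document id to score. The function name `hybrid_rerank`, the sample scores and the default weights of 0.3 and 0.7 are illustrative assumptions, not part of SingleStore.

```python
def hybrid_rerank(fts_scores, vs_scores, w_fts=0.3, w_vs=0.7, k=5):
    """Combine two {doc_id: score} maps and return the top-k (doc_id, score) pairs."""
    # Full outer join: keep every id that appears in either result set.
    all_ids = set(fts_scores) | set(vs_scores)
    # A document missing from one side contributes 0 for that side.
    combined = {
        doc_id: w_fts * fts_scores.get(doc_id, 0.0) + w_vs * vs_scores.get(doc_id, 0.0)
        for doc_id in all_ids
    }
    # Re-rank on the combined score, highest first (ties broken by id).
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical scores for three documents.
fts = {1: 0.9, 2: 0.4}   # keyword matches only
vs = {2: 0.8, 3: 0.7}    # semantic matches only
ranked = hybrid_rerank(fts, vs)
print([doc_id for doc_id, _ in ranked])
```

In this toy data, document 1 ranks first on keywords and document 2 ranks first on vector similarity, yet document 3, which has no keyword match at all, still appears in the combined ranking because the outer join keeps rows from both sides.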
We've created a data set with over 160M vectors and associated paragraphs to simulate vector-based semantic search over all 6.7 million articles in Wikipedia. It's got real data and vectors for video games, and the rest of the data is mocked up. For details about this data set and how to load it, see our blog on ANN search in SingleStore.
This is the table holding the vectors and paragraphs:
create table vecs(
  id bigint(20),
  url text default null,
  paragraph text default null,
  v vector(1536) not null,
  shard key(id),
  key(id) using hash,
  fulltext (paragraph)
);
And this creates the ANN index on it:
alter table vecs add vector index ivf (v) INDEX_OPTIONS '{"index_type": "IVF_FLAT"}';
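For intuition about what the ANN index speeds up, the sketch below shows the exact search it approximates: score every row by its dot product with the query vector (the quantity the `<*>` operator computes in the query that follows) and keep the top k. The helper names `dot` and `exact_top_k` and the two-dimensional sample vectors are hypothetical. This is not how SingleStore implements the search.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query, rows, k):
    """Score every (id, vector) row against the query and return the k best (id, score) pairs."""
    scored = ((row_id, dot(vec, query)) for row_id, vec in rows)
    # nlargest returns results sorted from highest to lowest score.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical 2-dimensional vectors; the table in this post uses 1536 dimensions.
rows = [(1, [1.0, 0.0]), (2, [0.6, 0.8]), (3, [0.0, 1.0])]
top = exact_top_k([1.0, 0.0], rows, k=2)
print(top)
```

An exact scan like this touches every row, which is what makes it expensive at 160M vectors. An ANN index examines only part of the data and accepts a small chance of missing a true nearest neighbor in exchange.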
This query follows the outline we gave earlier and does a hybrid vector and full-text search:
/* Get the vector for the first paragraph about the Mario Kart game.
   It's a good semantic query vector for Mario Kart. */
set @v_mario_kart = (
  select v from vecs
  where url = "https://en.wikipedia.org/wiki/Super_Mario_Kart"
  order by id limit 1
);

with fts as (
  /* Full-text search: top 200 keyword matches. */
  select id, paragraph,
    match(paragraph) against("mario kart") as score
  from vecs
  where match(paragraph) against("mario kart")
  order by score desc
  limit 200
),
vs as (
  /* Vector search: top 200 rows by dot product with the query vector. */
  select id, paragraph, v <*> @v_mario_kart as score
  from vecs
  order by score desc
  limit 200
)
/* coalesce keeps id and paragraph non-null for rows found only by full-text search. */
select coalesce(vs.id, fts.id) as id,
  coalesce(vs.paragraph, fts.paragraph) as paragraph,
  ifnull(fts.score, 0) * .3 + ifnull(vs.score, 0) * .7 as score,
  fts.score as fts_s, vs.score as vs_s
from fts full outer join vs on fts.id = vs.id
order by score desc
limit 5;
The full power of SQL is used here in a relatively compact, understandable way. You can do it all in one query rather than having to use an application program to combine results from a vector database, a full-text database and a relational database.
Here are the results:
Interestingly, the final ranking (by score) differs from both the ranking by full-text score (fts_s) and the ranking by vector similarity score (vs_s), so both the full-text search and the vector search influence the result. Moreover, the results came back very quickly: 123 ms, including round-trip time to and from the cloud, for a table of over 160 million rows.
For a video demo of this hybrid search, see the third of three demos in this YouTube video, starting at 4:28.
SingleStore gives you the power of hybrid search with relational, vector and full-text operations all in one system, via the SQL language you already know. Plus, it provides the robust controls you expect from an enterprise database including transactions, high availability, disaster recovery, security, monitoring and alerting. How can you build more powerful semantic search and gen AI applications with the power of SQL?