Hybrid Search: Vector + Full-Text Search

SingleStore has just released support for indexed approximate-nearest-neighbor (ANN) vector search. By combining ANN search with our full-text search and SQL capabilities, you can run hybrid searches that draw on both keyword matching and semantic matching.

Here's how:

  • Create a subquery that uses vector search
  • Create a subquery that uses keyword search for similar content
  • Join the two with a full outer join
  • Produce a final result that combines the vector search score and the full-text search score, then re-ranks the rows by the combined score
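The four steps above can be sketched in Python, outside the database, to show how the pieces fit together. This is a minimal sketch that assumes each subquery has already returned a mapping from document id to score; the function name `hybrid_rank` is hypothetical, and the 0.3/0.7 default weights are taken from the query later in this post rather than from any SingleStore API.

```python
from typing import Dict, List, Tuple


def hybrid_rank(
    fts_scores: Dict[int, float],
    vec_scores: Dict[int, float],
    fts_weight: float = 0.3,
    vec_weight: float = 0.7,
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Combine two score maps the way a FULL OUTER JOIN plus IFNULL(score, 0) would."""
    # Full outer join on document id: keep every id found by either search.
    all_ids = set(fts_scores) | set(vec_scores)
    combined = []
    for doc_id in all_ids:
        # A document missing from one side contributes 0 for that side,
        # mirroring IFNULL(score, 0) in SQL.
        score = (fts_weight * fts_scores.get(doc_id, 0.0)
                 + vec_weight * vec_scores.get(doc_id, 0.0))
        combined.append((doc_id, score))
    # Re-rank by combined score, highest first; ties broken by id for determinism.
    combined.sort(key=lambda pair: (-pair[1], pair[0]))
    return combined[:k]


if __name__ == "__main__":
    fts = {1: 0.9, 2: 0.5, 3: 0.2}   # keyword-search scores (illustrative)
    vs = {2: 0.8, 4: 0.95, 1: 0.1}   # vector-search scores (illustrative)
    print([doc_id for doc_id, _ in hybrid_rank(fts, vs)])  # → [2, 4, 1, 3]
```

In this toy input the combined order (2, 4, 1, 3) matches neither the keyword order (1, 2, 3) nor the vector order (4, 2, 1), and document 4 is kept even though keyword search never found it. Because the weights apply to raw scores, the balance between the two searches also depends on how each score is scaled.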

This gives you the benefits of both full-text search (which finds documents that contain the exact keywords) and vector search (which finds semantic matches even when the keywords don't appear) in a single SQL query.

We've created a data set with over 160M vectors and associated paragraphs to simulate vector-based semantic search over all 6.7 million articles in Wikipedia. It contains real data and vectors for video game articles; the rest of the data is mocked up. For details about this data set and how to load it, see our blog post on ANN search in SingleStore.

This is the table holding the vectors and paragraphs:

create table vecs(
  id bigint(20),
  url text default null,
  paragraph text default null,
  v vector(1536) not null,
  shard key(id), key(id) using hash,
  fulltext (paragraph)
);

And this creates the ANN index on it:

alter table vecs add vector index ivf (v) INDEX_OPTIONS
'{"index_type": "IVF_FLAT"}';
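IVF_FLAT is, in the usual sense of the term, an inverted-file index: vectors are grouped into clusters around centroids, and a query compares itself exactly ("flat", without compression) only against the vectors in the few clusters whose centroids are closest to it. The toy Python sketch below illustrates that general idea only and is not SingleStore's implementation. The centroids are supplied by hand instead of being learned (real IVF indexes typically use k-means), and the names `build_ivf`, `ivf_search` and `nprobe` are hypothetical, borrowed from common IVF terminology.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_ivf(vectors: Dict[int, Vector], centroids: List[Vector]) -> List[List[int]]:
    # Hypothetical helper: put each vector id into the list of the centroid
    # it scores highest against (by dot product).
    lists: List[List[int]] = [[] for _ in centroids]
    for vec_id, v in vectors.items():
        best = max(range(len(centroids)), key=lambda c: dot(v, centroids[c]))
        lists[best].append(vec_id)
    return lists


def ivf_search(
    query: Vector,
    vectors: Dict[int, Vector],
    centroids: List[Vector],
    lists: List[List[int]],
    k: int = 2,
    nprobe: int = 1,
) -> List[Tuple[int, float]]:
    # Probe only the nprobe clusters whose centroids score highest for the query.
    probe = sorted(range(len(centroids)),
                   key=lambda c: dot(query, centroids[c]), reverse=True)[:nprobe]
    # Exact ("flat") scoring of every vector inside the probed clusters.
    candidates = [(vec_id, dot(query, vectors[vec_id]))
                  for c in probe for vec_id in lists[c]]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    vectors = {1: (1.0, 0.0), 2: (0.9, 0.1), 3: (0.0, 1.0), 4: (0.1, 0.9)}
    centroids = [(1.0, 0.0), (0.0, 1.0)]
    lists = build_ivf(vectors, centroids)
    print(lists)  # → [[1, 2], [3, 4]]
    print([i for i, _ in ivf_search((0.8, 0.2), vectors, centroids, lists)])  # → [1, 2]
```

Probing fewer clusters scans fewer vectors but can miss a true nearest neighbor that sits in an unprobed cluster; that is the sense in which the search is approximate.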

This query follows the outline we gave earlier and does a hybrid vector and full-text search:

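/* Get the vector for the first paragraph about the Mario Kart game.
It's a good semantic query vector for Mario Kart. */
set @v_mario_kart = (select v from vecs
where url = "https://en.wikipedia.org/wiki/Super_Mario_Kart"
order by id limit 1);

with fts as (
  /* Top 200 paragraphs by full-text relevance to the keywords. */
  select id, paragraph,
  match(paragraph) against("mario kart") as score
  from vecs
  where match(paragraph) against("mario kart")
  order by score desc
  limit 200
),
vs as (
  /* Top 200 paragraphs by vector similarity (dot product with the query vector). */
  select id, paragraph, v <*> @v_mario_kart as score
  from vecs
  order by score desc
  limit 200
)
/* A paragraph found by only one search gets 0 for the other search's score;
ifnull() on id and paragraph keeps rows that only the full-text search found. */
select ifnull(vs.id, fts.id) as id,
  ifnull(vs.paragraph, fts.paragraph) as paragraph,
  ifnull(fts.score, 0) * .3 + ifnull(vs.score, 0) * .7 as score,
  fts.score as fts_s, vs.score as vs_s
from fts full outer join vs on fts.id = vs.id
order by score desc
limit 5;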
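The `vs` subquery ranks rows by the dot product of each row's vector with the query vector and keeps the top 200. Ignoring the index, that is an exact top-k search, sketched below in Python. The sketch assumes unit-length vectors (common for 1536-dimension text embeddings, but an assumption here); with unit vectors the dot product equals cosine similarity, so the paragraph whose own vector was chosen as the query scores about 1.0 and ranks first. The helper names `normalize` and `exact_top_k` are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    # Scale a vector to unit length so dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def exact_top_k(
    query: Sequence[float], rows: Dict[int, Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    # Score every row by dot product with the query and keep the k highest,
    # like ORDER BY score DESC LIMIT k evaluated without an index.
    scored = ((row_id, sum(q * x for q, x in zip(query, v)))
              for row_id, v in rows.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    rows = {10: normalize([3.0, 4.0]), 11: normalize([4.0, 3.0]), 12: normalize([0.0, 1.0])}
    # Use row 10's own vector as the query, as the SQL does with @v_mario_kart.
    print([row_id for row_id, _ in exact_top_k(rows[10], rows, 2)])  # → [10, 11]
```

An ANN index such as IVF_FLAT returns an approximation of this exact list while scanning far fewer rows, which is what makes the search practical at 160 million vectors.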

This query uses the full power of SQL in a compact, readable way. Everything happens in one query, rather than in application code that has to merge results from a separate vector database, full-text database and relational database.

Here are the results:

Interestingly, the final ranking (by combined score) differs from the ranking by full-text score (fts_s) alone and from the ranking by vector similarity score (vs_s) alone, so both the full-text search and the vector search influence the result. The query also returned very quickly: 123 ms, including the round trip to and from the cloud, over a table of more than 160 million rows.

For a video demo of this hybrid search, see the third of three demos in this YouTube video, starting at 4:28.

SingleStore gives you the power of hybrid search with relational, vector and full-text operations all in one system, using the SQL you already know. Plus, it provides the robust controls you expect from an enterprise database, including transactions, high availability, disaster recovery, security, monitoring and alerting. How can you build more powerful semantic search and gen AI applications with the power of SQL?

Start free with SingleStore today.