Announcing SingleStore Indexed ANN Vector Search

Clock Icon

16 min read

Pencil Icon

Jan 24, 2024

Announcing SingleStore Indexed ANN Vector Search

Last spring, we made the case that your vector database should not be a vector database, but should be a modern SQL database with vector search capabilities.

SingleStore has supported vector search since 2017 using an exact-nearest-neighbor approach. Today, we're proud to announce we've shipped the public preview of indexed approximate-nearest-neighbor (ANN) search and support for a native vector data type in the latest release of SingleStore (8.5). Indexed ANN gives us orders-of-magnitude faster vector search, and the vector data type makes it easier for developers to build vector-based applications.

The main advantage that specialty vector databases (SVDBs) used to have over SingleStore for vector search is that they indexed vector data to allow for fast searches with reasonable accuracy. That advantage is now gone. With the addition of indexed ANN search, it's the SVDB vendors who need to catch up to the full set of capabilities of SingleStore including SQL, scale-out architecture, fast analytics, ACID transactions, full-text search, extensibility, security, resiliency and more.

whats-vector-search-good-forWhat's vector search good for?

Vector search is useful for lots of different AI applications. For several years it's been used for face matching and object matching in photos. And now with the explosion of interest in Large Language Models (LLMs) and generative AI since ChatGPT was released by OpenAI, vector search has been shown to be incredibly powerful for semantic search of text and Retrieval Augmented Generation (RAG).

These applications all start with an AI model that takes a complex object — like a piece of text or a photo — and converts that object into a high-dimensional vector, called a vector embedding. A typical number of dimensions is anywhere from 64 to 2,000 – much higher than the two or three dimensions we're used to thinking about!

ANN search uses a vector index structure to find the k approximate nearest neighbors to a query vector, and fast. Obviously, exact k-nearest neighbor (KNN) search gives a better quality result than ANN search. But for finding, say, the top 15 matches out of a billion vectors at high concurrency with reasonable hardware cost, ANN search is a must.

Many algorithms have been developed for ANN search. Some of the most popular ones are inverted file (IVF) and hierarchical navigable small world (HNSW). An important enhancement to these techniques is Product Quantization (PQ) which compresses vectors, reducing memory and speeding up search.

Beginning with release 8.5, SingleStore supports the following index types: FLAT, IVF_FLAT, IVF_PQ, IVF_PQFS, HNSW_FLAT, HNSW_PQ and AUTO (auto index algorithm). For full details, check out our documentation.  Here, we'll focus on the main concepts: IVF, HNSW and PQ.

Recall

Different approaches to nearest neighbor search have different levels of recall. Recall is the fraction of the actual top k-nearest neighbors that are retrieved in the k values actually returned. Using no index, you get a perfect recall of 1.0. Using ANN search, depending on the algorithm chosen, the data and the settings, you may get recall values in a broad range, say 0.10 to 1.0.

ivfIVF

Inverted File (IVF) partitions the vectors into clusters of nearby vectors.

It builds an inverted index from centroids to vectors so that during the search, we can find nearby centroids to the query vector — and only search vectors from those clusters.

Given typical parameter settings, the search time for IVF is roughly proportional to the square root of n for n total vectors.

product-quantizationProduct Quantization

Product Quantization (PQ) is a technique used for vector compression. It is very effective in compressing high dimensional vectors for nearest-neighbor search, often compressing them by up to a factor of 64.

If you're a digital photographer, you may be familiar with JPEG compression which can compress raw images to be much smaller than their original size. JPEG is a lossy compression scheme — it loses a tiny bit of fidelity in the picture for a huge reduction in size, and is almost imperceptible to the naked eye. PQ is a lossy compression scheme for vectors, like JPEG is for images. And like JPEG, PQ saves a lot of space with very little loss in quality.

IVFPQ uses PQ to compress vectors, reducing the memory needed for the index and speeding up vector search times since not as much data must be examined to compare vectors.

hnswHNSW

The HNSW approach is a clever, fascinating form of vector similarity indexing that gives fast searches with good recall.

Before tackling the "hierarchical" nature of HNSW, let's look at just Navigable Small World (NSW) without the hierarchy. NSW is inspired by the mythical concept of “Six Degrees of Separation”: all people are six or fewer social connections away from each other, but at the same time each person typically maintains a relatively small number of friends. For this type of  graph, ANN can be solved with an efficient local search algorithm. Here's an example NSW graph:

One approach to building this graph is to add nodes to the graph in random order, connecting each new node to its three nearest neighbors (or some other suitable small number). To query the graph, take the query vector, start at a random node and examine its neighbors. Go to the neighbor closest to the query vector. Repeat this process until you find the closest node to the query vector with no neighbors even closer.

You can repeat this process several times and return the K closest values to the query vector found, as the result of an approximate KNN search.

Now, let's add hierarchy to the NSW idea to create HNSW. You can build several NSW graphs stacked on each other, with the bottom layer (layer 0) having the full graph, and the next layer up having an NSW graph built from a random subset of the nodes in the layer below it and exponentially fewer nodes than the layer below it. Each node in a layer other than the bottom layer is connected directly to the same node in the layer below.

Here's an example three-layer HNSW structure:

To search this, given a query vector q, you do an NSW search on the top layer. For the closest vector found in that layer, you drop down to the same node in the next layer and continue the search. This gets you continually closer to the "closest" nodes in the structure; the search on one layer puts you in the right neighborhood in the next layer down to make the search in that layer faster. You repeat this process until reaching the bottom layer.

Remember all the nodes visited. Repeat this some number of times proportional to k, then return the closest k vectors found.

The average search time on HNSW is proportional to log(number of vectors in the index) empirically. And, the index size is large and it needs to be in memory because of the random seek patterns on the graph.

Because HNSW takes a significant amount of RAM and time to build, consider that when deciding what kind of index to choose. Fortunately, there is a PQ-based variation of HNSW called HNSW_PQ that gives great performance and good recall with a lot less memory than straight HNSW; it has a small loss of recall due to the lossy compression of PQ.

ann-search-in-single-storeANN search in SingleStore

SingleStore 8.5 supports IVF, IV-FPQ and HNSW based on the implementations of these algorithms in Facebook AI Similarity Search (Faiss), a popular open-source ANN search library originally developed by Facebook, now known as Meta.

Pluggable architecture to enable future improvements

Vector index algorithms are a subject of active research and development, with numerous proposals featuring different trade-offs. New algorithms — and enhancements to existing ones — are consistently being put forth each year.

Given this dynamic landscape, we believe it is crucial to support a broad spectrum of vector index algorithms, enabling users to choose the one that best suits their use case. Our vector index is implemented in a way that allows us to easily integrate various vector index implementations.

In 8.5, we provide support for several significant in-memory vector index algorithms, including the state-of-the-art partition-based algorithm IVFPQFastScan and the graph-based algorithm HNSWPQ.

We are committed to continuously improving our vector index algorithms, enhancing existing ones, and incorporating new ones. Specifically, we are exploring the addition of on-disk vector index algorithms post-8.5, index algorithms optimized for GPUs and fast GPU-based index construction for existing index types.

Furthermore, we introduce support for AUTO index, enabling our system to make indexing decisions instead of users manually selecting vector index algorithms and tuning their parameters. AUTO index strives to choose the best vector index algorithm for each data slice. This feature ensures that, after an upgrade, you can seamlessly transition to new and improved algorithms without experiencing downtime.

vector-data-typeVector data type

To make it easier to work with vector data, we've added a new vector data type to SingleStore. Previously, it was necessary to store vectors as blobs, using functions to operate on those blobs that understood their vector format. This was workable, but a vector type has these benefits:

  • Self-documenting
  • Allows writing apps with less code
  • Provides improved error checking

For example, you can create a table with a vector column like so:

create table articles(id int, title varchar(300),
publication text, title_embedding vector(1536));

And insert data into it like so:

insert articles values(1, "Google Updates Bard Chatbot With 'Gemini' A.I. as It Chases ChatGPT", "NY Times", "[0.3, 0.2, ... 0.4]");

The JSON array of numbers in the last position will be cast internally to a packed binary vector with the same logic used in JSON_ARRAY_PACK().

If you try to perform an operation like DOT_PRODUCT()  on vector columns of different lengths when they are expected to be the same length, you'll get an error at compile time. Prior to 8.5 when using blobs for vectors, errors were only caught at runtime. This new compile-time error checking can help you build bug-free apps quicker.

We've also added infix operators <*> for DOT_PRODUCT and <-> for EUCLIDEAN_DISTANCE. All these changes make it easier to write vector-similarity-based apps. This is just a short introduction to the vector type. A complete blog on the topic is coming soon!

ann-index-example-performanceANN index example + performance

Here we've created a realistic example of semantic search and RAG enabled by the new indexed ANN search features in SingleStore. Imagine you want to do vector searches on all of Wikipedia — which has over 6,765,536 million articles. Let's estimate there are about 24 paragraphs per article, and we'll get a vector for each paragraph. You'd have about 160 million paragraphs, and thus about 160 million vectors to search.

We know SingleStore can do full-scan search (with perfect recall) for a few million vectors economically. But at the scale of 160 million vectors, we need indexed ANN search to get interactive (sub-second) response time. Add many concurrent users to the mix, and ANN search is even more important for this application.

We didn't have the time or want to invest money to get vectors for all of Wikipedia using OpenAI! But to show how this can work, we scraped about 1,800 articles about video games from wikipedia, based on a list of good articles about video games. We got about 41,000 real OpenAI ada-002 embeddings for the paragraphs in these articles. Then we generated 160M mock vectors, quite different from the real vectors.

This allows us to ask questions of the entire data set, but only about the subject area of video games, to get meaningful results. Since these are real OpenAI embeddings, we can do RAG using OpenAI based on the results. See the appendix for details on how we generated the data set.

The generated vectors were loaded into a table with the following schema:

create table vecs(
id bigint(20), url text default null,
paragraph text default null, v vector(1536) not null,
shard key(id), key(id) using hash, fulltext (paragraph)
);

Then, we scraped data from Wikipedia directly into SingleStore. Finally, we added a simple index on our vector field: an Inverted File vector index (IVF_FLAT), in order to facilitate the ANN scan over our vectors:

alter table vecs add vector index ivf (v) INDEX_OPTIONS
'{"index_type": "IVF_FLAT"}'

With our schema and index in place, let’s run some sample queries and our RAG-based chat!

Similarity search with a top-k filter

First, we define a base query vector. An interesting choice would be to look for paragraphs related to Nintendo. Let us use our previously defined full-text index to grab that vector:

set @qv = (select v from vecs where match (paragraph) against
('Nintendo') limit 1);

It turns out that in our test this retrieves a vector for a paragraph "Rad Racer was ranked number 57 on IGN's Top 100 Nintendo…" shown later. So it's a good proxy for a query asking about Nintendo’s Rad Racer.

Our main query is to simply find the top three  most similar vectors in our database to the @qv query vector. This query is very similar to what we see being used in production on vector data in SingleStore.

-- variation 1: does not use the vector index
select paragraph, v <*> @qv as sim
from vecs
order by sim use index () desc
limit 3;
-- variation 2: uses the vector index
select paragraph, v <*> @qv as sim
from vecs
order by sim desc
limit 3;

The first query will *not* use the vector index for the search, as evidenced by the use index () clause. By default, SingleStore will use the vector index to perform the scan if it exists, as in the second query.

In the first query, the resulting response time was around 3-4 seconds, doing a full scan without the use of the IVF index. However in the second query, using the IVF index, the resulting response time was around 400 ms once the index was warmed up properly. The results from the query are listed in the following table.

Not only were we able to speed up the similarity search by an order of magnitude, but using this index with the default parameter settings gave perfect recall — which also held up to the top 15 results!

paragraphsim
Rad Racer was ranked number 57 on IGN's Top 100 Nintendo Entertainment System games and was called "iconic" and one of the NES's premier racing games. Maxim included the game amongst thirteen others in their greatest 8-bit video games of all time list.1.0
Rad Racer was met with favorable reviews, enjoyed commercial success, and sold 1.96 million copies. It also ranked 8th on Nintendo Power's player's poll Top 30. Famitsu praised the sense of speed but felt the game was slightly monotonous. Japanese publication Family Computer Magazine applauded the variety of game landscapes found in different levels. British magazine Computer and Video Games called it an "extremely playable racing game" and said "things get very fast and competitive as you get further into the game0.92927
The game sold 1.96 million copies and is considered one of the best racing games on the NES, but was criticized as being derivative of other racing games from the period. Reviewers widely compared the game to Out Run, though opined that Rad Racer was different in some ways, and they praised the sense of speed. The game appeared in the 1989 film The Wizard and was one of three games to feature a unique competition course in the 1990 Nintendo World Championship.0.92120

In the video demo referenced previously, we used IVF_PQ instead of IVF_FLAT. We got good results there as well, with the top 6 related to Nintendo’s Rad Racer video game.

ann-index-and-ragANN index and RAG

We have seen the power of SingleStore’s new ANN indexes. A simple IVF index with our default settings improved the performance of the similarity search query that we wrote by an order of magnitude! Now, as noted before, this type of query involving vector search is extremely powerful when it comes to building RAG-based applications.

Next, let's go through the results of a simple RAG use case with the data we have collected and generated. Here is the similarity search we have used:

select paragraph, v <*> :query_embedding :> vector(1536) as sim
from vecs
order by sim use index (ivf) desc
limit 5;

The response times for this type of similarity search were in the range of 100-500 milliseconds on average, in line with the response times we got from the query in the previous section. Here is a sample question and response from the chatbot:

ask_wiki_page("Tell me about the Mario video game franchise and its history. What is it known for?", limit=15)

Searching for matches...

Search complete in 0.18604707717895508 seconds.

Asking Chatbot...

'The Mario video game franchise is a long-running series of platform games developed and published by Nintendo. It began with the release of the arcade game Mario Bros. in 1983, designed by Shigeru Miyamoto and Gunpei Yokoi. The success of this game led to the development of Super Mario Bros., which became a major hit and spawned many successors in the Super Mario series.

Super Mario Bros. introduced the iconic characters Mario and Luigi and popularized the side-scrolling platform genre. It is considered one of the most influential and important video games of all time. The franchise has since expanded to include various spin-offs and sub-series, such as Super Mario Kart, Super Mario RPG, and Mario & Luigi.

The franchise is known for its gameplay, humor, and graphics. It has also been a commercial success, with over 193 million units of Mario games sold as of January 2007, making it one of the best-selling video game franchises of all time.

For more detailed information, you can refer to the Wikipedia articles on the Mario franchise (https://en.wikipedia.org/wiki/Mario) and Super Mario series (https://en.wikipedia.org/wiki/Super_Mario).'

For more details along with the actual python code that we used to scrape the video game articles from Wikipedia and do RAG with OpenAI using this data, please refer to this GitHub repository. See the readme.md file for more instructions on how to set up the environment and run the code.

We have also stored the video-game-specific data in an open AWS S3 bucket. For details on how to get this data into your own database using SingleStore Pipelines, please refer to the appendix.

conclusionConclusion

SingleStore is proud to enter the era of indexed ANN vector search and RAG as the only modern, distributed SQL database that enables you to build transactional, analytical, full-text and vector search applications all on one platform. As a developer, it gives you the power of SQL for vector queries — enabling filters, aggregates, joins, ordering, common table expressions, window functions and more. This saves you implementation time, data transfer time, code and money compared to using an SVDB.

We want to thank the entire Wikipedia organization and Wikipedia video game enthusiast community for the wonderful articles they've created! Please donate to Wikipedia the next time you use it!

Appendix: Loading vector data for Wikipedia about video games and mock data for the rest

This appendix details the process of generating 160,000,000 vectors in a table in SingleStore, and then loading the Wikipedia video game data from our open AWS S3 bucket into the same table. To accommodate the full data set, we recommend using an S32 cloud workspace to run this example. If you'd like to experience this demo with a smaller workspace, scale down the set of mock vectors from 160,000,000 to a smaller size. E.g. try an S2 workspace and 10,000,000 mock vectors.

Helper functions

These functions are defined to help generate a random, normalized, 1536-dimensional vector. 1536 was chosen as the dimensionality because the OpenAI embeddings from text-ada-002 are of that dimension.

set sql_mode = pipes_as_concat;
delimiter //
create or replace function randbetween(a float, b float) returns float
as
begin
return (rand()*(b - a) + a);
end //
create or replace function gen_vector(length int) returns text as
declare s text = "[";
begin
if length < 2 then
raise user_exception("length too short: " || length);
end if;
for i in 1..length-1 loop
s = s || randbetween(-1,1) || "," ;
end loop;
s = s || randbetween(-1,1) || "]";
return s;
end //
create or replace function normalize(v blob) returns blob as
declare
squares blob = vector_mul(v,v);
length float = sqrt(vector_elements_sum(squares));
begin
return scalar_vector_mul(1/length, v);
end //
create or replace function norm1536(v vector(1536)) returns vector(1536) as
begin
return normalize(v);
end //
create or replace function nrandv1536() returns vector(1536) as
begin
return norm1536(gen_vector(1536));
end //
delimiter ;

Loop for generating the vectors

Run this code to generate 160,000,000 random vectors of dimensionality 1536 directly in the below table. You can change num_rows before starting it to load a different amount of vectors.

create table vecs(
id bigint(20),
url text default null,
paragraph text default null,
v vector(1536) not null,
shard key(id),
key(id) using hash,
fulltext (paragraph)
);
insert into vecs (id, v) values (1, nrandv1536());
delimiter //
do
declare num_rows bigint = 160000000;
declare c int;
begin
select count(*) into c from vecs;
while c < num_rows loop
insert into vecs (id, v)
select id + (select max(id) from vecs), nrandv1536()
from vecs
where id <= 128*1024; /* chunk size 128K so we can see progress */
select count(*) into c from vecs;
end loop;
end //
delimiter ;
create table vecs(
id bigint(20), url text default null,
paragraph text default null, v vector(1536) not null,
shard key(id), key(id) using hash, fulltext (paragraph)
);

Pipeline to load the Wikipedia data

As previously mentioned, we have the Wikipedia text data stored in a csv file in an open S3 bucket with URI 's3://wikipedia-video-game-data/video-game-embeddings(1).csv'. To load the video game data, run the following statements.

- since the bucket is open, you can leave the credentials clause as-is
create or replace pipeline `wiki_pipeline` as
load data S3 's3://wikipedia-video-game-data/video-game-embeddings(1).csv'
config '{"region":"us-west-1"}'
credentials '{"aws_access_key_id": "", "aws_secret_access_key": ""}'
skip duplicate key errors
into table vecs
format csv
fields terminated by ','
enclosed by '"'
lines terminated by '\r\n';
start pipeline wiki_pipeline foreground;

RAG

If you would like to test the RAG application demonstrated in the video yourself, please refer to this GitHub repository, and follow the instructions in the readme to run the demo.


Share