SingleStore Brings High Performance to Vector Search

SingleStore is known for its high performance — and now we’re bringing that performance to vector search.

Benchmarking shows that SingleStore's indexed vector Approximate Nearest Neighbor (ANN) search is up to 100x faster than pgvector (PostgreSQL), and performs comparably to Milvus, a leading specialized vector database.

With even more performance improvements in development, we are confident that we can deliver performance comparable to, or better than, vector-only databases such as Pinecone, Weaviate, Qdrant and Chroma DB, as well as other vector-capable databases such as Elasticsearch and MongoDB®.

You can board the SingleStore vector search train with confidence that it will take you and your gen AI applications where you want to go.

SingleStore for vector search

With SingleStore, you get not only high-performance vector search but also a modern, scalable data platform with real-time analytics. In SingleStore, you can combine indexed vector search with full-text indexing. You can also combine vector search with queries over other data types, including JSON, time-series, spatial and key-value data. It's simple: with SingleStore, you get high-performance indexed vector search with standard SQL capabilities. That's powerful.

SingleStore is fast and scalable, supporting both high-performance transaction processing and fast analytics. It is a distributed system with a scale-out architecture and has support for ANSI SQL, ACID transactions, full-text search, high availability, disaster recovery, point-in-time recovery, programmability and extensibility.

Vector search performance results

Indexed ANN search is important for applications that need to find the top-k vectors in large vector data sets (e.g., a few million to over a billion vectors), where exact k-nearest neighbor (KNN) search is cost-prohibitive because it compares the query against every stored vector.
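To make the cost of exact KNN concrete, here is a minimal sketch in plain Python. It is an illustration only, not SingleStore's implementation: the function names and the tiny data set are hypothetical, and Euclidean distance is assumed as the metric.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query, vectors, k):
    """Return indices of the k vectors closest to query, nearest first.

    Every stored vector is compared with the query, so the cost of one
    search grows linearly with the number of vectors.
    """
    scored = ((euclidean(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, scored)]


# Hypothetical toy data: four 2-dimensional vectors.
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = exact_knn([1.0, 1.0], vectors, k=2)
```

With four vectors the full scan is trivial. With a billion vectors, each query would perform a billion distance computations, which is the work an ANN index is designed to avoid.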

We recently collaborated with researchers from the Purdue Database Group on an upcoming paper. Part of this work involves benchmarking SingleStore against Milvus, a leading specialized vector database, and pgvector, a vector search extension for the popular PostgreSQL database. Our benchmarking shows that SingleStore performs comparably to Milvus using HNSW (graph-based) and IVF (quantization-based) indexes, and outperforms pgvector.

The metrics we used are throughput (measured as queries per second, or QPS), recall and index build time. Recall measures the accuracy of the returned results: it is the fraction of the true nearest neighbors that the approximate search actually returns.
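As a rough illustration of the first two metrics, the sketch below computes recall for one query and a simple single-threaded QPS figure. The helper names and sample IDs are hypothetical, and this is not how VectorDBBench measures throughput; it only shows what the numbers mean.

```python
import time


def recall_at_k(returned_ids, true_ids):
    """Fraction of the true nearest neighbors found in the returned results."""
    truth = set(true_ids)
    if not truth:
        raise ValueError("true_ids must not be empty")
    return len(truth.intersection(returned_ids)) / len(truth)


def queries_per_second(search_fn, queries):
    """Run search_fn once per query and return completed queries per second."""
    start = time.perf_counter()
    for q in queries:
        search_fn(q)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed if elapsed > 0 else float("inf")


# Hypothetical result of one top-4 ANN query and its exact ground truth.
ann_result = [7, 2, 9, 4]
ground_truth = [7, 2, 3, 4]
recall = recall_at_k(ann_result, ground_truth)  # 3 of the 4 true neighbors found
```

A QPS-recall plot reports both numbers together, because an index can usually be tuned to trade one for the other.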

Indexes and datasets used

We benchmarked both graph-based Hierarchical Navigable Small World (HNSW) indexes and inverted file (IVF) indexes with quantization. Using VectorDBBench, we ran them on SIFT in four sizes (SIFT1M, SIFT10M, SIFT100M, SIFT1B), GIST1M and Cohere10M.
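For readers new to inverted file indexes, the following is a minimal sketch of the general IVF idea: group vectors by their nearest centroid, then search only the closest groups. It is a simplification under stated assumptions, not the indexes used in the benchmark. It stores full vectors with no product quantization, the centroids are given by hand rather than learned, and all names (including `nprobe`) are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists


def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Search only the nprobe inverted lists whose centroids are closest."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probe for vid in lists[c]]
    candidates.sort(key=lambda vid: sq_dist(query, vectors[vid]))
    return candidates[:k]


vectors = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9], [2.1, 2.0]]
centroids = [[0.0, 0.0], [4.0, 4.0]]  # hypothetical; normally learned, e.g. with k-means
lists = build_ivf(vectors, centroids)

fast = ivf_search([1.9, 1.9], vectors, centroids, lists, k=2, nprobe=1)
full = ivf_search([1.9, 1.9], vectors, centroids, lists, k=2, nprobe=2)
```

In this toy example, probing one list scans fewer vectors but misses the true nearest neighbor, which sits in the other list. Probing both lists finds it at the cost of a full scan. That is the throughput-versus-recall trade-off that QPS-recall curves capture.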

Results: Performance

For HNSW, SingleStore achieves 81.8-94.7% of Milvus's QPS and is 1.7-2.6x faster than pgvector.
For inverted file indexes with product quantization-encoded vectors (IVF_PQFS), SingleStore achieves 78.7-98.9% of Milvus's QPS and is 47-100x faster than pgvector's IVF_FLAT (no product quantization) index.

Figures 1 and 2 show the QPS-Recall comparison on the GIST1M and Cohere10M datasets, respectively. The results demonstrate SingleStore's comparable performance to Milvus.

Note that pgvector could not be run on the Cohere10M dataset, and the GIST1M graph uses a log scale so that all three databases can be included on one plot.

*Figure 1: QPS-Recall comparison on GIST1M*

*Figure 2: QPS-Recall comparison on Cohere10M*

Results: Index Build Time

Index building with SingleStore is consistently faster than both Milvus and pgvector. This enables use cases for applications that are generating a significant amount of data that needs to be vectorized for immediate query processing.

Overall, Milvus took 1.6-6.0x longer than SingleStore to build indexes, and pgvector took 4.5-74.8x longer. SingleStore's advantage comes from its optimized API for loading data into the table before index building.

Figures 3 and 4 illustrate SingleStore’s faster index construction time versus both pgvector and Milvus, for GIST1M and Cohere10M datasets respectively. Note that both plots use log scales. Lower time is better on these graphs.

*Figure 3: Index construction time on GIST1M*

*Figure 4: Index construction time on Cohere10M*

Conclusion + roadmap

The results show that SingleStore is bringing high-performance, general-purpose database features and enterprise-grade reliability to the vector database space. SingleStore, a general-purpose database, delivers comparable performance to Milvus (a database that only works with vector data), and better performance than pgvector (an extension to PostgreSQL, a popular SQL database).

It is also important to note that the vector benchmark is tailored for vector databases. Unlike vector-only databases, SingleStore can retrieve both structured and unstructured data at petabyte scale in a few milliseconds with a single SQL statement, and that same statement can include non-trivial analytics for a richer response.

Things will only keep getting better. SingleStore is integrating new vector index libraries including Faiss, Knowhere and others. New vector index libraries can be easily incorporated in SingleStore due to our pluggable vector index architecture. We're planning for automatic index type and parameter selection, as well as an index merger to automatically merge indexes into fewer, larger indexes for improved performance.

Finally, we're working on improved hybrid search, which combines vector similarity search with full-text search and standard SQL queries. The ability to combine multiple search methods in one query is a key benefit of a vector-capable, general-purpose database like SingleStore.

Notes

The results we share in this blog use a new vector index library that will be publicly available in the mid-2024 release of SingleStore.

Related resources