Why Your Vector Database Should Still Not be a Vector Database


Eric Hanson

Director of Product Management

A year ago, we made the case that your vector database should not be a vector database but rather a modern SQL database that supports vector search. We think time has shown that we were right. Here's why.


SQL databases enhanced with vector search are the best place to build intelligent applications. Application software already exists to support all kinds of business processes spanning finance, healthcare, manufacturing, telecom, media and entertainment, education, retail, utilities, energy and more. And the majority of these applications are running on SQL databases.

The kinds of enhancements that are being made to modern apps based on AI include summarization of content (text and structured data), question answering, semantic search, image and video understanding, information extraction and entity resolution. These are so compelling that they'll have to become part of the primary application that owns the data.

The architects of these applications know they have to enrich their apps with generative AI. Either they've already done it, they're working on it or they're about to start. Most of them will choose a SQL database to enrich their applications with gen AI, because:

  1. That's where the data is
  2. The rest of their app has been built on a SQL database
  3. SQL databases are adding high-performance vector capabilities as well

Moreover, these gen AI enhancements are not just a superficial new addition — they are fundamental to the success, adoption and future competitiveness of the apps.

Beyond the fact that the data is already in a relational format in SQL databases, architects will choose to make AI enhancements to their SQL apps on their SQL databases because those databases satisfy their other key data management requirements. These include the benefits of the relational data model and the SQL language itself, OLTP and OLAP performance, transactions, high availability, disaster recovery, and accessibility from many programming languages and tools.

If you need semistructured data support, SingleStore's JSON type gives you everything you need. It's the perfect escape hatch when you want to leave the structured relational world to accommodate unpredictable data shapes, or you just want to take JSON data directly from the app and save it. Vector columns that enrich your apps with semantic search and Retrieval Augmented Generation (RAG) can sit right next to your JSON and other structured columns.

Virtually all of our cloud customers are using JSON. If you've decided to go with NoSQL, SingleStore Kai™, our MongoDB®-compatible API, can handle your semistructured and vector/gen AI data at scale with the resilience benefits of a modern DBMS — and analytical query speeds that are 100x faster than on MongoDB.

As a modern SQL database with support for transactions and analytics at virtually any scale, SingleStore is well-positioned to support gen AI capabilities in both new and existing applications. A year ago, we made the case for the general-purpose, modern SQL database as the right platform for gen AI. In the last year or so, we’ve seen the emergence of SQL competitors to SingleStore that are vector-search-capable, notably Oracle 23ai and PostgreSQL with the pgvector extension. We see this as validation of the importance of SQL platforms for gen AI application development.

From Siemens and 6Sense to DirectlyApply and Lumana, SingleStore today enables many customers to power fast AI-enriched use cases and applications.

Since last year, SingleStore has also made tremendous progress on vector search, adding two major enhancements: indexed approximate nearest neighbor (ANN) search and a vector data type. These were the two big gaps that separated SingleStore from specialized vector databases (SVDBs) like Pinecone, Zilliz, Milvus and Qdrant. We believe that application architects will no longer see an easy justification for using these specialized vector databases, because they just add cost and complexity.

In terms of performance, our indexed vector search is extremely fast and has been demonstrated to be 47 to 100x faster than pgvector. Perhaps just as important, vector index build times for SingleStore are 2-3x faster than Milvus and pgvector. But the real value in all of this is being able to combine the vector search performance with fast SQL analytics, joins and aggregations across petabytes of structured and semi-structured data to power intelligent applications.
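To make the idea of combining structured filters with vector ranking concrete, here is a minimal Python sketch. It is not SingleStore code and does not use its API: the `Product` rows, the `filtered_search` helper and the three-dimensional embeddings are all hypothetical, and the ranking is an exact brute-force scan rather than an indexed ANN search. It only illustrates the shape of the query, which in a SQL database would typically be a filter predicate plus an ordering on a similarity expression.

```python
import math
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical row: structured columns plus an embedding column.
    name: str
    category: str
    price: float
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(rows, query, category, max_price, k):
    """Apply structured predicates, then rank the survivors by similarity.

    This is an exact scan over the filtered rows; an ANN index would
    replace the full sort with an approximate lookup.
    """
    candidates = [r for r in rows if r.category == category and r.price <= max_price]
    candidates.sort(key=lambda r: cosine_similarity(r.embedding, query), reverse=True)
    return candidates[:k]


# Toy data, invented for illustration only.
rows = [
    Product("trail shoe", "footwear", 89.0, [0.9, 0.1, 0.0]),
    Product("road shoe", "footwear", 129.0, [0.8, 0.2, 0.1]),
    Product("rain jacket", "outerwear", 75.0, [0.1, 0.9, 0.2]),
    Product("sandal", "footwear", 35.0, [0.2, 0.1, 0.9]),
]

top = filtered_search(rows, query=[1.0, 0.0, 0.0], category="footwear", max_price=100.0, k=2)
print([p.name for p in top])  # ['trail shoe', 'sandal']
```

The road shoe is the second-closest vector overall, but the price predicate removes it before ranking. That interaction between filters and similarity is the part that gets awkward when the structured data and the vectors live in separate systems.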

A big issue with these specialized systems is that the data does not live in them, so it has to be copied from its home. That's usually a general-purpose database, and among those, SQL systems are the dominant ones. Having two copies of data in two places, in different formats, leads to problems that are easy to understand: the extra cost of people to run both systems, additional software licenses, hardware and cloud fees, and the risk of data getting out of sync.

And if one of the systems has a weakness, then it becomes the weak link in the application architecture. For SVDBs, a serious area of weakness to consider relates to robustness. These systems often lack the full feature set for transactions, high availability and disaster recovery that full-featured SQL databases have.

We've argued that vector search to support gen AI belongs in the SQL database that owns the application's data. SingleStore, as an emerging platform, doesn't own the mass of data that you see today in Oracle, PostgreSQL or other systems. But we think there are good reasons to consider SingleStore for both new and enhanced applications. Many customers migrate to SingleStore from existing SQL platforms, including Oracle, SQL Server, PostgreSQL and MySQL, usually for reasons of performance, scalability and cost.

Vector search, while it can be done efficiently with modern indexing algorithms, still adds an extra cost. For example, a 300-character paragraph can become a 6,144-byte vector, roughly a 20-fold increase in data size. That makes SingleStore's multi-node scalability, efficient query execution and vector indexing strategies even more important.
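The arithmetic behind that example can be checked with a few lines of Python. The article does not say which embedding model it has in mind, so the 1,536 dimensions, the 32-bit float storage and the one-byte-per-character text encoding below are assumptions chosen because they reproduce the 6,144-byte figure.

```python
# Assumptions (not stated in the article): a 1,536-dimension embedding
# stored as 32-bit floats, and one byte per character of source text.
DIMENSIONS = 1536
BYTES_PER_FLOAT32 = 4
TEXT_CHARS = 300

vector_bytes = DIMENSIONS * BYTES_PER_FLOAT32
growth = vector_bytes / TEXT_CHARS

print(f"vector size: {vector_bytes} bytes")   # vector size: 6144 bytes
print(f"growth factor: {growth:.1f}x")        # growth factor: 20.5x
```

Under these assumptions the vector size depends only on the dimension count and the element width, not on the length of the text, so shorter paragraphs see an even larger relative increase.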

Stay tuned for announcements about big advances in vector search and full-text search in SingleStore that can make your RAG apps even more powerful. Build your next intelligent application on a modern, future-proof, AI-capable SQL database. Build it on SingleStore.

Start free today.

