AI-Powered Semantic Search in SingleStoreDB
SingleStoreDB can supercharge your apps with AI. In this blog, we demonstrate how semantic search can be performed on your data in SingleStoreDB — including code examples and a motivating case study from Siemens, a SingleStore customer.What Is Semantic Search?At its core, semantic search relies on natural language processing (NLP) to accurately interpret the context and intent behind a user's search query. Unlike traditional keyword-based search methods, semantic search algorithms take into account the relationship between words and their meanings, enabling them to deliver more accurate and relevant results — even when search terms are vague or ambiguous. Semantic search relies heavily on machine learning algorithms to identify language patterns and understand concept relationships. Embeddings are a key tool in semantic search, creating vector representations of words that capture their semantic meaning. These embeddings essentially create a "meaning space," where words with similar meanings are represented by nearby vectors.What Is SingleStoreDB?SingleStoreDB is a real-time, distributed SQL database designed to handle both transactional (OLTP) and analytical (OLAP) within a unified engine. With support for fast writes and efficient querying, SingleStoreDB excels at managing large-scale transactional workloads and delivering real-time analytics.SingleStoreDB is available as a cloud service (Singlestore Helios) or for self-hosted installation.SingleStoreDB is also a multi-model database and provides vector database extensions, in addition to support for relational, semistructured, full-text, spatial and time-series data. Its vector capabilities include built-in functions for vector similarity calculations such as cosine similarity and Euclidean distance.These functions can be leveraged in SQL queries to perform similarity calculations efficiently on large volumes of vector data. Moreover, filters on metadata (other descriptive data about objects for which you've created vector embeddings) can be easily intermixed with vector similarity search, by simply using standard SQL WHERE clause filters. An easy way to get started is to sign up for a Singlestore Helios trial — and get $600 in credits.Is SingleStoreDB the Optimal Foundation for Semantic Search in Your Applications?SingleStoreDB's patented Universal Storage supports both OLTP and OLAP workloads, making it ideal for semantic search use cases. Adding embeddings to your data is simple — just place the vector data in a binary or blob column, using json_array_pack() or unhex() functions.Efficient retrieval of high-dimensional vectors and handling of large-scale vector similarity matching workloads are made possible by SingleStoreDB’s distributed architecture and efficient low-level execution. You can also rely on SingleStoreDB’s built-in parallelization and Intel SIMD-based vector processing to take care of the heavy lifting involved in processing vector data.This enables you to achieve fast and efficient vector similarity matching without the need for parallelizing your application or moving lots of data from your database into your application. We previously benchmarked the performance of our vector matching functions in our blog, “Image Matching in SQL with SingleStoreDB.” We ran the dot_product function as a measure of cosine similarity on 16 million records in just 5 milliseconds.With its support for SQL, SingleStoreDB provides developers with a familiar and powerful interface for building semantic search applications. SQL can be used to create complex queries and perform advanced analytics on the text data stored in SingleStoreDB. In fact, with just one line of SQL, developers can run a semantic search algorithm on their vector embeddings, as demonstrated in the following example.SingleStoreDB's ability to update and query vector data in real-time enables us to power applications that continuously learn and adapt to new inputs, providing users with increasingly precise and tailored responses over time. By eliminating the need for periodic retraining of machine-learning models or other time-consuming processes, SingleStoreDB allows for seamless and efficient provision of real-time insights.See Semantic Search with SingleStoreDB in Action!The following tutorial will guide you through an example of adding embeddings to each row in your SingleStoreDB database using OpenAI APIs, enabling you to run semantic search queries in mere milliseconds using Python. Follow along to add embeddings to your dataset in your desired Python development environment.Our goal in this example is to extract meaningful insights from a hypothetical company’s employee review dataset by leveraging the power of semantic search. By using OpenAI's Embeddings API and vector matching algorithms on SingleStoreDB, we can conduct sophisticated queries on the reviews left by employees about their company. This approach allows us to delve deeper into the true sentiments of employees, without being constrained by exact keyword matches.Step 1: Install and import dependencies in your environmentInstall the following dependencies in your development environment using pip3.pip3 install mysql.connector openai matplotlib plotly pandas scipy scikit-learn requestsThen start python3 — and at the python3 command prompt, import the following dependencies.import osimport openaiimport jsonfrom openai.embeddings_utils import get_embeddingimport mysql.connectorimport requestsStep 2: Create an OpenAI account and get API connection detailsTo vectorize and embed the employee reviews and query strings, we leverage OpenAI's embeddings API. To use this API you will need an API key, which you can get here. You'll need to add a payment method to actually get vector embeddings using the API, though the charges are minimal for a small example like we present here. Once you have your key, you can add it to your environment variables as OPENAI_API_KEY.os.environ["OPENAI_API_KEY"] = 'youropenAIAPIKey'openai.api_key = os.getenv("OPENAI_API_KEY")Step 3: Sign up for your free SingleStoreDB trial and add your connection details to your Python environmentWe'll go through the example using Singlestore Helios, but of course you can self-host it and run the example in a similar way. If you're going to use our cloud, sign up for your Singlestore Helios trial and get $600 in credits.First, create your workspace using the + icon next to your desired workspace group. S-00 is sufficient for this use case.