New

Similarity Search on Vector Data

Notebook

SingleStore Notebooks

Similarity Search on Vector Data

Note

This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to Start using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.

What's in this notebook:

  1. Create and use a database.

  2. Create a table to hold vector data and load data.

  3. Search based on vector similarity.

  4. Search using metadata filtering.

  5. Create and use a vector index.

  6. Check that your query is using a vector index.

  7. Clean up.

Questions?

Reach out to us through our forum.

1. Create and use a database.

To use this notebook, you need to have an active workspace and have selected a database to use. Please select a database using the dropdown above.

2. Create a table to hold vector data and load data.

The SQL below creates a table to hold comments as one might find on a restaurant review site. The table contains the comment itself stored as a TEXT column and a vector embedding of that comment stored as a VECTOR (Vector Type) column. Working with Vector Data provides more details on this example and information about similarity search over vectors.

In [1]:

%%sql
CREATE TABLE comments(id INT NOT NULL PRIMARY KEY,
comment TEXT,
comment_embedding VECTOR(4) NOT NULL,
category VARCHAR(256));

In [2]:

%%sql
INSERT INTO comments VALUES
(1, "The cafeteria in building 35 has a great salad bar",
'[0.2, 0.11, 0.37, 0.05]',
"Food"),
(2, "I love the taco bar in the B16 cafeteria.",
'[0,0.800000012,0.150000006,0]',
"Food"),
(3, "The B24 restaurant salad bar is quite good.",
'[0.1, 0.15, 0.37, 0.05]',
"Food");

Verify the data was loaded

Use the following SQL to view the data in the comments table.

In [3]:

%%sql
SELECT * FROM comments;

3. Search based on vector similarity.

To find the most similar vectors in a query vector, use an ORDER BY… LIMIT… query. The ORDER BY command will sort the vectors by a similarity score produced by a vector similarity function, with the closest matches at the top.

The SQL below sets up a query vector, then uses the DOT_PRODUCT infix operator (<*>) to find the two vectors that are most similar to the query vector.

In [4]:

%%sql
SET @query_vec = ('[0.09, 0.14, 0.5, 0.05]'):>VECTOR(4):>BLOB;
SELECT id, comment, category,
comment_embedding <*> @query_vec AS score
FROM comments
ORDER BY score DESC
LIMIT 2;

4. Search using metadata filtering.

When building vector search applications, you may wish to filter on the fields of a record, with simple filters or via joins, in addition to applying vector similarity operations.

The following query combines the use of an ORDER BY ... LIMIT query and a metadata filter on category. This query will filter to find all comments in the category "Food" and then calculate the score for each of those and rank in descending order.

In [5]:

%%sql
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4):>BLOB;
SELECT id, comment, category,
comment_embedding <*> @query_vec AS score
FROM comments
WHERE category = "Food"
ORDER BY score DESC
LIMIT 3;

5. Create and use a vector index.

The command below creates a vector index on the comment_embedding field of the comments table.

In [6]:

%%sql
ALTER TABLE comments ADD VECTOR INDEX ivf(comment_embedding)
INDEX_OPTIONS '{"index_type":"IVF_FLAT"}';

Optionally optimize the table for best performance.

In [7]:

%%sql
OPTIMIZE TABLE comments FULL;

The following query will use the vector index. Vector indexes can be used to improve performance of queries over large vector data sets. Refer to Vector Indexing for information on creating and using vector indexes.

In [8]:

%%sql
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4):>BLOB;
SELECT id, comment, category,
comment_embedding <*> @query_vec AS score
FROM comments
ORDER BY score DESC
LIMIT 2;

6. Check that your query is using a vector index.

The EXPLAIN command can be used to see the query plan and verify that the vector index is being used. In the example below, you can see INTERNAL_VECTOR_SEARCH in the ColumnStoreFilter row. This tells you that the vector index is being used.

In [9]:

%%sql
SET @query_vec = ('[0.09, 0.14, 0.5, 0.05]'):>VECTOR(4):>BLOB;
EXPLAIN
SELECT id, comment, category,
comment_embedding <*> @query_vec AS score
FROM comments
ORDER BY score DESC
LIMIT 2;

7. Clean up.

The command below will drop the table created as part of this notebook. Dropping this table will allow you to rerun the notebook from the beginning.

In [10]:

%%sql
DROP TABLE comments;

Details

Tags

#starter#openai#genai#vectordb

License

This Notebook has been released under the Apache 2.0 open source license.