New

Similarity Search on Vector Data

Notebook

SingleStore Notebooks

Similarity Search on Vector Data

Note

This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to Start using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.

What's in this notebook:

Create and use a database.
Create a table to hold vector data and load data.
Search based on vector similarity.
Search using metadata filtering.
Create and use a vector index.
Check that your query is using a vector index.
Clean up.

Questions?

Reach out to us through our forum.

1. Create and use a database.

To use this notebook, you need to have an active workspace and have selected a database to use. Please select a database using the dropdown above.

2. Create a table to hold vector data and load data.

The SQL below creates a table to hold comments as one might find on a restaurant review site. The table contains the comment itself stored as a TEXT column and a vector embedding of that comment stored as a VECTOR (Vector Type) column. Working with Vector Data provides more details on this example and information about similarity search over vectors.

In [1]:

1
%%sql
2
CREATE TABLE comments /* Creating table for sample data. */
3
    (id INT NOT NULL PRIMARY KEY,
4
   comment TEXT,
5
   comment_embedding VECTOR(4) NOT NULL,
6
   category VARCHAR(256));

In [2]:

1
%%sql
2
INSERT INTO comments VALUES
3
      (1, "The cafeteria in building 35 has a great salad bar",
4
       '[0.2, 0.11, 0.37, 0.05]',
5
       "Food"),
6
      (2, "I love the taco bar in the B16 cafeteria.",
7
       '[0,0.800000012,0.150000006,0]',
8
       "Food"),
9
      (3, "The B24 restaurant salad bar is quite good.",
10
       '[0.1, 0.15, 0.37, 0.05]',
11
       "Food");

Verify the data was loaded

Use the following SQL to view the data in the comments table.

In [3]:

1
%%sql
2
SELECT * FROM comments;

3. Search based on vector similarity.

To find the most similar vectors in a query vector, use an ORDER BY… LIMIT… query. The ORDER BY command will sort the vectors by a similarity score produced by a vector similarity function, with the closest matches at the top.

The SQL below sets up a query vector, then uses the DOT_PRODUCT infix operator (<*>) to find the two vectors that are most similar to the query vector.

In [4]:

1
%%sql
2
SET @query_vec = ('[0.09, 0.14, 0.5, 0.05]'):>VECTOR(4):>BLOB;
3

4
SELECT id, comment, category,
5
         comment_embedding <*> @query_vec AS score
6
    FROM comments
7
    ORDER BY score DESC
8
    LIMIT 2;

4. Search using metadata filtering.

When building vector search applications, you may wish to filter on the fields of a record, with simple filters or via joins, in addition to applying vector similarity operations.

The following query combines the use of an ORDER BY ... LIMIT query and a metadata filter on category. This query will filter to find all comments in the category "Food" and then calculate the score for each of those and rank in descending order.

In [5]:

1
%%sql
2
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4):>BLOB;
3

4
SELECT id, comment, category,
5
         comment_embedding <*> @query_vec AS score
6
    FROM comments
7
    WHERE category = "Food"
8
    ORDER BY score DESC
9
    LIMIT 3;

5. Create and use a vector index.

The command below creates a vector index on the comment_embedding field of the comments table.

In [6]:

1
%%sql
2
ALTER TABLE comments ADD VECTOR INDEX ivf(comment_embedding)
3
INDEX_OPTIONS '{"index_type":"IVF_FLAT"}';

Optionally optimize the table for best performance.

In [7]:

1
%%sql
2
OPTIMIZE TABLE comments FULL;

The following query will use the vector index. Vector indexes can be used to improve performance of queries over large vector data sets. Refer to Vector Indexing for information on creating and using vector indexes.

In [8]:

1
%%sql
2
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4):>BLOB;
3

4
SELECT id, comment, category,
5
         comment_embedding <*> @query_vec AS score
6
    FROM comments
7
    ORDER BY score DESC
8
    LIMIT 2;

6. Check that your query is using a vector index.

The EXPLAIN command can be used to see the query plan and verify that the vector index is being used. In the example below, you can see INTERNAL_VECTOR_SEARCH in the ColumnStoreFilter row. This tells you that the vector index is being used.

In [9]:

1
%%sql
2
SET @query_vec = ('[0.09, 0.14, 0.5, 0.05]'):>VECTOR(4):>BLOB;
3

4
EXPLAIN
5
SELECT id, comment, category,
6
         comment_embedding <*> @query_vec AS score
7
    FROM comments
8
    ORDER BY score DESC
9
    LIMIT 2;

7. Clean up.

The command below will drop the table created as part of this notebook. Dropping this table will allow you to rerun the notebook from the beginning.

In [10]:

1
%%sql
2
DROP TABLE comments;

Details

About this Template

Example of similarity searches over vector data.

This Notebook can be run in Shared Tier, Standard and Enterprise deployments.

License

This Notebook has been released under the Apache 2.0 open source license.

See Notebook in action

Launch this notebook in SingleStore and start executing queries instantly.

Similarity Search on Vector Data

Notebook

Similarity Search on Vector Data

What's in this notebook:

Questions?

1. Create and use a database.

2. Create a table to hold vector data and load data.

Verify the data was loaded

3. Search based on vector similarity.

4. Search using metadata filtering.

5. Create and use a vector index.

6. Check that your query is using a vector index.

7. Clean up.

Details

About this Template

Tags

License

See Notebook in action