Similarity Search on Vector Data
Notebook
Note
This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to Start using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.
What's in this notebook:
Create and use a database.
Create a table to hold vector data and load data.
Search based on vector similarity.
Search using metadata filtering.
Create and use a vector index.
Check that your query is using a vector index.
Clean up.
Questions?
Reach out to us through our forum.
1. Create and use a database.
To use this notebook, you need to have an active workspace and have selected a database to use. Please select a database using the dropdown above.
2. Create a table to hold vector data and load data.
The SQL below creates a table to hold comments as one might find on a restaurant review site. The table contains the comment itself stored as a TEXT
column and a vector embedding of that comment stored as a VECTOR
(Vector Type) column. Working with Vector Data provides more details on this example and information about similarity search over vectors.
In [1]:
%%sqlCREATE TABLE comments /* Creating table for sample data. */(id INT NOT NULL PRIMARY KEY,comment TEXT,comment_embedding VECTOR(4) NOT NULL,category VARCHAR(256));
In [2]:
%%sqlINSERT INTO comments VALUES(1, "The cafeteria in building 35 has a great salad bar",'[0.2, 0.11, 0.37, 0.05]',"Food"),(2, "I love the taco bar in the B16 cafeteria.",'[0,0.800000012,0.150000006,0]',"Food"),(3, "The B24 restaurant salad bar is quite good.",'[0.1, 0.15, 0.37, 0.05]',"Food");
Verify the data was loaded
Use the following SQL to view the data in the comments
table.
In [3]:
%%sqlSELECT * FROM comments;
3. Search based on vector similarity.
To find the most similar vectors in a query vector, use an ORDER BY… LIMIT…
query. The ORDER BY
command will sort the vectors by a similarity score produced by a vector similarity function, with the closest matches at the top.
The SQL below sets up a query vector, then uses the DOT_PRODUCT
infix operator (<*>
) to find the two vectors that are most similar to the query vector.
In [4]:
%%sqlSET @query_vec = ('[0.09, 0.14, 0.5, 0.05]'):>VECTOR(4):>BLOB;SELECT id, comment, category,comment_embedding <*> @query_vec AS scoreFROM commentsORDER BY score DESCLIMIT 2;
4. Search using metadata filtering.
When building vector search applications, you may wish to filter on the fields of a record, with simple filters or via joins, in addition to applying vector similarity operations.
The following query combines the use of an ORDER BY ... LIMIT
query and a metadata filter on category. This query will filter to find all comments in the category "Food"
and then calculate the score for each of those and rank in descending order.
In [5]:
%%sqlSET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4):>BLOB;SELECT id, comment, category,comment_embedding <*> @query_vec AS scoreFROM commentsWHERE category = "Food"ORDER BY score DESCLIMIT 3;
5. Create and use a vector index.
The command below creates a vector index on the comment_embedding
field of the comments
table.
In [6]:
%%sqlALTER TABLE comments ADD VECTOR INDEX ivf(comment_embedding)INDEX_OPTIONS '{"index_type":"IVF_FLAT"}';
Optionally optimize the table for best performance.
In [7]:
%%sqlOPTIMIZE TABLE comments FULL;
The following query will use the vector index. Vector indexes can be used to improve performance of queries over large vector data sets. Refer to Vector Indexing for information on creating and using vector indexes.
In [8]:
%%sqlSET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4):>BLOB;SELECT id, comment, category,comment_embedding <*> @query_vec AS scoreFROM commentsORDER BY score DESCLIMIT 2;
6. Check that your query is using a vector index.
The EXPLAIN
command can be used to see the query plan and verify that the vector index is being used. In the example below, you can see INTERNAL_VECTOR_SEARCH
in the ColumnStoreFilter
row. This tells you that the vector index is being used.
In [9]:
%%sqlSET @query_vec = ('[0.09, 0.14, 0.5, 0.05]'):>VECTOR(4):>BLOB;EXPLAINSELECT id, comment, category,comment_embedding <*> @query_vec AS scoreFROM commentsORDER BY score DESCLIMIT 2;
7. Clean up.
The command below will drop the table created as part of this notebook. Dropping this table will allow you to rerun the notebook from the beginning.
In [10]:
%%sqlDROP TABLE comments;
Details
About this Template
Example of similarity searches over vector data.
This Notebook can be run in Shared Tier, Standard and Enterprise deployments.
Tags
License
This Notebook has been released under the Apache 2.0 open source license.