If you haven’t already, check out SQrL, the new SingleStore chatbot dedicated to assisting you in learning, developing and building your real-time applications on SingleStoreDB.
In this blog post, we'll dive into the process of how two novice developers, Arnaud and Pranav, built a fun little AI movie recommender on SingleStoreDB and OpenAI. Unlike some of the 10x engineers at SingleStore, we are relatively new to building generative AI apps. Here’s how we were able to spin up this app in just a weekend with the help of SQrL, SingleStore's AI bot that is hyper-trained on its docs and resources.
Getting the Right Data and Shape
The first step in creating our movie recommender app was finding the right dataset and bringing it into SingleStoreDB. We used the full MovieLens 25M Dataset. SQrL and SingleStore Notebooks made this process incredibly easy.
We needed to ingest the data from AWS S3 into SingleStoreDB, and used SQrL to help us with the code snippets. We also knew we needed to implement full-text search to enable users to search for movies with autocomplete — and SQrL provided us with a highly optimized schema that included full-text index, pipeline statements and even a code snippet on how to write a full-text search query!
With SQrL, you don't need to be a database expert to design optimized tables and write highly efficient queries on SingleStoreDB. SQrL did 95% of the work, and we just needed to complete the remaining 5%.
To enhance our movie matching algorithm, we've chosen to embed the tags associated with each movieID along with their corresponding titles. We will also soon incorporate the movie's summary and description into this combined data. However, the movie’s tags were stored in a separate table named “Tags,” so we needed to find a way to concatenate all the tags associated with a particular movieID into one column. To achieve this, we consulted SQrL and learned about the Group_Concat function:
Initially, we had planned to use OpenAI ADA to convert our tags and titles into a single vector. However, due to cost constraints, we opted to use the Python sentence-transformer library recommended by SQrL.
SQrL also suggested that we utilize data frames directly to create a new table with the following schema: movieID, title and vector (blob format). With some minor modifications on our end, we were able to incorporate the vector associated with each movieID into the table using JSON_ARRAY_PACK.
We built our application using Flask, a micro web framework for Python. Flask is known for its simplicity and flexibility, making it a popular choice among developers. The core logic of our application revolved around two main SQL queries on SingleStoreDB:
- Full-search query with auto-complete, allowing users to search and select amongst our 65,000 movies to add to their favorites.
- Semantic search query that considers user input to find five movies with the highest match_score.
To enhance our full-text search experience, we created a route called /autocomplete that is triggered from the frontend every time a user enters a new keystroke in the search box. This route executes an SQL query that matches the title of all movies in our database with the input from the frontend using %s.
Our native support for full-text indices enables us to quickly provide the movies with the highest relevance scores to the frontend:
data = request.get_json()
prefix = data['input'] + "*"
conn = init_connection()
query = (
"WITH queryouter AS ("
"SELECT DISTINCT(title), movieId, MATCH(title) AGAINST (%s) as
"FROM movies_with_full_text "
"WHERE MATCH(title) AGAINST (%s) "
"ORDER BY relevance DESC "
"SELECT title, movieId FROM queryouter;"
cursor = conn.cursor()
cursor.execute(query, (prefix, prefix))
rows = cursor.fetchall()
Semantic Search with Analytics
As hobbyist developers, we had a general understanding of how to implement the semantic search algorithm. We knew we needed to calculate the cosine similarity between the user-selected movies and all movies in our database, average the match scores for each movie and return the top five matches. To bring our idea to life, we consulted with SQrL and leveraged our native vector functions, like dot_product for cosine similarity, as well as filters and Common Table Expressions (CTE) to create the core algorithm.
SQrL also provided us with specialized code snippets that were optimized for our Flask app and Python environment, ensuring seamless integration and optimal performance. The structure of the query is as follows:
- We inserted the preferred movies selected by the user into a table and assigned a unique user id for that session, allowing us to retrieve their preferences quickly.
- We joined the table with the preferred movies to the table containing all movies, obtaining the vectors of the selected movies
- We calculated the dot product between the vectors of the preferred movies and the vectors of all movies in the database. We then ranked the resulting scores in descending order of the average matching score, which gives us the most relevant movie recommendations.
- Finally, we added additional analytics by finding the average user rating for the selected movie with an aggregate and a join with the ratings table.
With SQrL’s help, we ended up with the following query:
WITH table_match AS (
SELECT m.title, m.movieId, m.vector
FROM user_choice t
INNER JOIN movie_with_tags_with_vectors m ON m.title = t.title
WHERE userid = %s
movie_pairs AS (
SELECT m1.movieId AS movieId1, m1.title AS title1, m2.movieId AS
movieId2, m2.title AS title2, DOT_PRODUCT(m1.vector, m2.vector) AS similarity
FROM table_match m1
CROSS JOIN movie_with_tags_with_vectors m2
WHERE m1.movieId != m2.movieId
AND NOT EXISTS (
FROM user_choice uc
WHERE uc.userid = %s
AND uc.title = m2.title
movie_match AS (
SELECT movieId1, title1, movieId2, title2, similarity
ORDER BY similarity DESC
distinct_count AS (
SELECT DISTINCT movieId2, title2 AS Title, ROUND(AVG(similarity),
4) AS Rating_Match
GROUP BY movieId2, title2
ORDER BY Rating_Match DESC
average_ratings AS (
SELECT movieId, AVG(rating) AS Avg_Rating
GROUP BY movieId
SELECT dc.Title, dc.Rating_Match, ROUND(ar.Avg_Rating, 4) AS Avg_Rating
FROM distinct_count dc
JOIN average_ratings ar ON dc.movieId2 = ar.movieId
ORDER BY dc.Rating_Match DESC
We deployed this app using Vercel's serverless infrastructure, which allowed us to easily and quickly deploy and scale our app with minimal server management. We also used Vercel's built-in Git integration to streamline the deployment process and enable seamless, continuous integration and deployment (CI/CD) workflows.
The combination of full-text, semantic search and analytics (e.g. joins and aggregates) is the secret sauce behind the success of many GPT apps today — and with SingleStoreDB's full-text indices, native vector functions and powerful analytics capabilities, building these apps has never been easier. Thanks to SQrL, hobbyist developers like us can now create AI-powered movie recommendation engines over the weekend, with optimized schema and efficient queries that make development a breeze.
SingleStoreDB has never been more accessible, and with SQrL as your co-pilot, building something special has never been easier. Let your ideas fly and build with SQrL.