How We Built a Real-Time RAG Application for Free With SingleStore and Vercel

Sometimes, the best ideas start with a Slack conversation between two colleagues.

In an effort to fully explore generative AI capabilities, we will demonstrate how you can build a modern, real-time AI app for free, entirely on SingleStore. The app recommends which LLMs you should use for your use case based on the latest data, sentiment and scores (check it out!). Our CMO, Madhukar, likes to call this technology live RAG. Let us show you how it's done!

Hybrid (free) database + cloud service: A single developer platform for your gen AI apps

A free SingleStore starter workspace enables you to run vector search, full-text search, analytics and transactions (reads/writes/updates) in a single database. This is extremely powerful: all you need is a free starter workspace for all your modern AI app workloads. In addition, this also gives you an extremely performant database built for running apps on real-time and streaming data.
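As a sketch of what that looks like in practice, the query below runs a vector similarity search combined with a full-text filter against a starter workspace, using the singlestoredb Python client. The table and column names, and the full-text index on the readme column, are assumptions for illustration:

```python
import json
import singlestoredb as s2

# Connection string comes from your starter workspace in the SingleStore portal.
conn = s2.connect("user:password@host:3306/llm_recommender")

# Stand-in for a real query embedding (e.g., 1536 dims from an OpenAI model).
query_embedding = [0.1] * 1536

with conn.cursor() as cur:
    # One SQL statement combines vector search (DOT_PRODUCT over a packed
    # vector) with a full-text filter (MATCH ... AGAINST) on the same table.
    cur.execute(
        """
        SELECT name, DOT_PRODUCT(embedding, JSON_ARRAY_PACK(%s)) AS score
        FROM model_readmes
        WHERE MATCH(readme) AGAINST ('chatbot')
        ORDER BY score DESC
        LIMIT 5
        """,
        (json.dumps(query_embedding),),
    )
    for name, score in cur.fetchall():
        print(name, score)
```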

One of the key attributes of real-time RAG or live RAG is data freshness. What we are going to build today uses SingleStore Notebooks and our Job Service, bringing in real-time data from various sources (as soon as it's generated), enriching it with vector embeddings and ensuring it's queryable by your downstream LLMs and applications on demand.

The application is hosted on Vercel, and you can follow the steps in this article with the code we used to go from prototype to production in a Notebook here!

Bringing real-time multi-modal data to your LLMs and apps with Python

We built an LLM recommender app, and to understand which LLMs are most popular, we looked at several data sources. Gauging the popularity and appropriateness of an LLM means first getting data about the models themselves: how they are benchmarked, how other devs are using them in their apps and projects, and the "live" sentiment of those devs on Twitter and Reddit.

First, we got a list of all LLMs from the Open LLM Leaderboard, which ranks the best-performing LLMs based on various tests and forms the foundation for our recommendations. Creating recommendations also involves understanding what people think about each LLM, how they use it and what kind of apps they create with it. To do this, we use the Twitter, Reddit and GitHub APIs.

From Twitter and Reddit posts, we learn about people's opinions on specific LLMs. By looking at GitHub projects, we find out what people have built using specific LLMs.

The Open LLM Leaderboard does not provide a ready-to-use dataset with a list of all models and their results. To obtain such a dataset, you would need to create your own script based on their source code to generate a JSON file. Or, you can use the dataset we’ve generated.
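If you generate that JSON file (or grab ours), loading it into the models table takes only a few lines in a Notebook. The file layout and column names below are assumptions for illustration:

```python
import json
import singlestoredb as s2

conn = s2.connect("user:password@host:3306/llm_recommender")

# Assumed shape: a list of {"name": ..., "score": ...} records per model.
with open("open_llm_leaderboard.json") as f:
    models = json.load(f)

with conn.cursor() as cur:
    cur.executemany(
        "INSERT INTO models (name, score) VALUES (%s, %s)",
        [(m["name"], m["score"]) for m in models],
    )
```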

To search for Twitter posts by the model's name, we used the tweepy library and its search_recent_tweets function. To search for Reddit posts, we used the praw library and its subreddit.search function. For searching GitHub repositories, we used the GitHub HTTP API endpoint. All of this can be done with SingleStore Notebooks.
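Condensed into one Notebook cell, those three searches might look like the sketch below. The credentials, subreddit and model name are placeholders:

```python
import requests
import tweepy
import praw

model_name = "Mistral-7B"  # one of the models from the leaderboard

# Twitter: recent tweets mentioning the model (requires a v2 bearer token).
twitter = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")
tweets = twitter.search_recent_tweets(query=model_name, max_results=100)

# Reddit: search posts mentioning the model name.
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="llm-recommender",
)
posts = list(reddit.subreddit("all").search(model_name, limit=100))

# GitHub: the repository search HTTP endpoint needs no SDK.
repos = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": model_name},
    headers={"Accept": "application/vnd.github+json"},
).json()["items"]
```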

Using these APIs, we first gathered raw data for each model. Before performing the searches, all models were inserted into the models table. Additionally, we created a model_readmes table containing the model's readme from the HuggingFace page in both its original and embedded formats. Twitter posts, Reddit posts and GitHub readmes are inserted into their respective tables (model_twitter_posts, model_reddit_posts and model_github_readmes), in both their original and embedded formats.
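The "embedded format" is simply a vector embedding stored next to the raw text. Here is a minimal sketch of that step, assuming an OpenAI embedding model and hypothetical column names (the sample tweets stand in for the data gathered above):

```python
import json
from openai import OpenAI
import singlestoredb as s2

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
conn = s2.connect("user:password@host:3306/llm_recommender")

model_name = "Mistral-7B"
tweets = ["Mistral-7B runs great on my laptop", "Tried Mistral-7B for RAG today"]

def embed(text: str) -> list[float]:
    # One embedding per post; the same helper works for readmes and Reddit posts.
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=text
    )
    return response.data[0].embedding

with conn.cursor() as cur:
    for text in tweets:
        # Store the raw text and its packed vector side by side.
        cur.execute(
            "INSERT INTO model_twitter_posts (model_name, text, embedding) "
            "VALUES (%s, %s, JSON_ARRAY_PACK(%s))",
            (model_name, text, json.dumps(embed(text))),
        )
```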

Next, using the Job Service, we scheduled the notebook to retrieve new data from the data sources every hour for each LLM. You can also choose a shorter interval depending on your use case. This ensures the latest scores, sentiment and usage are captured in our recommendations.

The Job Service and Notebooks allow us to easily create a full-stack AI application without developing a separate server or cron job, and without deploying to third-party hosting. This simplifies the application architecture and saves a lot of time and resources, while also helping devs go from a prototype on a familiar Jupyter notebook interface to powering their production apps and workloads.

With all of this in place in the Notebook we put on a scheduled job, we can now build an app that queries this data using RAG and start helping users answer questions about which LLMs are most appropriate and popular for their specific use cases.

SingleStore Elegance SDK ❤️ Vercel

To build the front-end part of the application, we used the Next.js framework, the SingleStore Elegance SDK, the SingleStore Vercel Connector and Vercel:

  • Next.js provides the ability to execute server-side code without deploying a third-party server.
  • SingleStore Elegance SDK provides out-of-the-box functionality for developing both CRUD and AI applications with SingleStore.
  • The SingleStore Vercel Connector makes it easy to create a connection between the SingleStore database and Next.js application.
  • Vercel provides the ability to quickly and easily deploy an application imported from a GitHub repository.

To create a Next.js app with the already-installed SingleStore Elegance SDK, we used the following command:

npx create-singlestoredb-app --template elegance-next

After that, we had an environment ready for developing the Next.js application.

First, we created the UI components, pages and the use case form. Then we created the api/models/search endpoint to handle a user's use case and perform a semantic recommendation search. The user's use case is sent to this endpoint as a string, converted into an embedding using the eleganceServerClient.ai.createEmbedding function, then passed as an argument to a semantic search across all tables using the eleganceServerClient.ai.vectorSearch function.

The search is performed asynchronously across five tables: models, model_readmes, model_twitter_posts, model_reddit_posts and model_github_readmes.

The results of each search are then grouped by model name, the average similarity score for each group of results is calculated, the groups are sorted from highest to lowest average similarity score and the first five results are taken.
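That ranking step is plain post-processing. Sketched here in Python for concreteness (the app itself does this in TypeScript inside the API route), with an assumed (model_name, score) result shape:

```python
from collections import defaultdict

# Each hit from a table search: (model_name, similarity_score).
hits = [
    ("Mistral-7B", 0.91), ("Llama-2-13B", 0.88),
    ("Mistral-7B", 0.85), ("Falcon-40B", 0.79),
]

# Group similarity scores by model name.
scores_by_model = defaultdict(list)
for model_name, score in hits:
    scores_by_model[model_name].append(score)

# Average each group, sort descending and keep the top five models.
ranked = sorted(
    ((name, sum(s) / len(s)) for name, s in scores_by_model.items()),
    key=lambda pair: pair[1],
    reverse=True,
)[:5]
print(ranked)
```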

Before sending a response, a description of why each found model fits the user's use case is generated using the eleganceServerClient.ai.createChatCompletion function, based on posts from Twitter, Reddit and GitHub projects.
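Under the hood, this is a standard chat completion prompt built from the gathered posts. An equivalent sketch with the OpenAI Python client (the prompt wording and model choice here are illustrative, not the app's exact code):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

model_name = "Mistral-7B"
use_case = "a customer-support chatbot for a small e-commerce site"
posts = [
    "Mistral-7B runs great on my laptop",
    "Fine-tuned Mistral-7B for support tickets, very solid results",
]

# Ask the LLM to justify the recommendation using the gathered posts as context.
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": (
                f"Based on these posts about {model_name}:\n"
                + "\n".join(posts)
                + f"\n\nBriefly explain why {model_name} fits this use case: {use_case}"
            ),
        }
    ],
)
print(completion.choices[0].message.content)
```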

Once we finished developing the application, we deployed it to Vercel and connected it to the SingleStore database using the SingleStore Vercel Connector.

The developer data platform for gen AI apps

We are on a mission to make it super easy to transition from an idea in Slack to a full-scale deployed app. SingleStore is the only database you need to run all the workloads (vector search, analytics, transactions, etc.) gen AI apps require. With Notebooks + the Job Service, you can bring your data in SingleStore to life and take it into production.

