Have you ever needed to set up a machine learning model on your data before?
Personally, I know this used to be massively complicated. It usually meant writing application code to connect to a third-party machine learning system, and then wiring the output of the model back into your application.
But now, with MindsDB and SingleStore, setting up a modern machine learning model has never been easier. That's because MindsDB allows us to automate model training while using the data stored directly in SingleStore.
By adding a predictive layer on top of our SingleStore database with the machine learning capabilities of MindsDB, we can draw new insights from our existing data with minimal effort and cost. Traditionally, preparing training datasets is painful and expensive. But if we can use the datasets we already have, that saves a developer a lot of time.
In this post, we will integrate SingleStore with MindsDB, build predictions directly on our existing tables, and query them using simple SQL. We will look at how to set up a SingleStore account and create our own cluster, and then import a dataset into our SingleStore database. Finally, we will connect our SingleStore data to MindsDB in order to ask predictive questions of the data and receive answers from it.
How to Set Up SingleStore
The first thing we are going to do is set up a SingleStore account. If you have an existing account with SingleStore, click on the “Sign In” button in the top right of the page. If you do not have an existing account, you may sign up for free by clicking on the “Start Free” button on the top right.
Note: If you encounter any issues or have any questions about integrating SingleStore with MindsDB, you can refer to the SingleStore Docs or Community Forums for more information on setup or general help.
How to create a SingleStore Cluster
Now that we have logged into our SingleStore account, we can create our cluster. To create a new cluster, click the “Create Cluster” button on the top right of the page. You will then be asked to input information about your cluster, including a name and password.
Note: Please save this password because you will need it to log in to the cluster as a user after it is deployed. This password will be important later on, since we will be using it to log into our SingleStore database from MindsDB.
Once you click “Create Cluster” on the bottom left, wait for your cluster setup to be completed. New clusters might take a minute, so feel free to grab a soda and come back.
Once you see the success message at the top of the screen, your cluster has been created, and we are ready to create our tables and import our test data.
How to import a CSV dataset into SingleStore
For this demo, we are going to use a demo dataset containing health insurance data. You can download this data from our GitHub (Link: https://github.com/mindsdb/mindsdb-examples/tree/master/others/health_insurance/dataset). The dataset has columns for age, sex, BMI, children, smoker, region, and charges from a fictitious insurance company. We will eventually connect this data, stored in SingleStore, to MindsDB and run predictions and queries, such as predicting the BMI of a male smoker.
Head over to the SQL Editor on the SingleStore portal. We will now create a new database and table where we are going to save our insurance data. Run the SQL script below by copying it into the editor and clicking “Run.” Check the logs to ensure that no errors occurred.
CREATE DATABASE IF NOT EXISTS health_insurance;
USE health_insurance;
CREATE TABLE IF NOT EXISTS insurance_train (
  age INT,
  sex TEXT,
  bmi DECIMAL(5,3),
  children INT,
  smoker TEXT,
  region TEXT,
  charges DECIMAL(10,5)
);
Note: If you receive an error, read the description to understand why you are seeing the error. Check to make sure you have selected the correct database and that you have inputted the correct code into the SQL Editor.
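If you want to confirm that the script did what you expected, a quick check like the one below may help. It is a minimal sketch that assumes the database and table names used above (health_insurance and insurance_train).

```sql
-- Switch to the database created by the setup script.
USE health_insurance;

-- List the tables in the current database; insurance_train should appear.
SHOW TABLES;

-- Show the column names and types of the new table.
DESCRIBE insurance_train;
```

If the table is missing or a column type looks wrong, it is easier to fix it now than after the data has been imported.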
Once we have the database and table set up on SingleStore, we are ready to import our test data. To import our insurance dataset, we are going to use Sequel Pro. If you do not already have Sequel Pro on your computer, you can download it from https://www.sequelpro.com.
Note: To find more information on the other methods of importing data into SingleStore, check out SingleStore Docs.
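One of those other methods is a LOAD DATA statement run from a SQL client on your own machine. The sketch below is only an illustration under a few assumptions: the file path is a placeholder, the CSV is assumed to have a header row, and its columns are assumed to appear in the same order as the table definition. The LOCAL keyword reads the file from the client machine, so the client must allow local file loading, and this will most likely not work from the browser-based SQL Editor.

```sql
USE health_insurance;

-- '/path/to/insurance.csv' is a hypothetical path; point it at your downloaded file.
LOAD DATA LOCAL INFILE '/path/to/insurance.csv'
INTO TABLE insurance_train
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
-- Skip the header row (assumes the CSV has one).
IGNORE 1 LINES
-- Assumed column order; adjust it if your file differs.
(age, sex, bmi, children, smoker, region, charges);
```

If you go this route, you can skip the Sequel Pro import described next.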
Once you have downloaded and installed Sequel Pro locally, open it and you will be prompted to enter the connection details for your SingleStore cluster. Remember that password I told you to write down when you set up your cluster? You’re going to need it now. If you forgot your username or password, you can go back to SingleStore and click on the name of your cluster.
Once you input the information in Sequel Pro, click Connect. Once you are connected, we can import the data. To do this, go to File → Import, and select your dataset. Once you have done this, you should be able to see your data in Sequel Pro. It’s a good idea to confirm that the data was loaded correctly. To do so, go back to your SQL Editor in SingleStore and run this command:
SELECT * FROM insurance_train;
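Scanning every row by eye gets tedious, so a couple of summary queries can make the check quicker. This is a minimal sketch using only the table and columns defined earlier; compare the row count against the number of data rows in your CSV file.

```sql
USE health_insurance;

-- Total number of imported rows.
SELECT COUNT(*) AS row_count
FROM insurance_train;

-- Number of rows with a missing value in any column.
SELECT COUNT(*) AS rows_with_nulls
FROM insurance_train
WHERE age IS NULL
   OR sex IS NULL
   OR bmi IS NULL
   OR children IS NULL
   OR smoker IS NULL
   OR region IS NULL
   OR charges IS NULL;

-- Peek at a handful of rows instead of the whole table.
SELECT *
FROM insurance_train
LIMIT 10;
```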
How to connect SingleStore with MindsDB
Now that we’ve got our insurance dataset imported into a SingleStore database, let’s connect our data to MindsDB in order to start making real-time predictions. To do this, we are going to create an account on MindsDB. Sign up at https://cloud.mindsdb.com to create one.
Once you are signed in to MindsDB, it will take you through some features such as where to add datasets and where you can run predictions and queries. Using MindsDB, you can create predictive tables based on data which we have stored in SingleStore. For more about setting up a MindsDB account, refer to the MindsDB Docs. You can also ask any questions about integrating with SingleStore on our community forums.
We are going to connect to our SingleStore dataset by selecting “Dataset” in the top right corner. You will be asked to input the connection details of your SingleStore database. Once again, use the cluster password you created in SingleStore, along with the other information found under your cluster.
Your dataset is now successfully connected to MindsDB! Way to go!
How to use MindsDB to Run Predictive Analysis
Here’s the moment we’ve all been waiting for! We’re going to use our SingleStore data to make predictions about people based on the insurance data we imported into SingleStore. Using MindsDB, we can answer questions like, “What is the predicted BMI of a smoker?” “What’s the predicted age of a person with children in New York City?” and so much more. It really makes me feel like I have mind reading abilities.
Let’s create a new prediction by clicking on “Predictors” on the left and clicking “Train New” on the bottom right. For this example, we will try to predict BMI based on whether a person is a smoker. Select our SingleStore database as the data source, give the predictor a name (I used “BMI Predictor”), and select “bmi” as the column to be predicted.
Once the predictor has finished running, the data will be saved in an AI Table on SingleStore, where we will be able to run queries and draw insights from the MindsDB prediction. Now, let’s create a new query from our new AI Table by clicking on “Query” on the left and then selecting “New Query” on the bottom right. You can then select “smoker” and “yes” from the dropdown.
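The steps above use the MindsDB interface. MindsDB has also exposed a SQL interface, and in versions that support the CREATE PREDICTOR statement, the same train-and-query flow can be sketched roughly as follows. Treat this as an assumption-heavy sketch rather than the method used in this post: singlestore_ds and bmi_predictor are hypothetical names, and the exact syntax depends on your MindsDB version, so check the MindsDB Docs before relying on it.

```sql
-- Assumption: "singlestore_ds" is the name under which the SingleStore
-- connection was registered in MindsDB (hypothetical).
-- Train a predictor for the bmi column using all rows of insurance_train.
CREATE PREDICTOR mindsdb.bmi_predictor
FROM singlestore_ds
  (SELECT * FROM insurance_train)
PREDICT bmi;

-- Once training has finished, ask for the predicted BMI of a smoker.
SELECT bmi
FROM mindsdb.bmi_predictor
WHERE smoker = 'yes';
```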
Once you have done that, press “Run Query”. Your query results will show up on the screen. You will be able to see your prediction as well as the prediction confidence, which normally depends on the size of your dataset. In this example, the predicted BMI of a smoker is 25.12 with 46% confidence, based on our dataset.
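To put a prediction like this in context, it can help to look at what the raw data says. The query below is a simple sketch that runs against the insurance_train table in SingleStore and summarizes BMI for smokers and non-smokers. It is a plain aggregate, not a MindsDB prediction, so expect the numbers to differ somewhat from the model's output.

```sql
USE health_insurance;

-- Observed BMI summary per smoker group, for comparison with the prediction.
SELECT smoker,
       COUNT(*) AS people,
       AVG(bmi) AS avg_bmi,
       MIN(bmi) AS min_bmi,
       MAX(bmi) AS max_bmi
FROM insurance_train
GROUP BY smoker;
```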
In this post, we covered how you can now run your own queries and predictions using MindsDB and SingleStore. But this is just the beginning. I would encourage you to play with your other datasets and see if you can pull some interesting insights from your data.