Getting Started with Notebooks
Note
This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to Start using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.
What you will learn in this notebook:
Load a CSV file from our Github Repo [Python]
Ingest that file into SingleStoreDB without defining the schema [Python]
Interact natively with the database using SQL [SQL]
Convert results to a DataFrame and visualize results with Plotly [Python]
Questions?
Reach out to us through our forum.
1. Import libraries for reading data into a DataFrame
Our data set contains geographic data, so we also install Shapely to store that data in Shapely geometry objects.
In [1]:
!pip3 install shapely --quiet

import pandas as pd
import shapely.wkt
2. Load a csv file hosted in Github using Python
Notice that we are using the dtype=, parse_dates=, and converters= options of the read_csv method to convert specific columns into various data types, including geographic data in the business_location column. See the read_csv documentation for more information.
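The interplay of these three options can be sketched on a tiny inline CSV. This is a hypothetical example (the column names and the parse_point helper are stand-ins, not part of the restaurant dataset): dtype= fixes a column's type directly, parse_dates= turns strings into datetimes, and converters= maps a column through an arbitrary function, much like the str_to_shapely converter used below.

```python
import io

import pandas as pd


def parse_point(x: str):
    """Toy stand-in for a WKT converter: parse 'POINT (lon lat)' into a tuple, handling blanks."""
    if not x:
        return None
    lon, lat = x.strip('POINT ()').split()
    return (float(lon), float(lat))


# A tiny inline CSV standing in for the real file (hypothetical data)
csv_data = io.StringIO(
    "id,opened,phone,score,location\n"
    "1,01/15/2023 10:30:00 AM,415-555-0100,92.0,POINT (-122.4 37.7)\n"
    "2,02/20/2023 09:00:00 AM,,,\n"
)

df_demo = pd.read_csv(
    csv_data,
    dtype=dict(id=int, phone=str, score=float),  # explicit column types
    parse_dates=['opened'],                      # parse strings into datetimes
    date_format='%m/%d/%Y %H:%M:%S %p',
    converters=dict(location=parse_point),       # run each value through a function
)
print(df_demo.dtypes)
```

Note that a column can appear in dtype= or converters=, but not both; pandas warns if the same column is listed in each.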
In [2]:
url = 'https://raw.githubusercontent.com/singlestore-labs/singlestoredb-samples/main/' + \
      'Sample%20datasets/csv/Restaurant_Scores_LIVES_Standard.csv'
In [3]:
def str_to_shapely(x: str) -> shapely.geometry.Point | None:
    """Convert a WKT string to a shapely object while handling NULLs."""
    return shapely.wkt.loads(x) if x else None


# Read URL directly using pd.read_csv
df = pd.read_csv(url, index_col=0,
                 # Use parse_dates=, dtype=, and converters= to specify explicit data types
                 parse_dates=['inspection_date'],
                 date_format='%m/%d/%Y %H:%M:%S %p',
                 dtype=dict(business_id=int, business_phone_number=str,
                            business_postal_code=str, inspection_score=float),
                 converters=dict(business_location=str_to_shapely))
df
Display the data types in the resulting DataFrame. Note that any objects that pandas does not support natively (e.g., strings, blobs, shapely geometries, etc.) show up as object.
In [4]:
df.dtypes
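The pattern is easy to see on a toy DataFrame (hypothetical data, not the restaurant dataset): numeric columns get specific dtypes, while strings and arbitrary Python objects such as geometry tuples all report as object.

```python
import pandas as pd

# Columns pandas supports natively get specific dtypes; everything else is 'object'
sample = pd.DataFrame({
    'count': [1, 2],                  # int64
    'score': [92.0, 88.5],            # float64
    'name': ['Taqueria', 'Cafe'],     # object (strings)
    'point': [(1.0, 2.0), None],      # object (arbitrary Python objects)
})
print(sample.dtypes)
```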
Action Required
If you have a Free Starter Workspace deployed already, select the database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
3. Ingest a DataFrame into a SingleStoreDB table
Create the database
Import the library to connect to the database
Create the connection to the database
Ingest the DataFrame into the newly created database
Set the database name in a variable. It will be used in subsequent queries.
Here we are using the database_name variable in a %%sql cell. The syntax for including Python variables is to surround the variable name with {{ ... }}.
In [5]:
shared_tier_check = %sql show variables like 'is_shared_tier'
if not shared_tier_check or shared_tier_check[0][1] == 'OFF':
    database_name = 'getting_started_notebook'
    %sql DROP DATABASE IF EXISTS {{database_name}};
    %sql CREATE DATABASE {{database_name}};
else:
    current_database = %sql SELECT DATABASE() as CurrentDatabase
    database_name = current_database[0][0]
Action Required
Make sure to select a database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
We can use SQLAlchemy and pandas to upload a DataFrame. Note that if the table does not exist, the data types will be inferred from the data. This may not result in the exact types that you desire. You can define the table in the database before uploading to get the exact types you want.
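The "define the table first" approach can be sketched as follows. This is a hypothetical example that uses an in-memory SQLite database in place of SingleStoreDB so it runs anywhere; the to_sql() pattern is the same once you have a SQLAlchemy connection.

```python
import pandas as pd
import sqlalchemy as sa

# In-memory SQLite stands in for SingleStoreDB in this sketch
engine = sa.create_engine('sqlite://')
with engine.connect() as conn:
    # Define the table first to control the column types exactly
    conn.execute(sa.text(
        'CREATE TABLE scores (business_id INTEGER, inspection_score REAL)'
    ))
    df = pd.DataFrame({'business_id': [101, 102], 'inspection_score': [92.0, 88.5]})
    # if_exists='append' keeps the pre-defined schema instead of letting pandas infer one
    df.to_sql('scores', con=conn, if_exists='append', index=False)
    rows = conn.execute(sa.text('SELECT COUNT(*) FROM scores')).scalar()
print(rows)
```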
If you get an error about the database not being selected, that simply means that your connection_url does not contain a specific database to connect to. You can use the drop-down menu at the top of this notebook (immediately under the title) to select a database to work with. Changing the selection in the drop-down menu also updates the connection_url variable.
In [6]:
import sqlalchemy as sa

# Create a SQLAlchemy engine and connect
db_connection = sa.create_engine(connection_url).connect()
The SingleStoreDB Python package also adds a convenience function for SQLAlchemy connections without using the connection_url. It automatically gets the connection information from the SINGLESTOREDB_URL environment variable.
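The environment-variable pattern itself can be sketched generically. This hypothetical example reads a URL from a DATABASE_URL variable (a stand-in name, not SINGLESTOREDB_URL) and points it at in-memory SQLite so the snippet is runnable without a SingleStoreDB workspace.

```python
import os

import sqlalchemy as sa

# Hypothetical sketch: read the connection URL from an environment variable,
# the way singlestoredb's create_engine() reads SINGLESTOREDB_URL.
# In-memory SQLite is used here so the snippet runs anywhere.
os.environ.setdefault('DATABASE_URL', 'sqlite://')

engine = sa.create_engine(os.environ['DATABASE_URL'])
with engine.connect() as conn:
    result = conn.execute(sa.text('SELECT 1')).scalar()
print(result)
```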
In [7]:
import singlestoredb as s2

# Create a SQLAlchemy engine and connect, without having to specify the connection URL
db_connection = s2.create_engine().connect()

# Upload the DataFrame
df.to_sql('sf_restaurant_scores', con=db_connection, if_exists='append', chunksize=1000)
4. Interact natively with the database using SQL
Read the top 10 rows from the table
Alter the table so the date is stored as a date type, not a string
Read the number of restaurant inspections over time in San Francisco
In [8]:
%%sql
SELECT * FROM {{database_name}}.sf_restaurant_scores LIMIT 10;
In the code block below, we use the result1 << syntax on the %%sql line to store the result of the SQL operation into a variable which can be used later. As with other Jupyter notebooks, you can always get the value of the last executed cell in the _ (underscore) variable, but setting a specific variable name to use is generally a safer way to retrieve results.
In [9]:
%%sql result1 <<
SELECT
    DATE_TRUNC('month', inspection_date) AS month,
    COUNT(*) AS count_inspection
FROM
    {{database_name}}.sf_restaurant_scores
GROUP BY
    month
ORDER BY
    month DESC;
The output of a %%sql cell is a ResultSet, which contains methods for converting to various other data types (e.g., csv, dicts, DataFrame, PolarsDataFrame). It is also possible to convert to a DataFrame by passing a ResultSet object to the DataFrame constructor, as we'll see below.
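A ResultSet behaves like a sequence of rows, and the DataFrame constructor accepts any such sequence. As a runnable sketch, here are hypothetical rows standing in for result1 (when converting a real ResultSet, the column names come from the query, so the columns= argument is not needed):

```python
import pandas as pd

# Hypothetical rows standing in for the result1 ResultSet
rows = [('2023-01-01', 42), ('2023-02-01', 37)]
result_df = pd.DataFrame(rows, columns=['month', 'count_inspection'])
print(result_df)
```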
In [10]:
type(result1)
5. Visualize with Plotly
We are using Plotly to visualize the data in result1. The first parameter of the bar function requires a DataFrame, so we'll convert result1 to a DataFrame before calling bar.
In [11]:
result1_df = pd.DataFrame(result1)
result1_df[:5]
In [12]:
import plotly.express as px

px.bar(result1_df, x='month', y='count_inspection', title='Inspections by Month')
6. Cleanup database
Action Required
If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
In [13]:
shared_tier_check = %sql show variables like 'is_shared_tier'
if not shared_tier_check or shared_tier_check[0][1] == 'OFF':
    %sql DROP DATABASE IF EXISTS {{database_name}};
Details
About this Template
SingleStore Notebooks with an AI Copilot (SQrL) for data prototyping, visualization and analysis.
This Notebook can be run in Shared Tier, Standard and Enterprise deployments.
License
This Notebook has been released under the Apache 2.0 open source license.