New

Getting Started with SingleStore

Notebook


SingleStore Notebooks

Getting Started with SingleStore

Note

This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to Start using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.

This Jupyter notebook provides a comprehensive overview and test drive of SingleStore's multi-model capabilities, showcasing how to efficiently manage and query diverse data types within a single database platform.

The notebook starts with a simple "Getting Started" example, guiding users through various standard SQL queries to interact with the database. It then progressively demonstrates how to add and query different data models, including vectors for machine learning, full-text search for unstructured data, JSON for hierarchical data, geospatial data for location-based queries, and time series data for temporal analysis. This hands-on approach offers an accessible way for users to explore SingleStore's versatility and powerful multi-model functionality.

Simple "Getting Started" example

This code checks whether the current database environment is using a "shared tier" and then conditionally drops and creates a database based on the result.

In [1]:

1

shared_tier_check = %sql SHOW VARIABLES LIKE "is_shared_tier"

2

if not shared_tier_check or shared_tier_check[0][1] == "OFF":

3

%sql DROP DATABASE IF EXISTS multi_model;

4

%sql CREATE DATABASE IF NOT EXISTS multi_model;

Action Required

Select the database from the drop-down menu at the top of this notebook.

Various standard SQL queries

Create some simple tables

This setup establishes a basic relational structure to store customer information and their corresponding orders.

In [2]:

1

%%sql

2

DROP TABLE IF EXISTS customers;

3

DROP TABLE IF EXISTS orders;

4

5

CREATE TABLE IF NOT EXISTS customers /* Creating table for sample data. */(

6

customer_id INT PRIMARY KEY,

7

customer_name VARCHAR(50),

8

country VARCHAR(50)

9

);

10

11

CREATE TABLE IF NOT EXISTS orders /* Creating table for sample data. */(

12

order_id INT PRIMARY KEY,

13

customer_id INT,

14

amount DECIMAL(10, 2),

15

product VARCHAR(50)

16

);

Insert some data

In [3]:

1

%%sql

2

INSERT INTO customers (customer_id, customer_name, country) VALUES

3

(1, "John Doe", "Canada"),

4

(2, "Jane Smith", "Canada"),

5

(3, "Sam Brown", "Canada"),

6

(4, "Lisa White", "Canada"),

7

(5, "Mark Black", "Canada");

8

9

INSERT INTO orders (order_id, customer_id, amount, product) VALUES

10

(101, 1, 150.00, "Book"),

11

(102, 2, 200.00, "Pen"),

12

(103, 3, 50.00, "Notebook"),

13

(104, 1, 300.00, "Laptop"),

14

(105, 4, 250.00, "Tablet");

Sum of amounts

In [4]:

1

%%sql

2

SELECT

3

SUM(amount) AS total_sales

4

FROM

5

orders;

Minimum amount

In [5]:

1

%%sql

2

SELECT

3

MIN(amount) AS min_order_amount

4

FROM

5

orders;

Maximum amount

In [6]:

1

%%sql

2

SELECT

3

MAX(amount) AS max_order_amount

4

FROM

5

orders;

Average amount

In [7]:

1

%%sql

2

SELECT

3

ROUND(AVG(amount), 2) AS avg_order_amount

4

FROM

5

orders;

Count the number of orders

In [8]:

1

%%sql

2

SELECT

3

COUNT(*) AS number_of_orders

4

FROM

5

orders;

Join customers and orders tables

In [9]:

1

%%sql

2

SELECT

3

customers.customer_name,

4

orders.order_id,

5

orders.amount

6

FROM

7

customers, orders

8

WHERE

9

customers.customer_id = orders.customer_id

10

ORDER BY

11

amount ASC;

Group by customer and calculate total amount spent

In [10]:

1

%%sql

2

SELECT

3

customers.customer_name,

4

SUM(orders.amount) AS total_spent

5

FROM

6

customers, orders

7

WHERE

8

customers.customer_id = orders.customer_id

9

GROUP BY

10

customers.customer_name

11

ORDER BY

12

total_spent DESC;

Add Vectors

Add a 3-dimensional vector to the orders table

In [11]:

1

%%sql

2

ALTER TABLE orders ADD COLUMN dimensions VECTOR(3);

Add some vector data

3 dimensions represent Length (L), Width (W) and Height (H) in cm

In [12]:

1

%%sql

2

UPDATE orders SET dimensions = '[8.5, 5.5, 1.0]' WHERE order_id = 101;

3

UPDATE orders SET dimensions = '[0.5, 0.5, 14.0]' WHERE order_id = 102;

4

UPDATE orders SET dimensions = '[21.0, 29.7, 0.5]' WHERE order_id = 103;

5

UPDATE orders SET dimensions = '[32.0, 22.0, 2.0]' WHERE order_id = 104;

6

UPDATE orders SET dimensions = '[24.0, 16.0, 0.7]' WHERE order_id = 105;

Show the vectors

In [13]:

1

%%sql

2

SET vector_type_project_format = JSON;

3

4

SELECT

5

*

6

FROM

7

orders;

Select orders using <*> which is Dot Product

The dot product is a way of multiplying two vectors to get a single number (a scalar).

In simple terms, the dot product provides a way to combine two sets of numbers into a single value that reflects how much the vectors "point" in the same direction.

In [14]:

1

%%sql

2

SET vector_type_project_format = JSON;

3

4

SELECT

5

*,

6

ROUND((dimensions <*> '[32.0, 22.0, 2.0]'), 2) AS score

7

-- ROUND(DOT_PRODUCT(dimensions, '[32.0, 22.0, 2.0]'), 2) AS score

8

FROM

9

orders

10

ORDER BY

11

score DESC;

Select orders using <-> which is Euclidean Distance

Euclidean distance is a way to measure how far apart two points are in space.

In simple terms, Euclidean distance provides a straight-line measurement of how far one point is from another, like using a ruler to measure the distance between two points on a map.

In [15]:

1

%%sql

2

SET vector_type_project_format = JSON;

3

4

SELECT

5

*,

6

ROUND((dimensions <-> '[32.0, 22.0, 2.0]'), 2) AS score

7

-- ROUND(EUCLIDEAN_DISTANCE(dimensions, '[32.0, 22.0, 2.0]'), 2) AS score

8

FROM

9

orders

10

ORDER BY

11

score ASC;

Add Full-Text

Add a description column to the orders table

In [16]:

1

%%sql

2

ALTER TABLE orders ADD COLUMN description VARCHAR(255);

Update orders table with descriptions

In [17]:

1

%%sql

2

UPDATE orders

3

SET description = CASE

4

WHEN product = "Book" THEN "A high-quality book that offers insightful content and engaging narratives."

5

WHEN product = "Pen" THEN "A smooth-writing pen designed for comfort and precision."

6

WHEN product = "Notebook" THEN "A versatile notebook perfect for notes, sketches, and ideas."

7

WHEN product = "Laptop" THEN "A powerful laptop with high performance and sleek design for all your computing needs."

8

WHEN product = "Tablet" THEN "A compact tablet with a vibrant display and versatile functionality."

9

ELSE "A product with excellent features and quality."

10

END;

Show the descriptions

In [18]:

1

%%sql

2

SELECT

3

*

4

FROM

5

orders;

Add a full-text index to the orders table

In [19]:

1

%%sql

2

ALTER TABLE orders ADD FULLTEXT USING VERSION 2 orders_ft_index (product, description);

3

OPTIMIZE TABLE orders FLUSH;

Search for a match on "vibrant" in the description part

In [20]:

1

%%sql

2

SELECT

3

*

4

FROM

5

orders

6

WHERE

7

MATCH (TABLE orders) AGAINST ("description:vibrant");

Use various operators to show flexibility

+ (must appear), * (multiple wildcard), ? (single wildcard)

In [21]:

1

%%sql

2

SELECT

3

product

4

FROM

5

orders

6

WHERE

7

MATCH (TABLE orders) AGAINST ("product:(+oo?) OR description:versa*");

Add JSON

Add a JSON column to the orders table

In [22]:

1

%%sql

2

ALTER TABLE orders ADD COLUMN additional_details JSON NOT NULL;

Update orders table with additional details in JSON format

In [23]:

1

%%sql

2

UPDATE orders

3

SET additional_details = CASE

4

WHEN order_id = 101 THEN '{

5

"invoice_number": "INV1001",

6

"order_status": "Delivered",

7

"shipping_address": {

8

"street": "456 Elm St",

9

"city": "Toronto",

10

"state": "ON",

11

"postal_code": "M5A 1A1",

12

"country": "Canada"

13

},

14

"payment_method": "Credit Card",

15

"discounts_applied": [{

16

"discount_code": "WELCOME10",

17

"amount": 10.00

18

}],

19

"order_date": "2024-07-01",

20

"estimated_delivery_date": "2024-07-05",

21

"tracking_number": "TRACK1001",

22

"customer_notes": "Leave at the front desk."

23

}'

24

WHEN order_id = 102 THEN '{

25

"invoice_number": "INV1002",

26

"order_status": "Pending",

27

"shipping_address": {

28

"street": "789 Oak St",

29

"city": "Vancouver",

30

"state": "BC",

31

"postal_code": "V5K 1A1",

32

"country": "Canada"

33

},

34

"payment_method": "PayPal",

35

"discounts_applied": [{

36

"discount_code": "SPRING20",

37

"amount": 20.00

38

}],

39

"order_date": "2024-07-02",

40

"estimated_delivery_date": "2024-07-06",

41

"tracking_number": "TRACK1002",

42

"customer_notes": "Contact me before delivery."

43

}'

44

WHEN order_id = 103 THEN '{

45

"invoice_number": "INV1003",

46

"order_status": "Shipped",

47

"shipping_address": {

48

"street": "321 Pine St",

49

"city": "Montreal",

50

"state": "QC",

51

"postal_code": "H2X 1Y4",

52

"country": "Canada"

53

},

54

"payment_method": "Credit Card",

55

"discounts_applied": [{

56

"discount_code": "SAVE15",

57

"amount": 15.00

58

}],

59

"order_date": "2024-07-03",

60

"estimated_delivery_date": "2024-07-07",

61

"tracking_number": "TRACK1003",

62

"customer_notes": "Deliver after 5 PM."

63

}'

64

WHEN order_id = 104 THEN '{

65

"invoice_number": "INV1004",

66

"order_status": "Shipped",

67

"shipping_address": {

68

"street": "654 Maple St",

69

"city": "Calgary",

70

"state": "AB",

71

"postal_code": "T2P 1N4",

72

"country": "Canada"

73

},

74

"payment_method": "Credit Card",

75

"discounts_applied": [{

76

"discount_code": "NEWYEAR25",

77

"amount": 25.00

78

}],

79

"order_date": "2024-07-01",

80

"estimated_delivery_date": "2024-07-08",

81

"tracking_number": "TRACK1004",

82

"customer_notes": "Leave package at the back door."

83

}'

84

WHEN order_id = 105 THEN '{

85

"invoice_number": "INV1005",

86

"order_status": "Delivered",

87

"shipping_address": {

88

"street": "987 Birch St",

89

"city": "Ottawa",

90

"state": "ON",

91

"postal_code": "K1A 0A1",

92

"country": "Canada"

93

},

94

"payment_method": "PayPal",

95

"discounts_applied": [{

96

"discount_code": "HOLIDAY30",

97

"amount": 30.00

98

}],

99

"order_date": "2024-07-03",

100

"estimated_delivery_date": "2024-07-09",

101

"tracking_number": "TRACK1005",

102

"customer_notes": "Please ring the doorbell."

103

}'

104

ELSE '{}'

105

END;

Extract specific JSON fields

In [24]:

1

%%sql

2

SELECT

3

order_id,

4

additional_details::invoice_number AS invoice_number,

5

additional_details::order_status AS order_status

6

FROM

7

orders

8

ORDER BY

9

order_id;

Find orders that have been "Delivered"

In [25]:

1

%%sql

2

SELECT

3

order_id,

4

additional_details::invoice_number AS invoice_number

5

FROM

6

orders

7

WHERE

8

additional_details::order_status = '"Delivered"'

9

ORDER BY

10

order_id;

Aggregate data based on JSON fields

In [26]:

1

%%sql

2

SELECT

3

additional_details::order_status AS order_status,

4

COUNT(*) AS order_count

5

FROM

6

orders

7

GROUP BY

8

order_status;

Add Geospatial

Insert 2 more customers into customers table

In [27]:

1

%%sql

2

INSERT INTO customers (customer_id, customer_name, country) VALUES

3

(6, "Emily Davis", "Canada"),

4

(7, "Michael Johnson", "Canada");

Create neighborhoods table for geospatial data

In [28]:

1

%%sql

2

DROP TABLE IF EXISTS neighborhoods;

3

4

CREATE TABLE IF NOT EXISTS neighborhoods /* Creating table for sample data. */(

5

id INT UNSIGNED NOT NULL,

6

name VARCHAR(64) NOT NULL,

7

population INT UNSIGNED NOT NULL,

8

shape TEXT NOT NULL,

9

centroid GEOGRAPHYPOINT NOT NULL,

10

sort key (name),

11

shard key (id)

12

);

Add some city data to the neighborhoods table

In [29]:

1

%%sql

2

INSERT INTO neighborhoods (id, name, population, shape, centroid) VALUES

3

(1, "Toronto", 2794356,

4

"POLYGON((-79.6393 43.6777, -79.1152 43.6777, -79.1152 43.8554, -79.6393 43.8554, -79.6393 43.6777))",

5

"POINT(-79.3832 43.6532)"

6

),

7

(2, "Vancouver", 662248,

8

"POLYGON((-123.2247 49.1985, -123.0234 49.1985, -123.0234 49.3169, -123.2247 49.3169, -123.2247 49.1985))",

9

"POINT(-123.1216 49.2827)"

10

),

11

(3, "Montreal", 1762949,

12

"POLYGON((-73.9354 45.3991, -73.4757 45.3991, -73.4757 45.7044, -73.9354 45.7044, -73.9354 45.3991))",

13

"POINT(-73.5673 45.5017)"

14

),

15

(4, "Calgary", 1306784,

16

"POLYGON((-114.3160 50.8420, -113.8599 50.8420, -113.8599 51.2124, -114.3160 51.2124, -114.3160 50.8420))",

17

"POINT(-114.0719 51.0447)"

18

),

19

(5, "Ottawa", 1017449,

20

"POLYGON((-75.9274 45.2502, -75.3537 45.2502, -75.3537 45.5489, -75.9274 45.5489, -75.9274 45.2502))",

21

"POINT(-75.6972 45.4215)"

22

);

Add a geospatial column to the customers table

In [30]:

1

%%sql

2

ALTER TABLE customers ADD COLUMN location GEOGRAPHYPOINT;

Update customers table with location data

In [31]:

1

%%sql

2

UPDATE customers SET location = "POINT(-79.3832 43.6532)" WHERE customer_id = 1;

3

UPDATE customers SET location = "POINT(-123.1216 49.2827)" WHERE customer_id = 2;

4

UPDATE customers SET location = "POINT(-73.5673 45.5017)" WHERE customer_id = 3;

5

UPDATE customers SET location = "POINT(-114.0719 51.0447)" WHERE customer_id = 4;

6

UPDATE customers SET location = "POINT(-75.6972 45.4215)" WHERE customer_id = 5;

7

UPDATE customers SET location = "POINT(-79.3832 43.6532)" WHERE customer_id = 6;

8

UPDATE customers SET location = "POINT(-123.1216 49.2827)" WHERE customer_id = 7;

Join the neighborhoods table to itself and measure distances between neighborhoods

In [32]:

1

%%sql

2

SELECT

3

b.name AS town,

4

ROUND(GEOGRAPHY_DISTANCE(a.centroid, b.centroid), 0) AS distance_from_center,

5

ROUND(GEOGRAPHY_DISTANCE(a.shape, b.shape), 0) AS distance_from_border

6

FROM

7

neighborhoods a, neighborhoods b

8

WHERE

9

a.name = "Vancouver"

10

ORDER BY

11

2;

Find out where you are

In [33]:

1

%%sql

2

SELECT

3

name

4

FROM

5

neighborhoods

6

WHERE

7

GEOGRAPHY_INTERSECTS("POINT(-79.3770 43.7500)", shape);

Find customers within "Vancouver"

In [34]:

1

%%sql

2

SELECT

3

c.customer_id, c.customer_name

4

FROM

5

customers c, neighborhoods n

6

WHERE

7

n.name = "Vancouver" AND GEOGRAPHY_CONTAINS(n.shape, c.location);

Add Time Series

Count orders by day

In [35]:

1

%%sql

2

SELECT

3

DATE_FORMAT(STR_TO_DATE(additional_details::order_date, '"%Y-%m-%d"'), '%Y-%m-%d') AS order_date,

4

COUNT(*) AS order_count

5

FROM

6

orders

7

GROUP BY

8

order_date

9

ORDER BY

10

order_date;

Sum of order amounts by month

In [36]:

1

%%sql

2

SELECT

3

DATE_FORMAT(STR_TO_DATE(additional_details::order_date, '"%Y-%m-%d"'), '%Y-%m') AS order_month,

4

SUM(amount) AS total_amount

5

FROM

6

orders

7

GROUP BY

8

order_month

9

ORDER BY

10

order_month;

Orders count by customer over time

In [37]:

1

%%sql

2

SELECT

3

customer_id,

4

DATE_FORMAT(STR_TO_DATE(additional_details::order_date, '"%Y-%m-%d"'), '%Y-%m-%d') AS order_date,

5

COUNT(*) AS order_count

6

FROM

7

orders

8

GROUP BY

9

customer_id, order_date

10

ORDER BY

11

customer_id, order_date;

Bonus

Create a map from geospatial city data

In [38]:

1

!pip install folium shapely --quiet

Action Required

Select the database from the drop-down menu at the top of this notebook. It updates the connection_url which is used by SQLAlchemy to make connections to the selected database.

In [39]:

1

from sqlalchemy import *

2

3

db_connection = create_engine(connection_url)

Get city data from neighborhoods table

In [40]:

1

import pandas as pd

2

3

query = """

4

SELECT

5

id,

6

name,

7

population,

8

shape :> TEXT AS polygon,

9

centroid :> TEXT AS point

10

FROM

11

neighborhoods

12

"""

13

14

df = pd.read_sql(

15

query,

16

db_connection

17

)

Convert the data to geospatial format for Python

In [41]:

1

from shapely import wkt

2

3

df["polygon"] = df["polygon"].apply(wkt.loads)

4

df["point"] = df["point"].apply(wkt.loads)

Plot the cities on a map

In [42]:

1

import folium

2

3

m = folium.Map(

4

location = [56.1304, -106.3468],

5

zoom_start = 4

6

)

7

8

for idx, row in df.iterrows():

9

folium.Polygon(

10

locations = [(point[1], point[0]) for point in row["polygon"].exterior.coords],

11

color = "blue",

12

weight = 2,

13

fill = True,

14

fill_color = "blue",

15

fill_opacity = 0.1

16

).add_to(m)

17

18

folium.Marker(

19

location = (row["point"].y, row["point"].x),

20

popup = row["name"]

21

).add_to(m)

22

23

html_content = m._repr_html_()

Save the map to stage

Stage Support

The following code will only work on the Standard Tier at this time.

In [43]:

1

from singlestoredb import notebook as nb

2

3

if not shared_tier_check or shared_tier_check[0][1] == "OFF":

4

with nb.stage.open("map.html", "w") as st:

5

st.write(html_content)

Cleanup

In [44]:

1

%%sql

2

DROP TABLE IF EXISTS customers;

3

DROP TABLE IF EXISTS orders;

4

DROP TABLE IF EXISTS neighborhoods;

In [45]:

1

shared_tier_check = %sql SHOW VARIABLES LIKE "is_shared_tier"

2

if not shared_tier_check or shared_tier_check[0][1] == "OFF":

3

%sql DROP DATABASE IF EXISTS multi_model;

Conclusions

In this Jupyter notebook, we explored the robust multi-model capabilities of SingleStore, demonstrating how to efficiently manage and query a wide range of data types within a unified database platform. Beginning with a simple "Getting Started" guide, we progressively delved into various standard SQL queries and extended our exploration to include more advanced data models such as vectors for machine learning, full-text search for unstructured data, JSON for hierarchical data, geospatial data for location-based queries, and time series data for temporal analysis. Through these practical examples, users can appreciate SingleStore's versatility and powerful functionality, gaining the skills to effectively harness its multi-model capabilities for diverse applications.

Details


About this Template

Test Drive SingleStore Multi-Model Examples in One Notebook

Notebook Icon

This Notebook can be run in Shared Tier, Standard and Enterprise deployments.

Tags

startersqlvectorsfulltextjsongeospatialtimeseries

License

This Notebook has been released under the Apache 2.0 open source license.

See Notebook in action

Launch this notebook in SingleStore and start executing queries instantly.