Sales Data Analysis Dataset From Amazon S3
Notebook
Note
This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace, navigate to Start in the left navigation. You can also use your existing Standard or Premium workspace with this Notebook.
Note
This notebook creates a pipeline; data may take up to 1 minute to populate.
The Sales Data Analysis use case demonstrates how to leverage SingleStore's powerful querying capabilities in a business intelligence context, such as analyzing sales data stored in a CSV file.
This demo showcases typical operations that businesses perform to gain insights from their sales data, such as:
calculating total sales
identifying top-selling products
analyzing sales trends over time
By working through this example, new users will:
learn how to load CSV data from S3 into SingleStore
execute aggregate functions
perform time-series analysis
Demo Flow
How to use this notebook
Create a database (you can skip this step if you are using the Free Starter Workspace)
We need to create a database to work with in the following examples.
In [1]:
shared_tier_check = %sql show variables like 'is_shared_tier'
if not shared_tier_check or shared_tier_check[0][1] == 'OFF':
    %sql DROP DATABASE IF EXISTS SalesAnalysis;
    %sql CREATE DATABASE SalesAnalysis;
Create Table
In [2]:
%%sql
CREATE TABLE IF NOT EXISTS SalesData /* Creating table for sample data. */ (
    `Date` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    `Store_ID` bigint(20) DEFAULT NULL,
    `ProductID` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    `Product_Name` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    `Product_Category` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    `Quantity_Sold` bigint(20) DEFAULT NULL,
    `Price` float DEFAULT NULL,
    `Total_Sales` float DEFAULT NULL
);
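Optionally, you can confirm the table was created as expected with a quick schema check (a sanity-check sketch, not part of the original flow):
%%sql
/* Optional sanity check: inspect the SalesData table schema. */
DESCRIBE SalesData;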
Load Data Using Pipelines
In [3]:
%%sql
CREATE PIPELINE IF NOT EXISTS SalesData_Pipeline AS /* Creating pipeline for sample data. */
    LOAD DATA S3 's3://singlestoreloaddata/SalesData/*.csv'
    CONFIG '{ "region": "ap-south-1" }'
    /*
    CREDENTIALS '{"aws_access_key_id": "<access key id>",
                  "aws_secret_access_key": "<access_secret_key>"}'
    */
    INTO TABLE SalesData
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;

START PIPELINE SalesData_Pipeline;
Data may take a couple of seconds to load after the pipeline is started; rerun the cell below to verify the row count.
In [4]:
%%sql
SELECT COUNT(*) FROM SalesData;
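If the count is still zero, you can also check the pipeline's file-level progress. This is a sketch assuming the standard information_schema.PIPELINES_FILES view is available in your workspace:
%%sql
/* Optional: check which files the pipeline has picked up and their load state. */
SELECT PIPELINE_NAME, FILE_NAME, FILE_STATE
FROM information_schema.PIPELINES_FILES
WHERE PIPELINE_NAME = 'SalesData_Pipeline';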
Sample Queries
We will now run some analytical queries against the loaded data.
Top-Selling Products
In [5]:
%%sql
SELECT product_name, SUM(quantity_sold) AS total_quantity_sold
FROM SalesData
GROUP BY product_name
ORDER BY total_quantity_sold DESC
LIMIT 5;
Sales Trends Over Time
In [6]:
%%sql
SELECT date, SUM(total_sales) AS total_sales
FROM SalesData
GROUP BY date
ORDER BY date
LIMIT 5;
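The query above lists daily totals in chronological order. For a coarser trend, you could aggregate by month instead. The sketch below assumes the text Date column holds values like 'YYYY-MM-DD'; adjust the substring if the sample data uses a different date format:
%%sql
/* Optional: monthly sales trend, assuming Date values look like 'YYYY-MM-DD'. */
SELECT SUBSTR(date, 1, 7) AS sales_month, SUM(total_sales) AS monthly_sales
FROM SalesData
GROUP BY SUBSTR(date, 1, 7)
ORDER BY sales_month;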
Total Sales by Store
In [7]:
%%sql
SELECT Store_ID, SUM(total_sales) AS total_sales
FROM SalesData
GROUP BY Store_ID
ORDER BY total_sales DESC
LIMIT 5;
Sales Contribution by Product (Percentage)
In [8]:
%%sql
SELECT product_name,
       SUM(total_sales) * 100.0 / (SELECT SUM(total_sales) FROM SalesData) AS sales_percentage
FROM SalesData
GROUP BY product_name
ORDER BY sales_percentage DESC
LIMIT 5;
Top Days with Highest Sales
In [9]:
%%sql
SELECT date, SUM(total_sales) AS total_sales
FROM SalesData
GROUP BY date
ORDER BY total_sales DESC
LIMIT 5;
Conclusion
We have shown how to load data from Amazon S3 into SingleStoreDB using pipelines. These techniques should enable you to integrate your Amazon S3 data with SingleStoreDB.
Clean up
Remove the '#' to uncomment and execute the queries below to clean up the pipeline and table that were created.
Drop Pipeline
In [10]:
%%sql
#STOP PIPELINE SalesData_Pipeline;
#DROP PIPELINE SalesData_Pipeline;
Drop Data
In [11]:
#shared_tier_check = %sql show variables like 'is_shared_tier'
#if not shared_tier_check or shared_tier_check[0][1] == 'OFF':
#    %sql DROP DATABASE IF EXISTS SalesAnalysis;
#else:
#    %sql DROP TABLE SalesData;
Details
About this Template
The Sales Data Analysis use case demonstrates how to utilize SingleStore's powerful querying capabilities to analyze sales data stored in a CSV file.
This Notebook can be run in Shared Tier, Standard and Enterprise deployments.
Tags
License
This Notebook has been released under the Apache 2.0 open source license.