New

Sales Data Analysis Dataset From Amazon S3

Notebook

SingleStore Notebooks

Sales Data Analysis Dataset From Amazon S3

Note

This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to Start using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.

Note

This notebook creates a pipeline, data may take up to 1 minute to populate

The Sales Data Analysis use case demonstrates how to leverage SingleStore's powerful querying capabilities in a business intelligence context like analyzing sales data stored in a CSV file.

This demo showcases typical operations that businesses perform to gain insights from their sales data, such as:

calculating total sales
identifying top-selling products
analyzing sales trends over time.

By working through this example, new users will:

learn how to load CSV data into Singlestore from S3
execute aggregate functions
perform time-series analysis

Demo Flow

How to use this notebook

Create a database (You can skip this Step if you are using Free Starter Tier)

We need to create a database to work with in the following examples.

In [1]:

1
shared_tier_check = %sql show variables like 'is_shared_tier'
2
if not shared_tier_check or shared_tier_check[0][1] == 'OFF':
3
    %sql DROP DATABASE IF EXISTS SalesAnalysis;
4
    %sql CREATE DATABASE SalesAnalysis;

Create Table

In [2]:

1
%%sql
2

3
CREATE TABLE IF NOT EXISTS SalesData /* Creating table for sample data. */ (
4
  `Date` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
5
  `Store_ID` bigint(20) DEFAULT NULL,
6
  `ProductID` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
7
  `Product_Name` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
8
  `Product_Category` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
9
  `Quantity_Sold` bigint(20) DEFAULT NULL,
10
  `Price` float DEFAULT NULL,
11
  `Total_Sales` float DEFAULT NULL
12
)

Load Data Using Pipelines

In [3]:

%%sql

CREATE PIPELINE IF NOT EXISTS SalesData_Pipeline AS /* Creating pipeline for sample data. */

LOAD DATA S3 's3://singlestoreloaddata/SalesData/*.csv'

CONFIG '{ \"region\": \"ap-south-1\" }'

/*

CREDENTIALS '{"aws_access_key_id": "<access key id>",

               "aws_secret_access_key": "<access_secret_key>"}'

*/

INTO TABLE SalesData

FIELDS TERMINATED BY ','

LINES TERMINATED BY '\n'

IGNORE 1 lines;

START PIPELINE SalesData_Pipeline;

Data may take couple of seconds to load after pipeline is started, rerun cell to verify

In [4]:

1
%%sql
2
SELECT count(*) FROM SalesData

Sample Queries

We will try to execute some Analytical Queries

Top-Selling Products

In [5]:

1
%%sql
2
SELECT  product_name, SUM(quantity_sold) AS total_quantity_sold FROM SalesData
3
    GROUP BY  product_name ORDER BY total_quantity_sold DESC LIMIT 5;

Sales Trends Over Time

In [6]:

1
%%sql
2
SELECT date, SUM(total_sales) AS total_sales FROM SalesData
3
GROUP BY date ORDER BY total_sales desc limit 5;

Total Sales by Store

In [7]:

1
%%sql
2
SELECT  Store_ID, SUM(total_sales) AS total_sales FROM SalesData
3
GROUP BY  Store_ID ORDER BY total_sales DESC limit 5;

Sales Contribution by Product (Percentage)

In [8]:

1
%%sql
2
SELECT product_name, SUM(total_sales) * 100.0 / (SELECT SUM(total_sales) FROM SalesData) AS sales_percentage FROM SalesData
3
    GROUP BY product_name ORDER BY sales_percentage DESC limit 5;

Top Days with Highest Sale

In [9]:

1
%%sql
2
SELECT date, SUM(total_sales) AS total_sales FROM SalesData
3
    GROUP BY date ORDER BY total_sales DESC LIMIT 5;

Conclusion

We have shown how to insert data from a Amazon S3 using Pipelines to SingleStoreDB. These techniques should enable you to integrate your Amazon S3 with SingleStoreDB.

Clean up

Remove the '#' to uncomment and execute the queries below to clean up the pipeline and table created.

Drop Pipeline

In [10]:

%%sql

#STOP PIPELINE SalesData_Pipeline;

#DROP PIPELINE SalesData_Pipeline;

Drop Data

In [11]:

1
#shared_tier_check = %sql show variables like 'is_shared_tier'
2
#if not shared_tier_check or shared_tier_check[0][1] == 'OFF':
3
#    %sql DROP DATABASE IF EXISTS SalesAnalysis;
4
#else:
5
#    %sql DROP TABLE SalesData;

Details

About this Template

The Sales Data Analysis use case demonstrates how to utilize Singlestore's powerful querying capabilities to analyze sales data stored in a CSV file.

This Notebook can be run in Shared Tier, Standard and Enterprise deployments.

License

This Notebook has been released under the Apache 2.0 open source license.

See Notebook in action

Launch this notebook in SingleStore and start executing queries instantly.

Sales Data Analysis Dataset From Amazon S3

Notebook

Sales Data Analysis Dataset From Amazon S3

Demo Flow

How to use this notebook

Create a database (You can skip this Step if you are using Free Starter Tier)

Create Table

Load Data Using Pipelines

Data may take couple of seconds to load after pipeline is started, rerun cell to verify

Sample Queries

Conclusion

Clean up

Drop Pipeline

Drop Data

Details

About this Template

Tags

License

See Notebook in action