
Load files from S3 into Shared Tier
Notebook

This notebook guides you through data ingestion of CSV files from an AWS S3 location into your shared tier workspace.
Create a Pipeline from CSV files in AWS S3
In this example, we want to create a pipeline that ingests from a CSV file stored in an AWS S3 bucket. We will guide you through an example with stock market data.
Action Required
Make sure to select your database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
Create a Table
Start by creating a table to store the ingested data. In this example, we create a stocks table that stores trading data for a given stock on a given date.
In [1]:
%%sql

CREATE TABLE stocks /* Creating table for sample data. */ (
    `date` date NULL,
    `open` double NULL,
    `high` double NULL,
    `low` double NULL,
    `close` double NULL,
    `volume` bigint(20) NULL,
    `Name` text CHARACTER SET utf8 COLLATE utf8_general_ci NULL,
    SHARD KEY ()
);
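Each row of the source CSV maps one-to-one onto these columns. As a quick sanity check outside the database, here is a minimal Python sketch; the sample row is illustrative and assumed to match the dataset's shape, not read from the actual file:

```python
import csv
from datetime import datetime
from io import StringIO

# Illustrative row in the same shape as all_stocks_5yr.csv:
# date, open, high, low, close, volume, Name
sample = "2013-02-08,15.07,15.12,14.63,14.75,8407500,AAL\n"

row = next(csv.reader(StringIO(sample)))

# Convert each field to the type of the matching `stocks` column.
record = {
    "date": datetime.strptime(row[0], "%Y-%m-%d").date(),  # date
    "open": float(row[1]),    # double
    "high": float(row[2]),    # double
    "low": float(row[3]),     # double
    "close": float(row[4]),   # double
    "volume": int(row[5]),    # bigint(20)
    "Name": row[6],           # text
}
```

The pipeline performs this parsing and type conversion for you; the sketch only shows which CSV field lands in which column.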
Create a pipeline
We then create a pipeline by pointing the data source at the S3 bucket containing the dataset. In this case, the CSV file all_stocks_5yr.csv is ingested into the stocks table via a pipeline named stocks_pipeline.
To create your own pipeline, you will need the following information:
The path of the file within the bucket, such as: 'helios-self-poc-stockticker/all_stocks_5yr.csv'
The name of the bucket's region, such as: us-east-1
Your AWS account's access credentials: <aws_access_key_id> and <aws_secret_access_key>
For more on how to retrieve the above information, read our Pipeline Documentation.
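The CONFIG and CREDENTIALS clauses take JSON strings, and quoting mistakes in them are a common source of errors. One way to build them safely, sketched with Python's standard library (the placeholder key values are assumptions you would replace with your own):

```python
import json

# Placeholder values; substitute your own region and AWS keys.
config = json.dumps({"region": "us-east-1"})
credentials = json.dumps({
    "aws_access_key_id": "<your_key>",
    "aws_secret_access_key": "<your_secret>",
})

# Assemble a minimal statement string; the full clause list used in
# this notebook appears in the %%sql cell below.
pipeline_sql = (
    "CREATE OR REPLACE PIPELINE stocks_pipeline "
    "AS LOAD DATA S3 'helios-self-poc-stockticker/all_stocks_5yr.csv' "
    f"CONFIG '{config}' "
    f"CREDENTIALS '{credentials}' "
    "INTO TABLE stocks;"
)
```

This sketch only demonstrates producing well-formed JSON for the two clauses; in the notebook itself the statement is issued directly from a %%sql cell.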
In [2]:
%%sql
CREATE OR REPLACE PIPELINE stocks_pipeline /* Creating pipeline for sample data. */
AS LOAD DATA S3 'helios-self-poc-stockticker/all_stocks_5yr.csv'
CONFIG '{"region": "us-east-1"}'
CREDENTIALS '{"aws_access_key_id": "<your_key>",
              "aws_secret_access_key": "<your_secret>"}'
BATCH_INTERVAL 45000
SKIP DUPLICATE KEY ERRORS -- SKIP ALL ERRORS can be used instead; skipped errors can be tracked via the pipelines_errors query below
INTO TABLE stocks
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY ''
IGNORE 1 LINES
FORMAT CSV
(
    `stocks`.`date`,
    `stocks`.`open`,
    `stocks`.`high`,
    `stocks`.`low`,
    `stocks`.`close`,
    `stocks`.`volume`,
    `stocks`.`Name`
);
Start and monitor the pipeline
The CREATE PIPELINE statement creates a new pipeline, but the pipeline has not yet been started, and no data has been loaded. To start a pipeline in the background, run:
In [3]:
%%sql

START PIPELINE stocks_pipeline;
In [4]:
%%sql
SHOW PIPELINES;
You can check for ingestion errors by querying the pipelines_errors view in information_schema; a healthy pipeline returns no rows.
In [5]:
%%sql

SELECT * FROM information_schema.pipelines_errors
WHERE pipeline_name = 'stocks_pipeline';
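If you prefer to poll this check from a Python cell, a small helper for interpreting the result set keeps the loop readable. This is a sketch under the assumption that the query's rows arrive as a Python list; the helper names are hypothetical, not part of any SingleStore API:

```python
import time

def pipeline_status(error_rows):
    """Summarize rows returned from information_schema.pipelines_errors."""
    if not error_rows:
        return "healthy: no pipeline errors recorded"
    return f"attention: {len(error_rows)} error row(s) recorded"

def wait_until_healthy(fetch_errors, attempts=3, delay_s=1.0):
    """Poll `fetch_errors` (a callable that runs the query) a few times."""
    status = "no polls attempted"
    for _ in range(attempts):
        status = pipeline_status(fetch_errors())
        if status.startswith("healthy"):
            return status
        time.sleep(delay_s)
    return status
```

In practice `fetch_errors` would execute the SQL above through your notebook connection and return the fetched rows.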
Query the table
In [6]:
%%sql

SELECT * FROM stocks LIMIT 5;
Clean up resources
In [7]:
%%sql

DROP PIPELINE IF EXISTS test.stocks_pipeline;
DROP TABLE IF EXISTS test.stocks;

Details
About this Template
This Notebook can be run in Standard and Enterprise deployments.
License
This Notebook has been released under the Apache 2.0 open source license.