New

Managing Stage files with Fusion SQL

Notebook

SingleStore Notebooks

Managing Stage files with Fusion SQL

Fusion SQL can be used to manage your workspace groups and workspaces, but it can also be used to upload, download, and manage files in your workspace group Stage. We'll show you how to work with files in Stage in this notebook.

Displaying the Stage Fusion SQL commands

The SHOW FUSION COMMANDS displays the commands that are handled by the Fusion engine. You can use the LIKE to filter the commands.

In [1]:

commands = %sql SHOW FUSION COMMANDS LIKE '%stage%'
for cmd in commands:
print(*cmd, '\n')

Creating a workspace group

We'll start by creating a workspace group. We can get a region in the US by using the SHOW REGIONS command and the random package.

In [2]:

import random
import secrets
us_regions = %sql SHOW REGIONS LIKE '%us%'
region_id = random.choice(us_regions).ID
region_id

In [3]:

wg_name = 'Fusion Notebook'
password = secrets.token_urlsafe(20) + '-x&'

In [4]:

%%sql
CREATE WORKSPACE GROUP '{{ wg_name }}'
IN REGION ID '{{ region_id }}' WITH PASSWORD '{{ password }}'
WITH FIREWALL RANGES '0.0.0.0/0'

Uploading and downloading Stage files

Uploading and downloading files to your Stage is easy with Fusion SQL. The commands are shown below.

DOWNLOAD STAGE FILE '<stage-path>' [ IN GROUP { ID '<group-id>' | '<group-name>' } ] [ TO '<local-path>' ]
    [ OVERWRITE ] [ ENCODING '<encoding>' ];

UPLOAD FILE TO STAGE '<stage-path>' [ IN GROUP { ID '<group-id>' | '<group-name>' } ] FROM '<local-path>' [ OVERWRITE ];

First we'll create a data file locally that we can work with.

In [5]:

%%writefile mydata.csv
name,age,height
Sue,27,65
Joe,32,70
Max,44,69
Ann,33,64

We can now upload our data file to our workspace group Stage.

In [6]:

%%sql
UPLOAD FILE TO STAGE 'stats.csv' IN GROUP '{{ wg_name }}' FROM 'mydata.csv'

We can list the files in a Stage with the SHOW STAGE FILES command.

In [7]:

%%sql
SHOW STAGE FILES IN GROUP '{{ wg_name }}'

Downloading the file is just as easy as uploading.

In [8]:

%%sql
DOWNLOAD STAGE FILE 'stats.csv' IN GROUP '{{ wg_name }}' TO 'stats.csv' OVERWRITE

In [9]:

!cat stats.csv

If you just want to display the contents of the Stage file without saving it to a local file, you simply leave the TO option off the DOWNLOAD STAGE FILE.

In [10]:

%%sql
DOWNLOAD STAGE FILE 'stats.csv' IN GROUP '{{ wg_name }}' ENCODING 'utf-8'

Creating folders

Up to this point we have just worked with files at the root of our Stage. We can use Fusion SQL to create folder structures as well. This is done with the CREATE STAGE FOLDER command.

CREATE STAGE FOLDER '<stage-path>' [ IN GROUP { ID '<group-id>' | '<group-name>' } ] [ OVERWRITE ];

The following code will create this folder structure:

project-1/
project-1/data/
project-2/
project-2/data/

In [11]:

for name in ['project-1', 'project-1/data', 'project-2', 'project-2/data']:
%sql CREATE STAGE FOLDER '{{ name }}' IN GROUP '{{ wg_name }}';

In [12]:

%%sql
SHOW STAGE FILES IN GROUP '{{ wg_name }}' RECURSIVE

Now that we have a folder structure we can put files into those folders.

In [13]:

%%sql
UPLOAD FILE TO STAGE 'project-1/data/stats.csv' IN GROUP '{{ wg_name }}' FROM 'mydata.csv';
UPLOAD FILE TO STAGE 'project-2/data/stats.csv' IN GROUP '{{ wg_name }}' FROM 'mydata.csv';

Now when we do a recursive listing of our Stage, we'll see the newly created files.

In [14]:

%%sql
SHOW STAGE FILES IN GROUP '{{ wg_name }}' RECURSIVE

We can list the files at a specific path as well.

In [15]:

%%sql
SHOW STAGE FILES IN GROUP '{{ wg_name }}' AT 'project-2/data'

Loading data from Stage

We are going to load data from a Stage into a database table. For this, we need to have a workspace and a database.

In [16]:

%%sql
CREATE WORKSPACE 'stage-loader' IN GROUP '{{ wg_name }}' WITH SIZE 'S-00' WAIT ON ACTIVE

In [17]:

%%sql
SHOW WORKSPACES IN GROUP 'Fusion Notebook'

Action Required

Make sure to select the stage-loader workspace from the drop-down menu at the top of this notebook.

In [18]:

%%sql
CREATE DATABASE IF NOT EXISTS stage_loader

Action Required

Make sure to select the stage_loader database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.

In [19]:

%%sql
DROP TABLE IF EXISTS stats;
CREATE TABLE stats (
name TEXT,
age INT,
height INT
);

Load the data from the Stage using a pipeline.

In [20]:

%%sql
CREATE PIPELINE IF NOT EXISTS stage_test
AS LOAD DATA STAGE 'project-2/data/stats.csv'
BATCH_INTERVAL 2500
SKIP DUPLICATE KEY ERRORS
INTO TABLE stats
IGNORE 1 LINES
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
FORMAT CSV;
START PIPELINE stage_test FOREGROUND;
DROP PIPELINE stage_test;

We can now query the table and select the output into a Stage. Note that the GROUP BY 1 is used here to combine the outputs from all of the database partitions into a single file. If you don't use that, you'll get multiple output files, each with a portion of the result set.

In [21]:

%%sql
SELECT * FROM stats GROUP BY 1 INTO STAGE 'project-3/data/stats.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'

In [22]:

%%sql
SHOW STAGE FILES IN GROUP '{{ wg_name }}' AT 'project-3' RECURSIVE

In [23]:

%%sql
DOWNLOAD STAGE FILE 'project-3/data/stats.csv' ENCODING 'utf-8'

Deleting Stage files and folders

Files and folders can be deleted from a workspace Stage as well. This is done with the DROP STAGE FILE and DROP STAGE FOLDER commands.

DROP STAGE FILE '<stage-path>' [ IN GROUP { ID '<group-id>' | '<group-name>' } ];

DROP STAGE FOLDER '<stage-path>' [ IN GROUP { ID '<group-id>' | '<group-name>' } ] [ RECURSIVE ];

Let's delete the stats.csv file at the root of our Stage.

In [24]:

%%sql
DROP STAGE FILE 'stats.csv' IN GROUP '{{ wg_name }}'

In [25]:

%%sql
SHOW STAGE FILES IN GROUP '{{ wg_name }}'

Now let's delete the project-2 folder including all of the files in it.

In [26]:

%%sql
DROP STAGE FOLDER 'project-2' IN GROUP '{{ wg_name }}' RECURSIVE

In [27]:

%%sql
SHOW STAGE FILES IN GROUP '{{ wg_name }}' RECURSIVE

In [28]:

%%sql
DROP STAGE FOLDER 'project-1' IN GROUP '{{ wg_name }}' RECURSIVE;
DROP STAGE FOLDER 'project-3' IN GROUP '{{ wg_name }}' RECURSIVE;

Conclusion

We have demonstrated how to create and delete files and folders in a workspace group Stage using Fusion SQL. It is also possible to work with Stage files using the SingleStoreDB Python SDK, see the API documentation for more details.

Details

Tags

#starter#fusion#python

License

This Notebook has been released under the Apache 2.0 open source license.