New

Managing Stage files with Fusion SQL

Notebook


SingleStore Notebooks

Managing Stage files with Fusion SQL

Fusion SQL can be used to manage your workspace groups and workspaces, but it can also be used to upload, download, and manage files in your workspace group or starter workspace Stage. We'll show you how to work with files in Stage in this notebook.

Displaying the Stage Fusion SQL commands

The SHOW FUSION COMMANDS displays the commands that are handled by the Fusion engine. You can use the LIKE to filter the commands.

In [1]:

1commands = %sql SHOW FUSION COMMANDS LIKE '%stage%'2for cmd in commands:3    print(*cmd, '\n')

Creating a workspace group

We'll start by creating a workspace group. We can get a region in the US by using the SHOW REGIONS command and the random package.

In [2]:

1import random2import secrets3
4us_regions = %sql SHOW REGIONS LIKE '%us%'5
6region_id = random.choice(us_regions).ID7region_id

In [3]:

1wg_name = 'Fusion Notebook'2password = secrets.token_urlsafe(20) + '-x&'

In [4]:

1%%sql2CREATE WORKSPACE GROUP '{{ wg_name }}'3    IN REGION ID '{{ region_id }}' WITH PASSWORD '{{ password }}'4    WITH FIREWALL RANGES '0.0.0.0/0'

Uploading and downloading Stage files

Uploading and downloading files to your Stage is easy with Fusion SQL. The commands are shown below.

DOWNLOAD STAGE FILE '<stage-path>' [ IN { ID '<deployment-id>' | '<deployment-name>' } ] [ TO '<local-path>' ]
    [ OVERWRITE ] [ ENCODING '<encoding>' ];

UPLOAD FILE TO STAGE '<stage-path>' [ IN { ID '<deployment-id>' | '<deployment-name>' } ] FROM '<local-path>' [ OVERWRITE ];

First we'll create a data file locally that we can work with.

In [5]:

1%%writefile mydata.csv2name,age,height3Sue,27,654Joe,32,705Max,44,696Ann,33,64

We can now upload our data file to our workspace group Stage.

In [6]:

1%%sql2UPLOAD FILE TO STAGE 'stats.csv' IN '{{ wg_name }}' FROM 'mydata.csv'

We can list the files in a Stage with the SHOW STAGE FILES command.

In [7]:

1%%sql2SHOW STAGE FILES IN '{{ wg_name }}'

Downloading the file is just as easy as uploading.

In [8]:

1%%sql2DOWNLOAD STAGE FILE 'stats.csv' IN '{{ wg_name }}' TO 'stats.csv' OVERWRITE

In [9]:

1!cat stats.csv

If you just want to display the contents of the Stage file without saving it to a local file, you simply leave the TO option off the DOWNLOAD STAGE FILE.

In [10]:

1%%sql2DOWNLOAD STAGE FILE 'stats.csv' IN '{{ wg_name }}' ENCODING 'utf-8'

Creating folders

Up to this point we have just worked with files at the root of our Stage. We can use Fusion SQL to create folder structures as well. This is done with the CREATE STAGE FOLDER command.

CREATE STAGE FOLDER '<stage-path>' [ IN { ID '<deployment-id>' | '<deployment-name>' } ] [ OVERWRITE ];

The following code will create this folder structure:

project-1/
project-1/data/
project-2/
project-2/data/

In [11]:

1for name in ['project-1', 'project-1/data', 'project-2', 'project-2/data']:2    %sql CREATE STAGE FOLDER '{{ name }}' IN '{{ wg_name }}';

In [12]:

1%%sql2SHOW STAGE FILES IN '{{ wg_name }}' RECURSIVE

Now that we have a folder structure we can put files into those folders.

In [13]:

1%%sql2UPLOAD FILE TO STAGE 'project-1/data/stats.csv' IN '{{ wg_name }}' FROM 'mydata.csv';3UPLOAD FILE TO STAGE 'project-2/data/stats.csv' IN '{{ wg_name }}' FROM 'mydata.csv';

Now when we do a recursive listing of our Stage, we'll see the newly created files.

In [14]:

1%%sql2SHOW STAGE FILES IN '{{ wg_name }}' RECURSIVE

We can list the files at a specific path as well.

In [15]:

1%%sql2SHOW STAGE FILES IN '{{ wg_name }}' AT 'project-2/data'

Loading data from Stage

We are going to load data from a Stage into a database table. For this, we need to have a workspace and a database.

In [16]:

1%%sql2CREATE WORKSPACE 'stage-loader' IN GROUP '{{ wg_name }}' WITH SIZE 'S-00' WAIT ON ACTIVE

In [17]:

1%%sql2SHOW WORKSPACES IN GROUP 'Fusion Notebook'

Action Required

Make sure to select the stage-loader workspace from the drop-down menu at the top of this notebook.

In [18]:

1%%sql2CREATE DATABASE IF NOT EXISTS stage_loader

Action Required

Make sure to select the stage_loader database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.

In [19]:

1%%sql2DROP TABLE IF EXISTS stats;3CREATE TABLE stats (4    name TEXT,5    age  INT,6    height INT7);

Load the data from the Stage using a pipeline.

In [20]:

1%%sql2CREATE PIPELINE IF NOT EXISTS stage_test3    AS LOAD DATA STAGE 'project-2/data/stats.csv'4    BATCH_INTERVAL 25005    SKIP DUPLICATE KEY ERRORS6    INTO TABLE stats7    IGNORE 1 LINES8    FIELDS TERMINATED BY ','9    LINES TERMINATED BY '\n'10    FORMAT CSV;11START PIPELINE stage_test FOREGROUND;12DROP PIPELINE stage_test;

We can now query the table and select the output into a Stage. Note that the GROUP BY 1 is used here to combine the outputs from all of the database partitions into a single file. If you don't use that, you'll get multiple output files, each with a portion of the result set.

In [21]:

1%%sql2SELECT * FROM stats GROUP BY 1 INTO STAGE 'project-3/data/stats.csv'3    FIELDS TERMINATED BY ','4    LINES TERMINATED BY '\n'

In [22]:

1%%sql2SHOW STAGE FILES IN '{{ wg_name }}' AT 'project-3' RECURSIVE

In [23]:

1%%sql2DOWNLOAD STAGE FILE 'project-3/data/stats.csv' ENCODING 'utf-8'

Deleting Stage files and folders

Files and folders can be deleted from a workspace Stage as well. This is done with the DROP STAGE FILE and DROP STAGE FOLDER commands.

DROP STAGE FILE '<stage-path>' [ IN { ID '<deployment-id>' | '<deployment-name>' } ];

DROP STAGE FOLDER '<stage-path>' [ IN { ID '<deployment-id>' | '<deployment-name>' } ] [ RECURSIVE ];

Let's delete the stats.csv file at the root of our Stage.

In [24]:

1%%sql2DROP STAGE FILE 'stats.csv' IN '{{ wg_name }}'

In [25]:

1%%sql2SHOW STAGE FILES IN '{{ wg_name }}'

Now let's delete the project-2 folder including all of the files in it.

In [26]:

1%%sql2DROP STAGE FOLDER 'project-2' IN '{{ wg_name }}' RECURSIVE

In [27]:

1%%sql2SHOW STAGE FILES IN '{{ wg_name }}' RECURSIVE

In [28]:

1%%sql2DROP STAGE FOLDER 'project-1' IN '{{ wg_name }}' RECURSIVE;3DROP STAGE FOLDER 'project-3' IN '{{ wg_name }}' RECURSIVE;

Conclusion

We have demonstrated how to create and delete files and folders in a workspace group Stage using Fusion SQL. Note that it also supports managing Stage for starter workspaces. It is also possible to work with Stage files using the SingleStoreDB Python SDK, see the API documentation for more details.

Details


About this Template

Learn how to manage your Stage files in SingleStoreDB Cloud using Fusion SQL.

This Notebook can be run in Standard and Enterprise deployments.

Tags

starterfusionpython

License

This Notebook has been released under the Apache 2.0 open source license.

See Notebook in action

Launch this notebook in SingleStore and start executing queries instantly.