Managing Stage files with Fusion SQL
Notebook
Fusion SQL can be used to manage your workspace groups and workspaces, but it can also be used to upload, download, and manage files in your workspace group Stage. We'll show you how to work with files in Stage in this notebook.
Displaying the Stage Fusion SQL commands
The SHOW FUSION COMMANDS
displays the commands that are handled by the Fusion
engine. You can use the LIKE
to filter the commands.
In [1]:
commands = %sql SHOW FUSION COMMANDS LIKE '%stage%'for cmd in commands:print(*cmd, '\n')
Creating a workspace group
We'll start by creating a workspace group. We can get a region in the US by using the SHOW REGIONS
command and the random
package.
In [2]:
import randomimport secretsus_regions = %sql SHOW REGIONS LIKE '%us%'region_id = random.choice(us_regions).IDregion_id
In [3]:
wg_name = 'Fusion Notebook'password = secrets.token_urlsafe(20) + '-x&'
In [4]:
%%sqlCREATE WORKSPACE GROUP '{{ wg_name }}'IN REGION ID '{{ region_id }}' WITH PASSWORD '{{ password }}'WITH FIREWALL RANGES '0.0.0.0/0'
Uploading and downloading Stage files
Uploading and downloading files to your Stage is easy with Fusion SQL. The commands are shown below.
DOWNLOAD STAGE FILE '<stage-path>' [ IN GROUP { ID '<group-id>' | '<group-name>' } ] [ TO '<local-path>' ]
[ OVERWRITE ] [ ENCODING '<encoding>' ];
UPLOAD FILE TO STAGE '<stage-path>' [ IN GROUP { ID '<group-id>' | '<group-name>' } ] FROM '<local-path>' [ OVERWRITE ];
First we'll create a data file locally that we can work with.
In [5]:
%%writefile mydata.csvname,age,heightSue,27,65Joe,32,70Max,44,69Ann,33,64
We can now upload our data file to our workspace group Stage.
In [6]:
%%sqlUPLOAD FILE TO STAGE 'stats.csv' IN GROUP '{{ wg_name }}' FROM 'mydata.csv'
We can list the files in a Stage with the SHOW STAGE FILES
command.
In [7]:
%%sqlSHOW STAGE FILES IN GROUP '{{ wg_name }}'
Downloading the file is just as easy as uploading.
In [8]:
%%sqlDOWNLOAD STAGE FILE 'stats.csv' IN GROUP '{{ wg_name }}' TO 'stats.csv' OVERWRITE
In [9]:
!cat stats.csv
If you just want to display the contents of the Stage file without saving it to a local
file, you simply leave the TO
option off the DOWNLOAD STAGE FILE
.
In [10]:
%%sqlDOWNLOAD STAGE FILE 'stats.csv' IN GROUP '{{ wg_name }}' ENCODING 'utf-8'
Creating folders
Up to this point we have just worked with files at the root of our Stage. We can use Fusion SQL
to create folder structures as well. This is done with the CREATE STAGE FOLDER
command.
CREATE STAGE FOLDER '<stage-path>' [ IN GROUP { ID '<group-id>' | '<group-name>' } ] [ OVERWRITE ];
The following code will create this folder structure:
project-1/
project-1/data/
project-2/
project-2/data/
In [11]:
for name in ['project-1', 'project-1/data', 'project-2', 'project-2/data']:%sql CREATE STAGE FOLDER '{{ name }}' IN GROUP '{{ wg_name }}';
In [12]:
%%sqlSHOW STAGE FILES IN GROUP '{{ wg_name }}' RECURSIVE
Now that we have a folder structure we can put files into those folders.
In [13]:
%%sqlUPLOAD FILE TO STAGE 'project-1/data/stats.csv' IN GROUP '{{ wg_name }}' FROM 'mydata.csv';UPLOAD FILE TO STAGE 'project-2/data/stats.csv' IN GROUP '{{ wg_name }}' FROM 'mydata.csv';
Now when we do a recursive listing of our Stage, we'll see the newly created files.
In [14]:
%%sqlSHOW STAGE FILES IN GROUP '{{ wg_name }}' RECURSIVE
We can list the files at a specific path as well.
In [15]:
%%sqlSHOW STAGE FILES IN GROUP '{{ wg_name }}' AT 'project-2/data'
Loading data from Stage
We are going to load data from a Stage into a database table. For this, we need to have a workspace and a database.
In [16]:
%%sqlCREATE WORKSPACE 'stage-loader' IN GROUP '{{ wg_name }}' WITH SIZE 'S-00' WAIT ON ACTIVE
In [17]:
%%sqlSHOW WORKSPACES IN GROUP 'Fusion Notebook'
Action Required
Make sure to select the stage-loader workspace from the drop-down menu at the top of this notebook.
In [18]:
%%sqlCREATE DATABASE IF NOT EXISTS stage_loader
Action Required
Make sure to select the stage_loader database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
In [19]:
%%sqlDROP TABLE IF EXISTS stats;CREATE TABLE stats (name TEXT,age INT,height INT);
Load the data from the Stage using a pipeline.
In [20]:
%%sqlCREATE PIPELINE IF NOT EXISTS stage_testAS LOAD DATA STAGE 'project-2/data/stats.csv'BATCH_INTERVAL 2500SKIP DUPLICATE KEY ERRORSINTO TABLE statsIGNORE 1 LINESFIELDS TERMINATED BY ','LINES TERMINATED BY '\n'FORMAT CSV;START PIPELINE stage_test FOREGROUND;DROP PIPELINE stage_test;
We can now query the table and select the output into a Stage. Note that the
GROUP BY 1
is used here to combine the outputs from all of the database partitions
into a single file. If you don't use that, you'll get multiple output files,
each with a portion of the result set.
In [21]:
%%sqlSELECT * FROM stats GROUP BY 1 INTO STAGE 'project-3/data/stats.csv'FIELDS TERMINATED BY ','LINES TERMINATED BY '\n'
In [22]:
%%sqlSHOW STAGE FILES IN GROUP '{{ wg_name }}' AT 'project-3' RECURSIVE
In [23]:
%%sqlDOWNLOAD STAGE FILE 'project-3/data/stats.csv' ENCODING 'utf-8'
Deleting Stage files and folders
Files and folders can be deleted from a workspace Stage as well.
This is done with the DROP STAGE FILE
and DROP STAGE FOLDER
commands.
DROP STAGE FILE '<stage-path>' [ IN GROUP { ID '<group-id>' | '<group-name>' } ];
DROP STAGE FOLDER '<stage-path>' [ IN GROUP { ID '<group-id>' | '<group-name>' } ] [ RECURSIVE ];
Let's delete the stats.csv
file at the root of our Stage.
In [24]:
%%sqlDROP STAGE FILE 'stats.csv' IN GROUP '{{ wg_name }}'
In [25]:
%%sqlSHOW STAGE FILES IN GROUP '{{ wg_name }}'
Now let's delete the project-2
folder including all of the files in it.
In [26]:
%%sqlDROP STAGE FOLDER 'project-2' IN GROUP '{{ wg_name }}' RECURSIVE
In [27]:
%%sqlSHOW STAGE FILES IN GROUP '{{ wg_name }}' RECURSIVE
In [28]:
%%sqlDROP STAGE FOLDER 'project-1' IN GROUP '{{ wg_name }}' RECURSIVE;DROP STAGE FOLDER 'project-3' IN GROUP '{{ wg_name }}' RECURSIVE;
Conclusion
We have demonstrated how to create and delete files and folders in a workspace group Stage using Fusion SQL. It is also possible to work with Stage files using the SingleStoreDB Python SDK, see the API documentation for more details.
Details
About this Template
Learn how to manage your Stage files in SingleStoreDB Cloud using Fusion SQL.
This Notebook can be run in Standard and Enterprise deployments.
Tags
License
This Notebook has been released under the Apache 2.0 open source license.