SELECT INTO GCS Performance

Hello,

I’m exporting data from SingleStore to GCS using the SELECT ... INTO GCS functionality. However, I’m not satisfied with the performance, and I’m looking for ways to improve it.
I’m using C# (.NET) with the SingleStore connector package.

I’ve tried the SELECT both with and without a LIMIT clause, because I noticed that with LIMIT SingleStore creates just one file, whereas without it I get one file per partition.

At first, without compression, the export took about 7 to 8 minutes for a dataset of 20 million rows.

With compression and a LIMIT clause I got it down to around 2 minutes. I’m using LIMIT with the highest possible value (18446744073709551615): I don’t actually want to limit the result, I only use it to get a single file in the export.

If I remove the LIMIT clause, it runs in about 50-something seconds with 10 files in the output, but I guess that is still not great. The compressed files are about 198 MB, and decompressed the data is close to 3 GB. With compression I was expecting it to run in under 10 seconds or so.
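
To put those numbers in perspective, here is a rough back-of-the-envelope calculation of the throughput I’m seeing versus what I was hoping for. It is only a sketch in Python (not my actual C# code): it assumes the run takes exactly 50 seconds and the uncompressed data is exactly 3000 MB, both of which are approximations, and the function name is just something I made up for illustration.

```python
# Rough throughput estimate for the export described above.
# Assumptions (approximate values from my runs, not exact measurements):
#   - 20 million rows
#   - ~3 GB uncompressed, taken here as 3000 MB
#   - ~50 s observed without LIMIT, 10 s as my target

ROWS = 20_000_000
UNCOMPRESSED_MB = 3000.0
OBSERVED_SECONDS = 50.0
TARGET_SECONDS = 10.0


def rate(amount: float, seconds: float) -> float:
    """Return amount processed per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return amount / seconds


observed_mb_s = rate(UNCOMPRESSED_MB, OBSERVED_SECONDS)
target_mb_s = rate(UNCOMPRESSED_MB, TARGET_SECONDS)
observed_rows_s = rate(ROWS, OBSERVED_SECONDS)
target_rows_s = rate(ROWS, TARGET_SECONDS)

print(f"observed: {observed_mb_s:.0f} MB/s, {observed_rows_s:,.0f} rows/s")
print(f"target:   {target_mb_s:.0f} MB/s, {target_rows_s:,.0f} rows/s")
print(f"speed-up needed: {target_mb_s / observed_mb_s:.1f}x")
```

So the current run works out to roughly 60 MB/s of uncompressed data (about 400,000 rows/s), and my 10-second expectation would need roughly 300 MB/s, i.e. about a 5x speed-up.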

Is there anything I can do to improve performance? Ideally I want one file with all 20M rows, but if performance is better with multiple output files then I guess we can work around that.

Also, is there any built-in functionality to add column headers to the exported file? I select the table to export dynamically, so unless I do some dynamic work to read the field names from the schema, I can’t really add the column headers.
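
For clarity, the "dynamic stuff" I would like to avoid looks roughly like the sketch below: read the column names from information_schema.columns and build a header row from them. This is a Python sketch for brevity rather than my C# code. The helper names are hypothetical, the %s placeholders assume a driver that uses that parameter style, and the sample column names are made up. How the header would then get into the exported file is exactly the part I don’t have a good answer for.

```python
import csv
import io
from typing import Sequence


def header_query() -> str:
    """Build a parameterized query listing a table's columns in order.

    The %s placeholders are an assumption about the driver's parameter
    style; pass the schema and table name as query parameters.
    """
    return (
        "SELECT column_name FROM information_schema.columns "
        "WHERE table_schema = %s AND table_name = %s "
        "ORDER BY ordinal_position"
    )


def header_line(columns: Sequence[str], delimiter: str = ",") -> str:
    """Format column names as a single CSV header row.

    csv.writer quotes any name that contains the delimiter or a quote.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(columns)
    return buf.getvalue()


# Hypothetical column names, standing in for the query result.
example_columns = ["id", "customer_name", "created_at"]
print(header_query())
print(header_line(example_columns), end="")
```

It works, but it means an extra round trip per table plus some way of getting the header in front of the exported data, so a built-in option would be much nicer.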

Thanks in advance!