CSV datetime string to DateTime field using Pipeline

After two days of trying many permutations to solve a simple data load, I think I may have better success on this forum.

I have a DateTime field in my Columnstore table. I am trying to import 1 row of data via a pipeline.

Here is part of the Pipeline that matters (I changed the field name to post here).

@un_hi_rec_id,
@my_ts)
SET my_ts = STR_TO_DATE(@my_ts, '%Y-%m-%d %H:%i:%s'),
un_hi_rec_id = @my_ts;

The field in my CSV file that maps to my_ts looks like this:
2019-06-18 19:10:19
I made the datetime string format match the SingleStore DateTime string representation.
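
For context, the full statement is shaped roughly like this (the container, credentials, and table names here are placeholders, not my real ones):

CREATE OR REPLACE PIPELINE my_pipeline AS
LOAD DATA AZURE 'my-container/my-data.csv'
CREDENTIALS '{"account_name": "<account>", "account_key": "<key>"}'
INTO TABLE my_table
FIELDS TERMINATED BY ','
(@un_hi_rec_id,
@my_ts)
SET my_ts = STR_TO_DATE(@my_ts, '%Y-%m-%d %H:%i:%s'),
un_hi_rec_id = @my_ts;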

The result of running the Pipeline is NULL for the my_ts field and proper values in the un_hi_rec_id field, which I use for testing this out. The un_hi_rec_id field is a TEXT data type.

So I tried the following:

SET my_ts = CONVERT(@my_ts, DATETIME),
un_hi_rec_id = @my_ts;

That did not work either. Got NULL in the my_ts field.

However, the following did work:

SET my_ts = now(),
un_hi_rec_id = @my_ts;

Below are the string field to be loaded from the CSV and the now() result after it goes into my my_ts field.

2019-06-18 19:10:19 (from the CSV; results in NULL)
2022-05-11 19:08:40 (from now(); loads properly)

My eyes are not what they used to be, but the datetime string field in my CSV file looks identical to the result of the now() function.

How do you get a datetime string value in a csv file into a datetime field in a table using a Pipeline?

SET my_ts = date_format('2019-06-18 19:10:19', '%Y-%m-%d %H:%i:%s'),
un_hi_rec_id = @my_ts;

The above works with the hardcoded datetime string.
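
You can confirm the function itself is fine outside the Pipeline:

SELECT date_format('2019-06-18 19:10:19', '%Y-%m-%d %H:%i:%s');
-- returns 2019-06-18 19:10:19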

SET my_ts = date_format(@my_ts, '%Y-%m-%d %H:%i:%s'),
un_hi_rec_id = @my_ts;

This sets it to NULL.

So it seems there is a BUG in the Pipeline that is causing me all this grief. The WORKAROUND is the one the docs show for Parquet pipelines: load the timestamp as microseconds since the Unix epoch and convert it in the SET clause.

Can anyone translate this from Parquet to CSV? I have tried various permutations for CSV but no success yet (not sure if this will ever work).

BTW - there is supposed to be a bug fix for this in a 7.8.* release soon. This is the second huge roadblock from SingleStore in getting my data loaded. Hopefully this is not a pattern. The other issue was running a Pipeline on the Azure GOV cloud (not supported).

What astonishes me is how people have been loading DATETIME values from a file using a Pipeline at all. I would have imagined this was a fundamental requirement, fixed in version 1.0.

The fact that now() does set the DATETIME value indicates that the plumbing to set a DATETIME already exists but is not implemented correctly for Pipelines using the @var syntax.

I figured out what they wanted me to do for the CSV. So here is part of the solution for anyone who encounters this.

When you are generating your CSV file, you need to create a new field that represents the number of microseconds since the EPOCH (some Unix crap). The following code is .NET C#:

// Parse the original datetime string out of the CSV fields.
DateTime my_ts = Convert.ToDateTime(fields[fields.Length - 2]);

// Milliseconds since the Unix epoch, multiplied by 1000 to get microseconds.
Int64 unixTimestamp = (Int64)my_ts.Subtract(new DateTime(1970, 1, 1)).TotalMilliseconds * 1000;

fields[fields.Length - 2] = my_ts.ToString("yyyy-MM-dd HH:mm:ss");
fields[fields.Length - 1] = unixTimestamp.ToString();


The last 4 fields of my CSV look like this:

5b5a410690e25d8bb, 2, 2019-06-18 19:10:19, 1560885019000000


Select Cast(timestampadd(MICROSECOND, 1560885026000000, from_unixtime(0)) as DateTime);
-- returns 2019-06-18 11:10:26 - THIS IS CORRECT

HOWEVER, in my SingleStore table's my_ts field, what should be '2019-06-18 11:10:26' is actually '1969-12-31 16:00:00'. NOT GOOD. (That value is just FROM_UNIXTIME(0) rendered in my session time zone, which suggests @my_ts_unix is effectively arriving as zero.)


Here is part of my Pipeline code.

Create Pipeline etc ...
...
@my_ts,
@my_ts_unix)
set my_ts = Cast(timestampadd(MICROSECOND, @my_ts_unix, from_unixtime(0)) as DateTime),
my_ts_unix = @my_ts_unix;
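
For completeness, the statement spelled out with the same scaffolding as before (names, credentials, and the exact column list are placeholders) would be:

CREATE OR REPLACE PIPELINE my_pipeline AS
LOAD DATA AZURE 'my-container/my-data.csv'
CREDENTIALS '{"account_name": "<account>", "account_key": "<key>"}'
INTO TABLE my_table
FIELDS TERMINATED BY ','
(@un_hi_rec_id,
@my_ts,
@my_ts_unix)
SET un_hi_rec_id = @un_hi_rec_id,
    my_ts = Cast(timestampadd(MICROSECOND, @my_ts_unix, from_unixtime(0)) as DateTime),
    my_ts_unix = @my_ts_unix;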

Just for fun, I tried a UDF to see if it could work around whatever bug I was encountering. No luck.

DELIMITER //

CREATE OR REPLACE FUNCTION ToDateTime (fromUnixTime BIGINT NOT NULL) RETURNS DATETIME AS
BEGIN
  RETURN FROM_UNIXTIME(fromUnixTime / 1000000);
END //

DELIMITER ;
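
A quick sanity check of the UDF outside the Pipeline (FROM_UNIXTIME uses the session time zone, so the result shifts accordingly):

SELECT ToDateTime(1560885019000000);
-- returns 2019-06-18 19:10:19 with a UTC session time zone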

Hi Manoj! 🙌 Awesome work! It looks like you solved this issue in your second-to-last post. If that's the case, I'm going to mark this issue as resolved. Just want to check in and confirm with you.
Thanks

This issue is still open and a real problem. It would be great if someone over there could try to reproduce this simple use case.

BTW - I am doing this on Azure, not Amazon S3.

The solution to this issue, which I think is also a bit of a bug, is to make sure the CSV data fed into the Pipeline is not UTF-16 (what .NET calls Encoding.Unicode). I say a bit of a bug because if all of the table's data types are TEXT, the Pipeline will work. However, if a DATETIME is used, it will not work with UTF-16 input. (Plain ASCII works; UTF-8 without a BOM should work too, since these datetime strings are byte-identical in ASCII and UTF-8.)

My code below is the simple app code I use to create the CSV data I send to the Pipeline.


                BlobClient blob = container.GetBlobClient(blobName);

                // BUG: Encoding.Unicode is UTF-16 LE; the Pipeline cannot parse the DATETIME field from these bytes.
                var dataArray = System.Text.Encoding.Unicode.GetBytes(data);
                using (var stream = new MemoryStream(dataArray))
                {
                    stream.Position = 0;
                    Response<BlobContentInfo> response = blob.Upload(stream);
                }

*********************************** Change To ************************************

                BlobClient blob = container.GetBlobClient(blobName);

                // FIX: write plain ASCII bytes instead.
                var dataArray = System.Text.Encoding.ASCII.GetBytes(data);
                using (var stream = new MemoryStream(dataArray))
                {
                    stream.Position = 0;
                    Response<BlobContentInfo> response = blob.Upload(stream);
                }
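
If you want to confirm what bytes actually landed, one trick is to load the raw field into a TEXT column first (raw_ts here is a hypothetical column) and inspect it with HEX():

SELECT HEX(LEFT(raw_ts, 4)) FROM my_table LIMIT 1;
-- plain ASCII/UTF-8 '2019' shows as 32303139
-- UTF-16 LE shows interleaved zero bytes, e.g. 32003000 for '20'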