PIPELINE SKIP PARSER ERRORS with JSON

jisung.park · March 3, 2021, 8:52am

CREATE PIPELINE pipe_json AS
LOAD DATA KAFKA '…'
SKIP PARSER ERRORS
INTO TABLE comment
(
…
)
FORMAT json

Creating this pipeline fails with the following error:

SQL Error [1706] [HY000]: Feature ‘SKIP PARSER ERRORS with FORMAT JSON’ is not supported by MemSQL.

Are you planning to support FORMAT JSON with SKIP PARSER ERRORS?
If not, how can I make the pipeline ignore the erroneous data and continue reading the data that follows, without using SKIP PARSER ERRORS?

sasha · March 5, 2021, 8:59pm

There are no plans to support SKIP PARSER ERRORS with JSON. That mode is supported only for CSV, in which case it causes us to attempt to recover after discovering that the input data is in some sense invalid CSV, as far as we’re concerned.

Do you indeed want the pipeline to attempt to keep loading after hitting malformed JSON, or are you perhaps more interested in “non-parser” errors, e.g. with the SKIP DUPLICATE KEY ERRORS or SKIP CONSTRAINT ERRORS clauses?

In the first case, the best option is likely to set the global variable pipelines_stop_on_error to false. That will make it so that if pipelines hit a parser error, they’ll output an error to information_schema.pipelines_errors, skip the problematic file, and then continue loading from the next file.
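As a rough sketch of that setting, something like the following should work. The variable name and the errors table come from the description above; the `OFF` spelling of "false", the `PIPELINE_NAME` filter column, and the use of the `pipe_json` pipeline from the original post are assumptions, so check them against the documentation for your version.

```sql
-- Assumption: OFF is accepted as the "false" value for this global variable.
-- With it set, a pipeline that hits a parser error logs it and moves on
-- instead of stopping.
SET GLOBAL pipelines_stop_on_error = OFF;

-- Inspect what was skipped. The PIPELINE_NAME column is assumed here;
-- pipe_json is the pipeline from the original post.
SELECT *
FROM information_schema.pipelines_errors
WHERE PIPELINE_NAME = 'pipe_json';
```

Note that this setting is global, so it affects every pipeline in the cluster, not only the one that is hitting errors.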

jisung.park · March 8, 2021, 5:43am

Is there any way to load the rest of the data in a batch normally, skipping only the records that cause errors?

maximg · November 15, 2022, 11:42am

I have the same problem. Would also like to know whether there’s a way.