I tried to load data using two PIPELINEs, but ER_LOCK_WAIT_TIMEOUT failed.
It was tested in the following scenario.
CREATE one FS PIPELINE using INTO PROCEDURE.
The Procedure inserts the data into a table.
I started the PIPELINE and measured the eapsed time, but it was bad.
- Split the file(single -> number of db partitoions), but nothing changed.
- Reduced or increaed the partitions, but nothing changed.
So, I modified the PIPELINE and started two PIPELINEs at the same time.
After 60 seconds, ER_LOCK_WAIT_TIMEOUT occurs and one PIPELINE failed.
I tried the test in various ways, but nothing changed.
- Creating two PROCEDUREs and using them for each PIPELINE, but nothing changed.
Rather than inserting one by one row
INSERT_ALL, INSERT INTO table VALUES (…), (…),…; Using both methods, the performance was good.
But, Like I said above,
If two(or more) PIPELINEs are started, all of the above INSERT methods also cannot avoid ER_LOCK_WAIT_TIMEOUT failed.
I think, This is a big issue.
If there are multiple data files to be INSERT into one table and the format is various,
To use multiple PROCEDUREs, need to creating and starting each PIPELINE for PROCEDUREs.
However, due to this problem, only one of the multiple PIPELINEs will succeed and the other will fail. (Of course, the failed PIPELINE will continue to be retried, so someday it will succeed…)
This means that if PIPELINEs using INTO PROCEDURE are inserted into a single table, the INSERT operation cannot be parallelized.
Is this lock(TABLE(?) LOCK) absolutely necessary?
I’ve seen past articles (ER_LOCK_WAIT_TIMEOUT when two pipelines are running concurrently), but
I think this is by no means a solution.
Any other solutions?