Specify batch_size for pipelines

MemSQL pipelines currently offer a few settings: BATCH_INTERVAL, MAX_PARTITIONS_PER_BATCH, etc.

Requesting a BATCH_SIZE option as well.

Got it. Can you elaborate on what your scenario is and why you want to change BATCH_SIZE, what you want to set it to, and so forth?

I did open an internal feature request to track this.

Thank you.

In most of our cases, there is a steady flow of messages (usually fewer than 100 per batch) for the pipelines to process, but there are times when there is a flood of 200,000+ messages. Without boring you with the details of why that happens, it is a business requirement.

When there is such a flood, we have noticed that the cluster runs out of memory, the batch fails, and the cluster restarts (probably because it crashed?). I then have to come up with custom workarounds to process these messages, not to mention the disruption to the cluster's availability.

Yes, we have doubled the memory… and still faced issues. Regardless, for an event that happens only every once in a while, it is not economical to pay for all that memory sitting unused the rest of the time.

If we are able to set MAX_BATCH_SIZE (or BATCH_SIZE), I am hoping it will cover our typical & the atypical scenarios.

Got it. I’ll make a note of this on our internal feature request.

So how are these 200,000+ messages represented during the “flood”? Are they individual records in a single text file sent into the pipeline?

They are individual messages in a Kafka topic.
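To illustrate what a batch-size cap could do for a Kafka backlog, here is a minimal sketch that plans batches as per-partition offset ranges. It is a conceptual model only, not MemSQL's implementation or API: `plan_batches`, `batch_sizes`, and `max_per_partition` are hypothetical names, and the partition count and backlog sizes are made up to match the "200,000+ messages" scenario. A per-partition cap is used here (rather than one global cap) so no partition is starved, and the total batch size stays bounded by `max_per_partition` times the number of partitions.

```python
from typing import Dict, List, Tuple

# [start, end) Kafka offsets to load from one partition in one batch.
OffsetRange = Tuple[int, int]


def plan_batches(
    committed: Dict[int, int],
    latest: Dict[int, int],
    max_per_partition: int,
) -> List[Dict[int, OffsetRange]]:
    """Split a Kafka backlog into batches, taking at most max_per_partition
    offsets from each partition per batch.

    Hypothetical sketch: max_per_partition stands in for the requested
    BATCH_SIZE / MAX_BATCH_SIZE setting; it is not an existing MemSQL option.
    """
    if max_per_partition <= 0:
        raise ValueError("max_per_partition must be positive")
    position = dict(committed)  # next offset to load, per partition
    batches: List[Dict[int, OffsetRange]] = []
    while any(position[p] < latest[p] for p in position):
        batch: Dict[int, OffsetRange] = {}
        for p in sorted(position):
            # Never read past the latest offset, and never take more than the cap.
            end = min(latest[p], position[p] + max_per_partition)
            if end > position[p]:
                batch[p] = (position[p], end)
                position[p] = end
        batches.append(batch)
    return batches


def batch_sizes(batches: List[Dict[int, OffsetRange]]) -> List[int]:
    """Total number of messages in each planned batch."""
    return [sum(end - start for start, end in b.values()) for b in batches]


if __name__ == "__main__":
    # A "flood": 4 partitions with 50,000 unread messages each (200,000 total).
    committed = {p: 0 for p in range(4)}
    latest = {p: 50_000 for p in range(4)}
    flood = plan_batches(committed, latest, max_per_partition=10_000)
    print(batch_sizes(flood))  # [40000, 40000, 40000, 40000, 40000]

    # Steady state: 25 new messages per partition fit in one small batch.
    steady = plan_batches(committed, {p: 25 for p in range(4)}, max_per_partition=10_000)
    print(batch_sizes(steady))  # [100]
```

With the cap, the flood is loaded as five bounded batches instead of one 200,000-message batch, while the everyday trickle is unaffected, which is the behavior the request is after.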

Are you using pipelines into stored procedures (SPs) or regular pipelines when you hit the OOM during the “flood”?

Due to the lack of support for transformations in Helios, we had to resort to SPs.