Pipeline Tuning with Engine Variables

Pipelines, one of the most important features in SingleStore, are used to ingest many kinds of data from sources like Kafka, S3, GCS and more into SingleStore tables.

Customers also use pipelines extensively for ETL, calling stored procedures to transform data from the source before loading it into tables.

SingleStore provides tuning options at both the global and per-pipeline levels for fine-tuning ingestion performance depending on your data source, data types and data arrival intervals.

Reading any engine variable value:

SELECT @@pipelines_max_concurrent;

Setting any global variable:

SET GLOBAL pipelines_max_concurrent = 20;

In this blog, we’ll dive deeper into some pipeline-related engine variables.

  • advanced_hdfs_pipelines. Turn this ON to enable advanced Hadoop features like Kerberos authentication.
  • enable_eks_irsa. Turn this ON to use EKS IAM Roles for Service Accounts (IRSA) for managing credentials.
  • subprocess_ec2_metadata_timeout_ms. Increase this value to allow extra time for fetching credentials from EC2 instance metadata.
  • pipelines_stored_proc_exactly_once. This variable should be ON to get exactly-once delivery guarantees.
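Each of these can be set with SET GLOBAL, as in the earlier example. The values below are illustrative, not recommendations:

SET GLOBAL advanced_hdfs_pipelines = ON;
SET GLOBAL enable_eks_irsa = ON;
SET GLOBAL subprocess_ec2_metadata_timeout_ms = 10000;  -- illustrative timeout in milliseconds
SET GLOBAL pipelines_stored_proc_exactly_once = ON;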

A full list of engine variables and more details can be found in our SingleStore Docs.

Here are some additional globals (with their defaults) and the corresponding per-pipeline configuration options.

Global (default)                                      Per-pipeline config
pipelines_batch_interval (2500 ms)                    BATCH_INTERVAL <milliseconds>
pipelines_stop_on_error (ON)                          STOP_ON_ERROR { ON | OFF }
pipelines_max_offsets_per_batch_partition (1000000)   MAX_OFFSETS_PER_BATCH_PARTITION <num_offsets>
pipelines_max_concurrent_batch_partitions             MAX_PARTITIONS_PER_BATCH <num_partitions>
pipelines_max_retries_per_batch_partition (4)         MAX_RETRIES_PER_BATCH_PARTITION <max_retries>

Starting in version 9.0, retry behavior can also be configured per pipeline with RETRY_OPTIONS '{ "exponential":<>, "max_retry_interval":<>, "max_retries":<>, "retry_interval":<> }'.

The per-pipeline value takes precedence over the global value for that particular pipeline. Per-pipeline values can be specified during pipeline creation, or later using the ALTER PIPELINE command, as shown below.
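Here’s a minimal sketch of both approaches; the pipeline name, Kafka endpoint and table are placeholders:

CREATE PIPELINE kafka_events AS
LOAD DATA KAFKA 'kafka-broker:9092/events'
BATCH_INTERVAL 2500
MAX_PARTITIONS_PER_BATCH 8
STOP_ON_ERROR ON
INTO TABLE events;

-- Later, override a single setting for this pipeline only
ALTER PIPELINE kafka_events SET BATCH_INTERVAL 1000;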

Ingestion tuning using some engine variables:

  • MAX_PARTITIONS_PER_BATCH

    • Determines the maximum parallelism we can achieve from the source.

    • Defaults to the number of SingleStore database partitions.

  • pipelines_max_offsets_per_batch_partition

    • Controls the size of the batches in a Kafka pipeline.

    • Smaller values can be more stable and commit more often.

    • Very large values on a pipeline with a small number of Kafka partitions can incur skew if the table uses keyless sharding, especially coupled with a large batch interval.

    • Batches that are too small have to overcome per-batch overhead, so make sure this value is big enough to amortize it (see the sketch after this list).
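A hedged sketch of adjusting both knobs; the pipeline name and values are illustrative, not recommendations. After a change, recent batch sizes and timings can be inspected through information_schema.PIPELINES_BATCHES_SUMMARY:

SET GLOBAL pipelines_max_offsets_per_batch_partition = 100000;

ALTER PIPELINE kafka_events SET MAX_PARTITIONS_PER_BATCH 16;

-- Check batch timing and row counts to validate the change
SELECT * FROM information_schema.PIPELINES_BATCHES_SUMMARY;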

Let’s break down some real-world scenarios:

  • If you have 16 Kafka partitions but created your SingleStore database with only eight partitions, a batch can pull data from only eight Kafka partitions at a time. Ingesting the data from all 16 Kafka partitions therefore requires two batches, whereas having 16 database partitions would complete it in one batch.

  • If each Kafka partition has 1,000 offsets, limiting pipelines_max_offsets_per_batch_partition to 500 will increase the number of batches, as well as the time to ingest the same data. But we must also keep in mind not to set it too high, to avoid data skew.
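The arithmetic behind both scenarios can be checked directly; the numbers mirror the examples above:

-- 16 Kafka partitions read by at most 8 database partitions => 2 batches;
-- 1,000 offsets at 500 offsets per batch partition => 2 rounds per partition
SELECT CEILING(16 / 8) AS batches_for_all_kafka_partitions,
       CEILING(1000 / 500) AS rounds_per_kafka_partition;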

To summarize, there is a lot of scope for fine-tuning ingestion performance depending on your data requirements, data source, data arrival intervals and more, all of which should be considered as early as database creation time.

Ready to try Pipelines? Start free with SingleStore. 
