Query History in SingleStore

Where real-time data meets precision: Introducing SingleStore's query event tracing.

When data moves at lightning speed, the ability to not only capture but also understand and optimize real-time applications is vital as it becomes imperative to not only ingest the data, but ensure that queries are being executed exactly as you expect them to be. With SingleStore's query history, you can now extract your query workloads as they occur with the full query text, execution time, query owner and database your queries are being executed on — all in real time, without any impact to the performance expected from SingleStore.

How query event tracing works

When query event tracing is enabled, completed queries are traced into a ring buffer that can be consumed by SQL queries to analyze the traces. While enabled for SingleStore’s Managed Service by default, you can do so on your self-managed cluster by enabling the event tracing setting for the size of the ring buffer by setting trace_events_queue_size. For instance, a 16 MB ring buffer on every node in a cluster is set with the following query:

SET GLOBAL trace_events_queue_size = 1600000;
/* OR */
SET GLOBAL trace_events_queue_size = 16 * 1024 * 1024;

Next, we need to enable and configure an event trace. Since we want to trace queries — and supposing we want a history of queries that take over one second in total runtime — we can enable query tracing across the cluster by sending this query:

CREATE EVENT TRACE Query_completion WITH (Query_text = on,
Duration_threshold_ms = 1000);

This configuration will also trace failed query executions regardless of their duration and traces the full, sanitized query text. Some of the query information included in the trace includes the plan hash, connection id, resource pool and error message (if applicable).

The event traces are exposed in the information schema via the MV_TRACE_EVENTS view, which can be queried using basic DML queries. The view's interface is not designed to be user facing, but rather for external services that regularly query the view to easily consume and analyze the semi-structured data. SingleStore's monitoring service is set up to do all of this already!

Example use case

An unexpected cluster slow down

In this example, we have an application that regularly sends out queries to the table, expedited_orders, consisting of both transactional and analytical workloads. The table has a simple schema:

CREATE ROWSTORE TABLE if not exists expedited_orders(
order_id int,
order_date date,
key(order_id),
shard(order_id));

Unexpectedly, a significant application slowdown occurs at 2 a.m. when the workloads pick up. Looking at the Resource Utilization dashboard, we notice spikes in CPU usage, Disk I/O and average elapsed time per execution confirming high traffic, demanding workloads at that same time.

There's another visual dashboard that comes in handy in this situation: query history. Query history is powered by the new event tracing feature introduced in SingleStore 8.5, which is configured to trace expensive query executions — in this case, queries that take over one second to execute, configured as such:

CREATE EVENT TRACE Query_completion WITH (Query_text = on,
Duration_threshold_ms = 1000);

These query traces can be filtered by time range in the query history user interface. Filtering our time range to the time of the application slowdown, we can then sort the executed queries by descending order of the queries' elapsed time.

Doing so reveals our most expensive queries, and their full query texts reveal important parameters of that query's execution. We notice that one particular query — which is part of our analytical workload — is taking significantly longer time to execute than expected, and this query is frequently executed:

       SELECT order_date,count(order_id) FROM expedited_orders where order_date < "2024-01-01" and
order_date > "2023-12-25" group by order_date;

Using Visual Profile, we can re-run that exact query execution and get a visualization of its performance:

The query's profile tells us that the vast majority of query execution time goes into scanning and filtering rows — that's a strenuous query execution for the amount of data we're expecting the query to scan over to return results matching the query parameters. This scanning and filtering time can be mostly eliminated by creating an index on order_date (CREATE INDEX date_index ON expedited_orders (order_date);), allowing for much more efficient query execution:

Tracing the entire workload

Now, with significantly more efficient query execution, we can expect the application slowdown that occurs at 2 a.m. to improve considerably. However, we want to be entirely sure this query running against a column without an index was the culprit of our unexpected slowdown. Thus, we can reconfigure the query execution event tracing to trace all queries temporarily for the duration of our expected slowdown:

DROP EVENT TRACE Query_completion;

CREATE EVENT TRACE Query_completion WITH (Query_text = on,
Duration_threshold_ms = 0);

This configuration is not recommended outside of workload debugging purposes: Duration_threshold_ms being set to 0 means every single query execution will be traced! This leads to a lot of spammy query traces for query executions that are completely normal. But limiting the configuration to the specific workload that's being troubleshooted is an excellent use case for such a heavy-weight configuration!

Looking back the next morning, the application slowdown at 2 a.m. is resolved as a result of optimizing the expedited_orders table to have an index on a column very commonly used to filter results. However, Resource Utilization monitoring reveals there were still CPU and Disk I/O spikes at this time. They didn't saturate the server hardware, but there was little room left for more queries before the application would slow down again.

Query history reveals why: since event tracing was configured to trace every query during the slowdown period, query history now has every query execution that occurred in that period — a quick analysis of the workload reveals that a query with plan_id (insert_expedited_orders_491ac2e7850ed533) was executed many times relative to other queries:

INSERT INTO expedited_orders VALUES(t, now());

For various values of t., no execution was suspiciously slow. Rather there were several of these executions, revealing to us that the 2a.m. workload consumes much of the cluster's resources on numerous executions of a simple, fast query — this is the application's transactional workload at play!

This is very valuable information: it tells us that we need to either tune the application to space out the transactional workload over a longer time frame, or be ready to dynamically scale out that workload's resources when throughput picks up.

Summary

SingleStore's query history emerges as a powerful tool for enhancing the operational efficiency of modern applications in real time. The provided example showcases its effectiveness in diagnosing and addressing unexpected cluster slowdowns. By enabling users to trace queries, analyze execution details and visualize performance profiles, SingleStore empowers developers to proactively detect and resolve issues, leading to optimized query execution and resource utilization.

SingleStore's query history stands as a valuable asset for organizations seeking precision and agility in managing their real-time workloads.