Utilization Monitoring in SingleStore Helios

SingleStoreDB is the database of choice for developers building immersive applications that require real-time analytics.

To fully optimize applications, developers, administrators and users alike need to understand current system performance, as well as how to tune their queries. SingleStoreDB now has enhanced native monitoring capabilities that allow users to easily visualize performance, identify potential bottlenecks, and tune and optimize queries to maximize performance as workloads scale.

Let’s get into the specifics of the monitoring capabilities we offer our cloud customers, organized by Workspace (a collection of compute resources).

vCPU Utilization

vCPU utilization can help identify performance bottlenecks, optimize resource usage and proactively address issues before they cause any disruption.

To see performance across many vCPUs, we show the overall compute load, as well as the max and min, to identify when workloads are unevenly distributed across the workspace.

Figure 1: CPU utilization

In Figure 1, the workspace cluster is sized as an S-4, which gives 32 cores.

For this workspace at a given point in time:

  • Max-single-core-load reports the maximum CPU load across all available vCPU cores
  • Avg-single-core-load reports the average CPU load across all available vCPU cores
  • Min-single-core-load reports the minimum CPU load across all available vCPU cores

A large number of spikes is encouraged, as it implies maximum and optimum use of the CPU, with no wasted CPU resources or idle periods of no activity (low utilization).

Figure 2: CPU pinned

A pinned line hovering at or near 100% indicates that the system is maxed out, implying we either need to scale the workspace up, or tune long-running queries.

A noticeable difference between the max single-core load and the avg single-core load may indicate data skew (refer to Figure 2), and proactive measures should be taken to redistribute the data evenly by changing shard keys to avoid hot spotting. However, data skew may not be the only reason; sometimes it is the nature of the queries, and in those cases query rewriting is advised. Resource pools can also be set up so long-running queries do not hold resources and affect short tactical queries.
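
A minimal sketch of both remediations, using hypothetical table, column and pool names (the available resource pool options can vary by SingleStoreDB version):

    -- Rebuild a skewed table on a higher-cardinality shard key
    CREATE TABLE events_resharded (
        user_id BIGINT,
        event_time DATETIME,
        payload JSON,
        SHARD KEY (user_id),    -- shard on a column with many distinct values
        SORT KEY (event_time)   -- columnstore sort key for time-range scans
    );
    INSERT INTO events_resharded SELECT * FROM events;  -- backfill, then swap the tables

    -- Isolate long-running queries in their own resource pool
    CREATE RESOURCE POOL reporting_pool WITH MEMORY_PERCENTAGE = 40, QUERY_TIMEOUT = 300;
    SET resource_pool = reporting_pool;  -- run the heavy queries under this pool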

Default threshold:

  • Info alert → vCPU (single-core maximum) > 80% for 10 min
  • Warning alert → vCPU (single-core maximum) > 90% for 5 min
  • Critical alert → vCPU (single-core maximum) > 95% for 5 min

You can use these default thresholds, or define your own.

Memory Utilization

This graph shows the memory (RAM) utilization for the workspace. It helps us understand I/O bottlenecks (if any) so we can take preventive measures to better handle the load, ensuring stability and optimum performance.

Figure 3: Memory utilization

Memory usage is influenced by table memory and query execution memory. Top consumers of memory include queries, rowstore tables, the in-memory segments of columnstore tables, table metadata, in-memory plans, etc. You can run the command SHOW STATUS EXTENDED to see a detailed breakdown of memory usage.
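
For example, from any SQL client (the LIKE pattern below is just an illustrative filter; the exact variable names differ between versions):

    -- Full breakdown of status variables, including memory counters
    SHOW STATUS EXTENDED;
    -- Narrow the output to memory-related counters
    SHOW STATUS EXTENDED LIKE '%memory%';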

If there is an I/O-intensive query, we will see a spike, since the query will repartition or shuffle data. Memory utilization can be optimized by one or a combination of the following (see the sketch after this list):

  • Convert rowstore tables to columnstore tables
  • Drop obsolete tables
  • Purge older data in large rowstore tables
  • Optimize/rewrite resource-intensive queries
  • Drop unused indexes
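
A minimal sketch of the first, second and last items, with hypothetical table and index names (the CREATE TABLE ... AS SELECT key syntax can vary slightly by SingleStoreDB version):

    -- Rebuild a rowstore table as a columnstore table, then retire the old copy
    CREATE TABLE orders_cs (SORT KEY (order_ts), SHARD KEY (order_id))
        AS SELECT * FROM orders_rowstore;
    DROP TABLE orders_rowstore;   -- drop the obsolete rowstore table

    -- Drop an index the workload never uses
    DROP INDEX idx_sessions_unused ON sessions;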

In SingleStore 8.0, we have disk spilling, which helps avoid out-of-memory errors. Too many queries that spill (the query profile will have these details) indicate a workload that needs the Workspace to be scaled up, or queries to be rewritten.
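
To check whether a specific query spills, you can profile it. A sketch with a hypothetical query (the exact operator details reported depend on the SingleStoreDB version):

    -- Profile a suspect query, then inspect the per-operator details
    PROFILE SELECT customer_id, SUM(amount) AS total
            FROM orders
            GROUP BY customer_id
            ORDER BY total DESC;
    SHOW PROFILE;   -- look for spilling-related entries in the operator output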

Persistent Cache (Local Disk) Utilization

This graph shows the persistent cache utilization for the Workspace. It helps us understand the amount of warm data we have so we can take preventive actions to better handle the load, ensuring stability and optimum performance.

Figure 4: Persistent cache utilization

SingleStoreDB is designed for HTAP workloads (workloads that require the attributes of both operational and analytical databases). To deliver low-latency results on complex queries, SingleStoreDB caches the data to be queried locally to the compute layer. This persistent cache is fully managed by the system and should be sized to contain the entire working data set for optimal performance.

The persistent cache stores blobs, trace logs, audit logs, the persistent plan cache and more, so it is not purely a cache. Note: this is different from unlimited storage, which is object storage.

We can manage local disk space by dropping obsolete tables and purging older data in large tables.
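
As a sketch, with a hypothetical table and retention window:

    -- Keep only the most recent 90 days of data in a large table
    DELETE FROM click_events
    WHERE event_time < NOW() - INTERVAL 90 DAY;

    -- Remove a table that is no longer queried
    DROP TABLE staging_click_events_2021;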

Read/Write Queries Per Second

Figure 5: Read/write queries per second

This report helps you understand the total number of read/write queries executed per second relative to the total number of read/write queries that failed per second.

A high number of read queries per second indicates a high demand for data retrieval from the database, which could lead to increased latency or decreased performance if the database is not optimized for high read loads. You can optimize the read workload by decreasing the number of database partitions, tuning long-running queries, using the right shard and sort keys, and adding filter criteria to restrict the dataset.
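
As an illustration with hypothetical table and column names, a read query that filters on both the shard key (user_id) and the sort key (event_time) lets the engine prune partitions and columnstore segments instead of scanning the full table:

    SELECT user_id, COUNT(*) AS events
    FROM events
    WHERE user_id = 42
      AND event_time >= NOW() - INTERVAL 1 DAY
    GROUP BY user_id;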

This view, correlated with your vCPU or memory utilization, will give you a clear indication of any performance bottlenecks and help you tune your resources to better handle the load, ensuring the stability and performance of your application.

If you notice a sudden spike in read/write queries per second, it could indicate a problem with the application, or even a potential security issue. Additionally, if you notice an uptick in the number of failed queries, it is possible that the queries are not getting the resources required to read/write data. This is a good indicator that you should take corrective action! This data, along with historical workload monitoring, will allow you to pin down problematic queries yourself and take the necessary actions.
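
For example, assuming the built-in management views are available in your version, recent query activity can be pulled straight from information_schema:

    -- Recently captured query texts and their aggregate statistics
    SELECT * FROM information_schema.mv_queries LIMIT 10;
    -- Per-activity resource usage captured by workload monitoring
    SELECT * FROM information_schema.mv_activities LIMIT 10;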

Rows Read/Written Per Second

Figure 6: Rows read/written per second

Rows read/written is a performance metric that measures the number of rows affected during a read or write operation to a database. This metric helps us understand the amount of data being processed by a database at a point in time and helps get a sense of whether the database is working efficiently. This information can be used to optimize the database performance and resolve performance issues.

If you notice that the number of rows read or written is very high or uneven, it could indicate that some queries or operations are taking longer to process than others, which can be due to factors including poor indexing, inefficient queries or database design issues.

By monitoring this metric, you gain valuable insights into the amount of data being processed by your database, identify performance issues and take appropriate action to resolve them. By optimizing the rows read/written metric, you can improve the overall performance and efficiency of your database.

Conclusion

SingleStoreDB provides you with the tools required to gain valuable insights from your database, while identifying and addressing performance issues. We’re just starting to scratch the surface on the observability front, and are working toward offering you the tools required to proactively identify issues and rectify them swiftly.

We hope this helps you on your SingleStore journey!

Get started building with SingleStoreDB today.

