Operationalizing SingleStore

In a recent webcast, we shared some tips and tricks on how you can operationalize SingleStore for configuring SingleStore Ops and SingleStore root user passwords; memory settings and health metrics; and how you can take backups and add nodes securely to your cluster.

Here are the topics we covered:

Permissioning a new cluster by adding a super-user to SingleStore Ops and via GRANT statements on the cluster itself
Best practices for configuring memory limits
Best practices for basic cluster monitoring, and how to retrieve the relevant metrics
How to take physical backups of a cluster, and how backups are stored in the filesystem
How to add new nodes to a cluster using the SingleStore Ops Command Line Interface (CLI)

We also received questions from webcast attendees, and here are the answers:

Question: In SingleStoreDB Self-Managed 6.x, do all backup files need to be present on all nodes when attempting a restore?
Any particular backup file needs to be accessible to the node that will restore the data on it. This is easiest to understand with an example.

Consider a cluster with one Master Aggregator, 2 Leaves, and 4 partitions per leaf, and a database named “my_database”. The data within my_database will by default consist of 8 partitions, along with metadata. When my_database is backed up, it goes to a single metadata backup file of the same name, along with 8 partition backup files named my_database_0, my_database_1, my_database_2, etc. Restoring the cluster successfully does not require that each of the partition backup files be accessible on every node, but it does require that each of them be accessible to at least one leaf node and that the metadata backup file be accessible to the master aggregator.

Question: How can a new Master Aggregator be deployed/assigned to a cluster?
Currently, the only way to create a new Master Aggregator is through the promotion of an existing Child Aggregator, using the AGGREGATOR SET AS MASTER command.

Question: Since SingleStore backups are distributed across the cluster in the same way as the data they back up, how can they be made robust against hardware failures?
If SingleStore backups are stored to a location that represents a locally-mounted filesystem on each host, then there are two approaches to reduce the risk of losing backup data due to hardware failure:

Instead of backing up to a local filesystem, the backups can be written directly to a network file system visible to all of the nodes in the cluster. This segregates the integrity of the backup files from the health of the hardware on which they were generated, at the cost of incurring some overhead during the backup process itself due to network latency.
Backups can be written locally, then moved or copied via a scripted process to either a network location or object storage system. This approach avoids the potential performance impact of the previous option, but requires adding an additional layer of complexity to the deployment, and is asynchronous.

Question: Does the node addition and rebalancing process change when operating a cluster in High Availability mode?
The operational procedure for adding new nodes remains the same in HA mode, and from the user perspective the rebalancing operation also does not change. An even number of nodes must be used to preserve the HA topology; Ops will distribute the new nodes automatically among the two availability groups as they are added.

Question: What are the architectural requirements for using the Ops CLI to add nodes, as was done in the webcast? When these requirements cannot be met, how can nodes be added?
Using Ops to deploy new SingleStore nodes throughout the cluster requires it be able to SSH between nodes, but (as was done in the webcast) this is not necessary if the new SingleStore node is simply deployed from a colocated agent downloaded onto the new host. This process does still require that Ops be able to access the SingleStore download site to download the SingleStore engine binaries.

If, for security or other concerns, the SingleStore engine binaries cannot be downloaded directly by Ops, the binaries can be transferred to the host manually, and Ops can complete the rest of the installation process using them. In this scenario, the binaries should be unzipped manually into the location specified in Ops’ settings.conf file.

If Ops cannot be used at all, SingleStore nodes can still be installed and added manually via the ADD LEAF command. In this case, we recommend engaging with a [SingleStore representative](mailto: sales@singlestore.com) to ensure best practices are followed during the node addition.

Question: How does the monitoring picture change if I’m mostly using columnstore?
The fundamental monitoring picture does not change. Even in a purely columnstore use case, SingleStore still leverages the rowstore during ingest and allocates memory aggressively for query execution performance, so it is still important to monitor memory utilization as described in the webcast. Columnstore data is typically compressed enough on disk that a properly sized cluster is unlikely to be at risk of encountering an out-of-disk condition. However, it is still more likely to be the case than with the rowstore, so system-level disk space monitoring is more important.

Question: What is a secure way to clear buffers in memory?
A best practice for SingleStore deployments precludes running other memory-intensive processes on the same hosts, and buffered memory should not cause an issue on a properly-sized and configured cluster.

If scenarios arise during operational use of SingleStore that appear to require flushing the memory buffer, we recommend reaching out to SingleStore support.