Prometheus/Grafana dashboards and alerts?

We’ve implemented a Prometheus/Grafana/Thanos monitoring setup here and are scraping metrics from our SingleStore clusters with Prometheus, using the integrated exporter. I’ve been looking through the documentation and found some dashboards, but these seem to require that Grafana run on the MemSQL cluster and use MemSQL as a data source, which is not an option for us. I tried importing the dashboards to see whether I could modify them to use the metrics coming in from Prometheus, but doing so locks up my browser tab.

We’d also like some advice on which metrics from the exporter would be good candidates for alerts. Do you have any documentation on this?

Any information would be appreciated!
Thanks,
Garret

Hi @garret.coffman,

Can you please clarify what your roadblock is? Is it that you can’t run Grafana on the SingleStore cluster because you already have a Grafana instance, or is it that you can’t use SingleStore as the data source for the metrics?

If it is the former, and your existing Grafana instance can connect to the SingleStore cluster that stores the metrics, you can skip the step of installing Grafana on the master aggregator. Once that cluster is set up, follow the instructions found here to add it as the SingleStore metrics data source.

Then you can download the dashboards from here and import them into Grafana.
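If importing through the browser stalls, it can help to inspect the downloaded dashboard JSON offline first. The following is only a rough sketch (not an official SingleStore tool) that lists each panel’s title, data source, and SQL query, so you can see what would need rewriting as PromQL. It assumes the file follows Grafana’s usual dashboard JSON layout (a top-level `panels` list, collapsed rows nesting their own `panels`, and SQL targets stored under `rawSql`); the function names are hypothetical.

```python
import json
from typing import Any, Iterator


def iter_panels(panels: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every panel, descending into collapsed rows that nest their own 'panels'."""
    for panel in panels:
        yield panel
        # Collapsed row panels carry their children in a nested "panels" list.
        yield from iter_panels(panel.get("panels", []))


def summarize_dashboard(dashboard_json: str) -> list[tuple[str, str, str]]:
    """Return (panel title, data source, SQL) for each query target in a dashboard export."""
    dashboard = json.loads(dashboard_json)
    # Some exports wrap the dashboard under a "dashboard" key; handle both shapes.
    dashboard = dashboard.get("dashboard", dashboard)
    rows = []
    for panel in iter_panels(dashboard.get("panels", [])):
        datasource = panel.get("datasource")
        # Newer Grafana stores the data source as {"type": ..., "uid": ...}; older as a plain name.
        if isinstance(datasource, dict):
            datasource = datasource.get("uid") or datasource.get("type")
        for target in panel.get("targets", []):
            rows.append((panel.get("title", ""), str(datasource), target.get("rawSql", "")))
    return rows
```

For example, `summarize_dashboard(open("dashboard.json").read())` returns one tuple per query, which you can print or write to a file without loading the dashboard in a browser.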

Cheers,
Julie

Hi @garret.coffman. Thank you for trying out SingleStore monitoring.

Our Grafana dashboards do indeed use a SingleStore cluster as the data source to monitor another cluster (or itself). I am not sure why they lock up your browser, but the dashboards expect a MySQL data source named ‘monitoring’, and (just a wild guess) configuring a data source with that name may unblock the import.
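As a sketch of that guess, the data source can also be created through Grafana’s HTTP API (`POST /api/datasources`) instead of the UI. The code below only builds the request; it does not send it. The host, credentials, API token, and the default database name `metrics` are assumptions to replace with your own values, and where the database name goes differs across Grafana versions (recent versions read it from `jsonData`, older ones used a top-level `database` field), so please check the docs for your version.

```python
import json
import urllib.request


def build_monitoring_datasource_request(grafana_url: str, api_token: str,
                                        mysql_host: str, user: str, password: str,
                                        database: str = "metrics") -> urllib.request.Request:
    """Build (but do not send) a Grafana API request creating a MySQL data source named 'monitoring'."""
    payload = {
        "name": "monitoring",     # the data source name the dashboards expect
        "type": "mysql",          # SingleStore speaks the MySQL wire protocol
        "access": "proxy",
        "url": mysql_host,        # assumed "host:port" of the cluster holding the metrics
        "user": user,
        # Assumption: recent Grafana versions read the database from jsonData.
        "jsonData": {"database": database},
        "secureJsonData": {"password": password},
    }
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_token}"},
        method="POST",
    )
```

To actually create the data source, pass the result to `urllib.request.urlopen(...)`.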

What to monitor in SingleStore depends heavily on your workload, but a few high-level metrics make sense out of the box for watching for sudden or gradual changes in traffic or errors:

memsql_status_uptime
memsql_status_query_compilation_failures
memsql_status_rows_affected_by_writes
memsql_status_rows_returned_by_reads
memsql_status_workload_management_queued_queries
memsql_status_failed_read_queries
memsql_status_failed_write_queries
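As a starting point, here is a rough sketch that turns the metrics above into Prometheus alerting rules. Every threshold, window, and alert name is a hypothetical placeholder, and the expressions assume the `failed_*`, `compilation_failures`, and `rows_*` metrics are cumulative counters (as their `SHOW STATUS` counterparts are), that uptime is in seconds, and that queued queries is a gauge. The script prints JSON, which is valid YAML, so the output can be saved as a rule file and checked with `promtool check rules`.

```python
import json

# Hypothetical defaults: every threshold and window below is a placeholder to tune.
WINDOW = "5m"


def _camel(metric: str) -> str:
    """Turn 'memsql_status_failed_read_queries' into 'FailedReadQueries'."""
    return "".join(p.capitalize() for p in metric.removeprefix("memsql_status_").split("_"))


def build_alert_rules() -> dict:
    """Return a Prometheus rule-file structure covering the metrics listed above."""
    rules = [
        {
            "alert": "SingleStoreRecentlyRestarted",
            # Assumes uptime is reported in seconds.
            "expr": "memsql_status_uptime < 600",
            "labels": {"severity": "warning"},
        },
        {
            "alert": "SingleStoreQueriesQueued",
            # Queries held by workload management for a sustained period.
            "expr": "memsql_status_workload_management_queued_queries > 0",
            "for": "10m",
            "labels": {"severity": "warning"},
        },
    ]
    # Error counters: fire on any increase within the window (assumes cumulative counters).
    for metric in ("memsql_status_failed_read_queries",
                   "memsql_status_failed_write_queries",
                   "memsql_status_query_compilation_failures"):
        rules.append({
            "alert": f"SingleStore{_camel(metric)}",
            "expr": f"increase({metric}[{WINDOW}]) > 0",
            "labels": {"severity": "warning"},
        })
    # Traffic counters: fire when the rate falls below half of the same time yesterday.
    for metric in ("memsql_status_rows_returned_by_reads",
                   "memsql_status_rows_affected_by_writes"):
        rules.append({
            "alert": f"SingleStore{_camel(metric)}Drop",
            "expr": (f"rate({metric}[{WINDOW}]) < 0.5 * "
                     f"rate({metric}[{WINDOW}] offset 1d)"),
            "for": "15m",
            "labels": {"severity": "info"},
        })
    return {"groups": [{"name": "singlestore", "rules": rules}]}


if __name__ == "__main__":
    # JSON is valid YAML, so this output can be saved directly as a rule file.
    print(json.dumps(build_alert_rules(), indent=2))
```

Comparing against `offset 1d` catches gradual or sudden drops relative to a normal day, which is often more useful than a fixed floor when traffic varies over time.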

Likewise, if you would like to track host metrics, the memsql_sysinfo_* subsystem will be useful.
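To see which memsql_sysinfo_* metrics your exporter actually exposes, one option is to ask Prometheus for all metric names through its HTTP API (`/api/v1/label/__name__/values`) and filter by prefix. This is a minimal sketch; the Prometheus URL is an assumption, and pointing it at a Thanos Querier should work only insofar as it implements the same endpoint.

```python
import json
import urllib.request


def sysinfo_metric_names(response_body: str, prefix: str = "memsql_sysinfo_") -> list[str]:
    """Extract metric names with the given prefix from a label-values API response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    return sorted(name for name in payload["data"] if name.startswith(prefix))


def fetch_sysinfo_metric_names(prometheus_url: str) -> list[str]:
    """Ask Prometheus for every metric name, then keep the memsql_sysinfo_* ones."""
    url = prometheus_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return sysinfo_metric_names(resp.read().decode("utf-8"))
```

Parsing is kept separate from fetching so the filtering can be reused on a saved response.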

One of our Grafana dashboards focuses on memory allocation, which can reveal some causes of max memory errors. We don’t currently have more detailed documentation on this, but it is on our roadmap.