Prometheus/Grafana dashboards and alerts?

garret.coffman · September 7, 2021, 3:25pm

We’ve implemented a Prometheus/Grafana/Thanos monitoring setup here and are scraping metrics from our SingleStore clusters using the integrated exporter and Prometheus. I’ve been looking at the documentation and found some dashboards, but these seem to require that Grafana run on the memsql cluster and use memsql as a data source, which is not an option for us. I tried importing the dashboards to see if I could modify them to use the metrics coming in from Prometheus, but doing so locks up my browser tab.

We’d also like some insight or advice on which metrics coming from the exporter would be good for alerts. Do you have any documentation?

Any information would be appreciated!
Thanks,
Garret

jprice · September 8, 2021, 3:32pm

Hi @garret.coffman,

Can you please clarify what your roadblock is? Is it that you can’t run Grafana on the SingleStore cluster since you already have a Grafana instance? Or is the issue that you can’t leverage SingleStore as the data source for the metrics?

If it is the former and you already have a Grafana instance ready that can connect to the metrics SingleStore cluster, you can skip the step to install Grafana on the master aggregator and follow the instructions found here to add the SingleStore metrics data source once you have it set up.

Then you can download the dashboards from here and import them into Grafana.

Cheers,
Julie

m_k · September 8, 2021, 6:19pm

Hi @garret.coffman. Thank you for trying out SingleStore monitoring.

Our Grafana dashboards do indeed use a SingleStore cluster to monitor another cluster (or itself). I am not sure why they lock up your browser, but the dashboards use a MySQL data source named ‘monitoring’, and (just a wild guess) perhaps configuring that will unblock you.
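
To check which data source names an exported dashboard expects before importing it, one option is to scan the dashboard JSON. The following is a minimal Python sketch, not something from the SingleStore docs. The file name in the usage comment is hypothetical, and the sketch assumes the usual Grafana export layout, where panels, their targets, and template variables carry a "datasource" field that is either a plain name (older exports) or an object with "uid"/"type" (newer exports).

```python
import json


def iter_panels(panels):
    """Yield every panel, including panels nested inside row panels."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def datasource_names(dashboard):
    """Collect the data source references used anywhere in a dashboard dict."""
    names = set()

    def add(ds):
        # Older exports store a plain name; newer ones store {"type", "uid"}.
        if isinstance(ds, str):
            names.add(ds)
        elif isinstance(ds, dict):
            names.add(ds.get("uid") or ds.get("type") or "<unnamed>")
        # None means "use the default data source", so it is skipped.

    panels = list(dashboard.get("panels", []))
    # Very old dashboards keep their panels under "rows" instead.
    for row in dashboard.get("rows", []):
        panels.extend(row.get("panels", []))

    for panel in iter_panels(panels):
        add(panel.get("datasource"))
        for target in panel.get("targets", []):
            add(target.get("datasource"))

    for variable in dashboard.get("templating", {}).get("list", []):
        add(variable.get("datasource"))

    return sorted(names)


# Usage (hypothetical file name):
#   with open("dashboard.json") as fh:
#       print(datasource_names(json.load(fh)))
```

If the output lists a name such as 'monitoring' that does not exist in your Grafana instance, the panels that use it will have nothing to query.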

Monitoring SingleStore is highly dependent on workload, but there are some high-level metrics that make sense to watch out of the box for sudden or gradual changes in traffic or errors:

memsql_status_uptime
memsql_status_query_compilation_failures
memsql_status_rows_affected_by_writes
memsql_status_rows_returned_by_reads
memsql_status_workload_management_queued_queries
memsql_status_failed_read_queries
memsql_status_failed_write_queries
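
As a starting point for alerting on the metrics listed above, here is a rough Python sketch that generates a Prometheus rule group. Everything beyond the metric names is an assumption on my part: that the failure metrics are cumulative counters (hence rate()), that memsql_status_uptime is reported in seconds, and all thresholds, windows, alert names, and the group name. Tune these to your workload.

```python
import json

# Metrics from the list above that count failures. Treating them as
# cumulative counters is an assumption; verify against your exporter output.
FAILURE_COUNTERS = [
    "memsql_status_failed_read_queries",
    "memsql_status_failed_write_queries",
    "memsql_status_query_compilation_failures",
]


def rate_alert(metric, threshold_per_sec, window="5m", hold="10m"):
    """Build one alerting rule that fires when a counter grows too fast."""
    name = "".join(part.capitalize() for part in metric.split("_")) + "RateHigh"
    return {
        "alert": name,
        "expr": f"rate({metric}[{window}]) > {threshold_per_sec}",
        "for": hold,
        "labels": {"severity": "warning"},
        "annotations": {
            "summary": f"{metric} is rising faster than {threshold_per_sec}/s",
        },
    }


def build_rule_file(failure_threshold=1, queued_threshold=10,
                    min_uptime_seconds=300):
    """Return a dict shaped like a Prometheus rule file (all values illustrative)."""
    rules = [rate_alert(m, failure_threshold) for m in FAILURE_COUNTERS]
    rules.append({
        "alert": "MemsqlQueriesQueued",
        "expr": "memsql_status_workload_management_queued_queries"
                f" > {queued_threshold}",
        "for": "5m",
        "labels": {"severity": "warning"},
        "annotations": {"summary": "Workload management is queueing queries"},
    })
    rules.append({
        # Assumes uptime is in seconds; a low value suggests a recent restart.
        "alert": "MemsqlRecentRestart",
        "expr": f"memsql_status_uptime < {min_uptime_seconds}",
        "labels": {"severity": "info"},
        "annotations": {"summary": "Node restarted recently"},
    })
    return {"groups": [{"name": "singlestore-exporter", "rules": rules}]}


print(json.dumps(build_rule_file(), indent=2))
```

The printed JSON can be converted to YAML for a rule file; running it through promtool check rules before loading it is a good idea. The row counters (rows_affected_by_writes, rows_returned_by_reads) are left out on purpose, since sensible thresholds for traffic depend entirely on the workload.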

Likewise, if you would like to track host metrics, the memsql_sysinfo_* subsystem will be useful.
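
To see which memsql_sysinfo_* series your exporter actually exposes, you could ask Prometheus for the matching metric names through its HTTP API. This is a small sketch under assumptions: the Prometheus address in the usage comment is a placeholder, and the prefix is assumed to contain no regex metacharacters.

```python
import json
import urllib.parse
import urllib.request


def build_names_url(base_url, prefix):
    """Build a label-values URL restricted to metric names with the prefix."""
    selector = '{__name__=~"%s.*"}' % prefix
    query = urllib.parse.urlencode({"match[]": selector})
    return f"{base_url.rstrip('/')}/api/v1/label/__name__/values?{query}"


def parse_names(payload, prefix):
    """Extract the sorted metric names from a label-values API response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return sorted(n for n in payload.get("data", []) if n.startswith(prefix))


def fetch_names(base_url, prefix):
    """Query Prometheus over HTTP and return the matching metric names."""
    url = build_names_url(base_url, prefix)
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_names(json.load(response), prefix)


# Usage (placeholder address, replace with your Prometheus or Thanos Query URL):
#   for name in fetch_names("http://localhost:9090", "memsql_sysinfo_"):
#       print(name)
```

The same call with the prefix "memsql_status_" would list the status metrics mentioned earlier.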

One of our Grafana dashboards is all about memory allocations, which can reveal some causes of max memory errors. We currently don’t have a more detailed document, but it is on our roadmap.

ton.hoang · May 28, 2023, 11:21am

After downloading the dashboards and importing them into Grafana, some work and some do not.
Has anyone successfully imported them and gotten data to display? Please help.

gkafity · May 30, 2023, 11:40am

Did you follow all steps in the docs? Which ones worked, and which ones did not work?

ton.hoang · May 30, 2023, 12:55pm

I installed Grafana via the operator on OpenShift (Grafana v7.5.17).
I downloaded the 4 dashboards for SingleStore. The memory dashboard does not display any charts, while the other dashboards display their data as expected.

gkafity · June 1, 2023, 5:47pm

Understood, I’ll escalate this to our observability team. Thank you.

gkafity · June 1, 2023, 5:48pm

Are you leveraging SingleStoreDB Cloud or self managed? What version?

tdasarathan · June 1, 2023, 8:46pm

@ton.hoang - Can you please try upgrading Grafana to the latest version and then validate the dashboards?

ton.hoang · June 1, 2023, 11:29pm

I use the Grafana community operator 4.10 (the latest version), which has Grafana 7.5.17 built in.
I use self-hosted SingleStore v8.1.5 on OpenShift 4.10.

tdasarathan · June 2, 2023, 12:43am

The dashboards were built on Grafana 9, so I am wondering if the errors are due to version compatibility. Can you please share the error message so we can debug further?
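
One quick way to compare a dashboard file against the running Grafana version is to read the metadata in the exported JSON. This is only a sketch: it assumes the file carries a "schemaVersion" field and, if it was exported for external sharing, a "__requires" list. Either may be missing, in which case the corresponding value comes back as None. The file name in the usage comment is hypothetical.

```python
import json


def panel_types(panels):
    """Collect panel type names, descending into row panels."""
    found = set()
    for panel in panels:
        if "type" in panel:
            found.add(panel["type"])
        found |= panel_types(panel.get("panels", []))
    return found


def dashboard_requirements(dashboard):
    """Summarize the version hints stored in an exported dashboard dict."""
    grafana_version = None
    for requirement in dashboard.get("__requires", []):
        if requirement.get("type") == "grafana":
            grafana_version = requirement.get("version")
    return {
        "schema_version": dashboard.get("schemaVersion"),
        "grafana_version": grafana_version,
        "panel_types": sorted(panel_types(dashboard.get("panels", []))),
    }


# Usage (hypothetical file name):
#   with open("memory-dashboard.json") as fh:
#       print(dashboard_requirements(json.load(fh)))
```

If the reported Grafana version is newer than the one installed, or a listed panel type is not available in that install, that would be consistent with a dashboard rendering empty.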