MemSQL Monitoring - blobs pipeline fails

I’ve followed the instruction of the Youtube video and manual on MemSQL web site.
I have 2 memsql clusters : Memsql 1 (10.49.211.106 ) and Memsql Monitor

on MemSQL1: (10.49.211.106 )
memsql-admin configure-monitoring --exporter-user root --exporter-password xxxx

on Memsql Monitor
memsql-admin start-monitoring --database-name metrics --exporter-host 10.49.211.106 --user root --password xxxx --retention-period 30

Both services starts. When I log into MemSQL Monitor and check the pipeline, the blob related pipeline fails.

All the stored proc and tables are there.

memsql> show tables ;
±------------------+
| Tables_in_metrics |
±------------------+
| act_samples |
| cluster_info |
| db_status |
| ints |
| metrics |
| mv_events |
| mv_queries |
±------------------+
7 rows in set (0.00 sec)

memsql> show procedures ;
±----------------------±--------+
| Procedures_in_metrics | Definer |
±----------------------±--------+
| load_blobs | root@% |
| load_metrics | root@% |
| purge_blobs | root@% |
| purge_metrics | root@% |
±----------------------±--------+
4 rows in set (0.00 sec)

        PARTITION: 7

*************************** 29. row ***************************
DATABASE_NAME: metrics
PIPELINE_NAME: 10.49.211.106_9104_blobs
ERROR_UNIX_TIMESTAMP: 1600984870.574344
ERROR_TYPE: Error
ERROR_CODE: 1934
ERROR_MESSAGE: Cannot extract data for pipeline. GET http://10.49.211.106:9104/samples returned non-200 error code (got 500)

       ERROR_KIND: Extract
        STD_ERROR: _$_SERVER_ERROR: GET http://10.49.211.106:9104/samples returned non-200 error code (got 500)

Hi Franck,

Thank you for reaching out! A couple of questions for you:

  • Can you confirm the Metrics pipeline is successful and that you have updated data in the table metrics.metrics? I want to confirm if this is solely related to the Samples pipeline, or if it may be affecting the process as a whole.
  • Can you send us the SHOW CREATE PIPELINE for the samples pipeline?
  • Can you confirm your MemSQL Engine version?

Thanks again,
Roxanna

Hi Roxanna,
I do see data in metrics.

memsql> select count() from metrics ;
±---------+
| count(
) |
±---------+
| 5570781 |
±---------+
1 row in set (0.06 sec)

memsql> select * from metrics limit 1\G
*************************** 1. row ***************************
labels: {“quantile”:“0.25”}
name: go_gc_duration_seconds
memsql_tags: {“cluster”:"",“host”:“10.49.211.36:9104”,“port”:"",“push_job”:"",“role”:“leaf”}
cluster:
host: NULL
port: NULL
role: NULL
extractor: go
subsystem: gc
job:
value: 0.000012145
intval: 0
time_sec: 1601827620
1 row in set (0.06 sec)

Here is the pipeline:

memsql> show create pipeline 10.49.211.36_9104_blobs \G
*************************** 1. row ***************************
Pipeline: 10.49.211.36_9104_blobs
Create Pipeline: CREATE PIPELINE 10.49.211.36_9104_blobs
AS LOAD DATA PROMETHEUS_EXPORTER ‘http://10.49.211.36:9104/samples
CONFIG ‘{“is_memsql_internal”:true, “download_type”:“samples”}’
BATCH_INTERVAL 15000
INTO PROCEDURE load_blobs
FORMAT JSON

Thank you for the information.

I’d like to dig into the Sample Pipeline since you’ve confirmed it appears to be an issue with that one specifically.

  • Can you try and replace the pipeline by hand and see if it is successful?
create or replace pipeline `blobs` as 
load data prometheus_exporter 
"10.49.211.36:9104" 
config '{"is_memsql_internal":true, 
"download_type":"samples"}' 
into procedure `load_blobs` format json;

Then, start the pipeline: START PIPELINE blobs;

Wait a few minutes and then check the samples table and pipeline status to see if it is still failing.

  • If that is not successful, can you try running the following queries by hand in the SQL client? (These are some of the queries used to collect sample data). We expect you may receive errors when running it. Please paste the outputs here.

Query 1

SELECT
			activity_type,
			activity_name,
			ifnull(database_name,'') database_name,
			ifnull(cpu_time_ms,0) cpu_time_ms,
			ifnull(cpu_wait_time_ms,0) cpu_wait_time_ms,
			ifnull(elapsed_time_ms,0) elapsed_time_ms,
			ifnull(lock_time_ms,0) lock_time_ms,
			ifnull(network_time_ms,0) network_time_ms,
			ifnull(disk_time_ms,0) disk_time_ms,
			ifnull(disk_b,0) disk_b,
			ifnull(network_b,0) network_b,
			ifnull(memory_bs,0) memory_bs,
			ifnull(memory_major_faults,0) memory_major_faults,
			ifnull(run_count,0) run_count,
			ifnull(success_count,0) success_count,
			ifnull(failure_count,0) failure_count
		FROM information_schema.mv_activities_cumulative
		WHERE	last_finished_timestamp > date_sub(now(), interval 5 minute)
			or run_count > 0`

Query 2:

SELECT
				COUNT(*) as count,
				ANY_VALUE(SEVERITY) as severity,
				EVENT_TYPE as eventtype,
				ANY_VALUE(IP_ADDR) as ipaddr,
				ANY_VALUE(PORT) as port,
				ANY_VALUE(TYPE) as type
			FROM INFORMATION_SCHEMA.MV_EVENTS AS events
			INNER JOIN INFORMATION_SCHEMA.MV_NODES AS nodes
				ON events.ORIGIN_NODE_ID=nodes.ID
			GROUP BY EVENT_TYPE, ORIGIN_NODE_ID

Also, my colleagues recognized your name and mentioned that you have an enterprise MemSQL License. We will be happy to help you through the forum, but note that you can always take these inquiries directly to our Support team as a paying customer (at support.memsql.com.)

Best,
Roxanna

Hi Roxanne,
Thank you for your feedback.
It is still failing. I’ve also run the queries you asked me to.
Let me know if I should take it to support.

memsql> show pipelines ;
±---------------------------±----------------------------------------------------------------------------------------------±----------+
| Pipelines_in_metrics | State | Scheduled |
±---------------------------±----------------------------------------------------------------------------------------------±----------+
| 10.49.211.106_9104_metrics | Running | False |
| 10.49.211.106_9104_blobs | Paused due to error. Run START PIPELINE or consider setting pipelines_stop_on_error to false | False |
| 10.49.211.36_9104_metrics | Running | False |
| 10.49.211.36_9104_blobs | Paused due to error. Run START PIPELINE or consider setting pipelines_stop_on_error to false | False |
| blobs | Paused due to error. Run START PIPELINE or consider setting pipelines_stop_on_error to false | False |
±---------------------------±----------------------------------------------------------------------------------------------±----------+

Query 1:

memsql> SELECT activity_type, activity_name,ifnull(database_name,’’) database_name, ifnull(cpu_time_ms,0) cpu_time_ms,
-> ifnull(cpu_wait_time_ms,0) cpu_wait_time_ms, ifnull(elapsed_time_ms,0) elapsed_time_ms,
-> ifnull(lock_time_ms,0) lock_time_ms, ifnull(network_time_ms,0) network_time_ms, ifnull(disk_time_ms,0) disk_time_ms, ifnull(disk_b,0) disk_b,
-> ifnull(network_b,0) network_b, ifnull(memory_bs,0) memory_bs, ifnull(memory_major_faults,0) memory_major_faults, ifnull(run_count,0) run_count,
-> ifnull(success_count,0) success_count, ifnull(failure_count,0) failure_count FROM information_schema.mv_activities_cumulative WHERE last_finished_timestamp > da
te_sub(now(), interval 5 minute) or run_count > 0 limit 10 ;
±--------------±-----------------------------------------------------------------------------------±--------------±------------±-----------------±----------------±-------------±----------------±-------------±------------±-----------±------------±--------------------±----------±--------------±--------------+
| activity_type | activity_name | database_name | cpu_time_ms | cpu_wait_time_ms | elapsed_time_ms | lock_time_ms | network_time_ms | disk_time_ms | disk_b | network_b | memory_bs | memory_major_faults | run_count | success_count | failure_count |
±--------------±-----------------------------------------------------------------------------------±--------------±------------±-----------------±----------------±-------------±----------------±-------------±------------±-----------±------------±--------------------±----------±--------------±--------------+
| Query | RunPipeline_10.49.211.36_9104_metrics_12313242322540803443__et_al_625c3d6541c787ea | metrics | 164903 | 0 | 3234814 | 4129 | 54494083 | 0 | 769645 | 2215164312 | 13871263579 | 0 | 0 | 17547 | 0 |
| Query | Select_4181678090803057996_8c851ea01f070a34 | metrics | 1 | 0 | 3 | 0 | 0 | 0 | 82868 | 108 | 446 | 0 | 0 | 2 | 0 |
| Query | RunPipeline_blobs_4181678090803057996__et_al_faa560350e416d46 | metrics | 155 | 0 | 226 | 461 | 907 | 0 | 826450 | 388865 | 403886 | 0 | 0 | 0 | 4 |
| Query | Insert_dual_61e8fdcdc47e02f7 | metrics | 1354 | 0 | 1316806 | 7 | 0 | 0 | 15514 | 193017 | 3571 | 0 | 0 | 17547 | 0 |
| System | Sharding GC | | 95444 | 0 | 156821 | 4860 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 40981911 | 0 |
| System | NetworkAckThreadFn | | 6567768 | 0 | 46457894373 | 112 | 46456115767 | 0 | 0 | 3420015944 | 0 | 0 | 48 | 0 | 0 |
| Database | RunBackgroundMerger | cluster | 1500 | 0 | 1995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 409999 | 0 |
| Database | COMMIT_PIPELINE | metrics | 20108 | 0 | 62469 | 99 | 49928 | 0 | 0 | 28705677 | 0 | 0 | 0 | 79079 | 0 |
| Query | Select__MV_FINISHED_TASKS__et_al_6b34bd84922bb5a8 | metrics | 101 | 0 | 193 | 0 | 59 | 0 | 1146150 | 393311 | 505610 | 0 | 1 | 1 | 0 |
| System | IoSubmissionThreadFn | | 75041 | 0 | 4100725177 | 334 | 0 | 0 | 42284441600 | 0 | 0 | 0 | 4 | 0 | 0 |
±--------------±-----------------------------------------------------------------------------------±--------------±------------±-----------------±----------------±-------------±----------------±-------------±------------±-----------±------------±--------------------±----------±--------------±--------------+
10 rows in set (0.02 sec)

Query 2:

memsql> SELECT
-> COUNT(*) as count,
-> ANY_VALUE(SEVERITY) as severity,
-> EVENT_TYPE as eventtype,
-> ANY_VALUE(IP_ADDR) as ipaddr,
-> ANY_VALUE(PORT) as port,
-> ANY_VALUE(TYPE) as type
-> FROM INFORMATION_SCHEMA.MV_EVENTS AS events
-> INNER JOIN INFORMATION_SCHEMA.MV_NODES AS nodes
-> ON events.ORIGIN_NODE_ID=nodes.ID
-> GROUP BY EVENT_TYPE, ORIGIN_NODE_ID ;
±------±---------±-------------------±--------------±-----±-----+
| count | severity | eventtype | ipaddr | port | type |
±------±---------±-------------------±--------------±-----±-----+
| 1 | NOTICE | NODE_STARTING | 10.49.211.192 | 3306 | LEAF |
| 1 | NOTICE | SYSTEM_VAR_CHANGED | 10.49.211.192 | 3306 | LEAF |
| 1 | NOTICE | NODE_STARTING | 10.49.211.233 | 3306 | LEAF |
| 1 | NOTICE | SYSTEM_VAR_CHANGED | 10.49.211.36 | 3306 | MA |
| 1 | NOTICE | NODE_STARTING | 10.49.211.58 | 3306 | CA |
| 1 | NOTICE | NODE_STARTING | 10.49.211.36 | 3306 | MA |
| 1 | NOTICE | SYSTEM_VAR_CHANGED | 10.49.211.58 | 3306 | CA |
| 1 | NOTICE | SYSTEM_VAR_CHANGED | 10.49.211.233 | 3306 | LEAF |
| 2 | NOTICE | LEAF_ADD | 10.49.211.36 | 3306 | MA |
| 2 | NOTICE | NODE_ONLINE | 10.49.211.36 | 3306 | MA |
| 6 | WARNING | PIPELINE_STOPPED | 10.49.211.36 | 3306 | MA |
| 1 | NOTICE | AGGREGATOR_ADD | 10.49.211.36 | 3306 | MA |
±------±---------±-------------------±--------------±-----±-----+
12 rows in set (0.07 sec)

Hi Franck,

Thanks for the update. It looks like the queries used to collect this data are returning output. We have not yet been able to reproduce the error.

One additional question about your environment; I notice you have two different metrics and blobs pipelines (not including the test one you just created per my instruction). Are these IP addresses of two separate source clusters that you are pulling data from into one remote cluster? Any additional information you can provide about the setup will be useful.

10.49.211.106
10.49.211.36

Can you try starting the exporter by hand and checking the information it outputs once started?

To start the exporter by hand, please follow these instructions:

  1. Kill the existing exporter process. kill <pid>
  2. Copy the memsql_exporter_ma.args file to the /etc/memsql directory. This file contains parameters for memsql_exporter that can be used to enable or disable specific functions.
sudo cp /opt/memsql-server-<version>/memsql_exporter/conf/memsql_exporter_ma.args /etc/memsql/
  1. Start memsql_exporter with debug log level and pipe the log output into a file.
sudo /usr/bin/memsql_exporter @/etc/memsql/memsql_exporter_ma.args --log.level debug

After you start it, you should see a log of information in the file or in your terminal if you don’t pipe it into a file. Please paste the output of the log here or attach it.

Best,
Roxanna