Leaf exceeding memory

Hello,
We are running MemSQL 6.8 (128 GB free license) with a heterogeneous workload, and we see the leaves occasionally exceeding the provisioned memory limit, which leads to query errors. None of the tables is using a lot of memory, and profiling hasn’t revealed any bad actors.
memsql-report is showing the following errors:

FAIL Found 14999 'buffer manager memory allocation failure' errors in last 7 days for 172.40.21.220:3306 (C99E9EBDD2)
FAIL Found 14350 'buffer manager memory allocation failure' errors in last 7 days for 172.40.21.239:3306 (C99E9EBDD2)
FAIL Malloc_active_memory too high on node C99E9EBDD25CFC195BDDD65DB0D7551E7BA3CA0E (5.16 GB)
FAIL Malloc_active_memory too high on node 01D22BEB264B9C8991927CCFDD2027E51552F3B4 (5.29 GB)

From “show status extended”, the component below looks large on both leaves:
Alloc_durability_large 22101.626 MB

Please let me know how we can go about debugging this further.

Thanks,
Sunil

Hi Sunil,

It’s likely those errors are from queries that failed with out-of-memory errors.

Can you post “show status extended” from one of your leaves that hit that error?

Alloc_durability_large is part of the MemSQL 6.X durability code (the buffer used to commit transactions). You can decrease it by decreasing the transaction_buffer system variable. It’s a startup-only variable, so it needs to be set in the memsql.cnf file on each node; tools or ops can do that for you (see the SingleStore documentation). MemSQL 7.X no longer uses fixed-size buffers to commit transactions, so this memory use will also drop when you upgrade.

-Adam

Hello Adam,
Thanks a lot for the response.
Below is the show status extended from the report collected at the time of the problem.
Please let me know if anything jumps out of this info.
Once again, thanks a lot for your time.

+-----------------------------------------------------------+--------------------------------------------------------------+
|                       Variable_name                       |                            Value                             |
+-----------------------------------------------------------+--------------------------------------------------------------+
| Aborted_clients                                           | 9121                                                         |
| Aborted_connects                                          | 2                                                            |
| Bytes_received                                            | 2585487428551                                                |
| Bytes_sent                                                | 3049540944929                                                |
| Connections                                               | 12890                                                        |
| Max_used_connections                                      | 3445                                                         |
| Queries                                                   | 1185509648                                                   |
| Questions                                                 | 1185509648                                                   |
| Threads_cached                                            | 872                                                          |
| Threads_connected                                         | 2740                                                         |
| Threads_created                                           | 1814                                                         |
| Threads_running                                           | 361                                                          |
| Threads_background                                        | 345                                                          |
| Threads_shutdown                                          | 40302                                                        |
| Threads_idle                                              | 1798                                                         |
| Ready_queue                                               | 0                                                            |
| Idle_queue                                                | 0                                                            |
| Context_switches                                          | 573054295                                                    |
| Context_switch_misses                                     | 532845                                                       |
| Workload_management_queued_queries                        | 0                                                            |
| Workload_management_active_queries                        | 0                                                            |
| Workload_management_active_threads                        | 0                                                            |
| Workload_management_active_connections                    | 0                                                            |
| Columnstore_ingest_management_queued_queries              | 0                                                            |
| Columnstore_ingest_management_active_queries              | 0                                                            |
| Columnstore_ingest_management_estimated_segments_to_flush | 0                                                            |
| Columnstore_ingest_management_estimated_memory            | 109.922 (+0.946) MB                                          |
| Uptime                                                    | 948689                                                       |
| Prepared_stmt_count                                       | 0                                                            |
| Auto_attach_remaining_seconds                             | 0                                                            |
| Data_directory                                            | /var/lib/memsql/8d1e9e0b-4bba-40fd-b1ea-8f9d8dd58f86/data    |
| Plancache_directory                                       | /var/lib/memsql/8d1e9e0b-4bba-40fd-b1ea-8f9d8dd58f86/plan... |
| Transaction_logs_directory                                | /var/lib/memsql/8d1e9e0b-4bba-40fd-b1ea-8f9d8dd58f86/data... |
| Segments_directory                                        | /var/lib/memsql/8d1e9e0b-4bba-40fd-b1ea-8f9d8dd58f86/data... |
| Snapshots_directory                                       | /var/lib/memsql/8d1e9e0b-4bba-40fd-b1ea-8f9d8dd58f86/data... |
| Threads_waiting_for_disk_space                            | 0                                                            |
| License                                                   | BGYyOWIyOTI3OWQyZjQ3ZjNiNjdkYmViYjU4YzdhMDQ5AAAAAAAAAAAAA... |
| License_version                                           | 4                                                            |
| License_capacity                                          | 131072 MB                                                    |
| License_expiration                                        | 0                                                            |
| Seconds_until_expiration                                  | -1                                                           |
| License_key                                               | f29b29279d2f47f3b67dbebb58c7a049                             |
| License_type                                              | free                                                         |
| Maximum_cluster_capacity                                  | 131072 MB                                                    |
| Query_compilations                                        | 36802                                                        |
| Query_compilation_failures                                | 0                                                            |
| Inflight_async_compilations                               | 0                                                            |
| GCed_versions_last_sweep                                  | 290                                                          |
| Average_garbage_collection_duration                       | 232 ms                                                       |
| Total_server_memory                                       | 51659.9 (+5390.4) MB                                         |
| Total_io_pool_memory                                      | 0.1 MB                                                       |
| Free_io_pool_memory                                       | 0.0 MB                                                       |
| Alloc_thread_stacks                                       | 2159.000 MB                                                  |
| Malloc_active_memory                                      | 5288.381 (+3.980) MB                                         |
| Malloc_transaction_cached_memory                          | 1054.136 MB                                                  |
| Buffer_manager_memory                                     | 6678.6 (+266.0) MB                                           |
| Buffer_manager_cached_memory                              | 1.0 (-398.4) MB                                              |
| Buffer_manager_unrecycled_memory                          | 6.9 (+4.4) MB                                                |
| Alloc_skiplist_tower                                      | 1168.000 (-0.625) MB                                         |
| Alloc_variable                                            | 1456.500 MB                                                  |
| Alloc_table_primary                                       | 1234.875 (-0.750) MB                                         |
| Alloc_deleted_version                                     | 673.875 (+0.375) MB                                          |
| Alloc_internal_key_node                                   | 453.000 MB                                                   |
| Alloc_hash_buckets                                        | 2227.446 MB                                                  |
| Alloc_table_metadata_cache                                | 11.500 MB                                                    |
| Alloc_unit_images                                         | 1921.391 (+0.204) MB                                         |
| Alloc_unit_ifn_thunks                                     | 36.743 (+0.016) MB                                           |
| Alloc_object_code_images                                  | 609.118 (+0.083) MB                                          |
| Alloc_compiled_unit_sections                              | 373.352 (+0.055) MB                                          |
| Alloc_databases_list_entry                                | 13.125 MB                                                    |
| Alloc_plan_cache                                          | 10.750 MB                                                    |
| Alloc_warnings                                            | 213.125 (+0.375) MB                                          |
| Alloc_replication_large                                   | 3096.000 MB                                                  |
| Alloc_durability_large                                    | 22101.626 MB                                                 |
| Alloc_skynet_replication                                  | 0.375 MB                                                     |
| Alloc_sharding_partitions                                 | 0.250 MB                                                     |
| Alloc_log_replay                                          | 2049.953 (+1.609) MB                                         |
| Alloc_mmap_memory                                         | 7168.000 (+5120.000) MB                                      |
| Alloc_mmap_file                                           | 3072.000 MB                                                  |
| Alloc_client_connection                                   | 180.000 (+10.000) MB                                         |
| Alloc_protocol_packet                                     | 342.375 (+0.125) MB                                          |
| Alloc_large_incremental                                   | 0.250 (+0.125) MB                                            |
| Alloc_background_tasks                                    | 913.250 (+650.625) MB                                        |
| Alloc_table_memory                                        | 7213.696 (-1.000) MB                                         |
| Alloc_variable_bucket_16                                  | allocs:2473390  alloc_MB:37.7  buffer_MB:38.5  cached_buf... |
| Alloc_variable_bucket_24                                  | allocs:216550  alloc_MB:5.0  buffer_MB:5.5  cached_buffer... |
| Alloc_variable_bucket_32                                  | allocs:339019  alloc_MB:10.3  buffer_MB:10.9  cached_buff... |
| Alloc_variable_bucket_40                                  | allocs:870757  alloc_MB:33.2  buffer_MB:120.9  cached_buf... |
| Alloc_variable_bucket_48                                  | allocs:29019  alloc_MB:1.3  buffer_MB:1.6  cached_buffer_... |
| Alloc_variable_bucket_56                                  | allocs:24085  alloc_MB:1.3  buffer_MB:3.6  cached_buffer_... |
| Alloc_variable_bucket_64                                  | allocs:34319  alloc_MB:2.1  buffer_MB:2.8  cached_buffer_... |
| Alloc_variable_bucket_72                                  | allocs:16092  alloc_MB:1.1  buffer_MB:1.5  cached_buffer_... |
| Alloc_variable_bucket_80                                  | allocs:7449  alloc_MB:0.6  buffer_MB:0.9  cached_buffer_M... |
| Alloc_variable_bucket_88                                  | allocs:23751  alloc_MB:2.0  buffer_MB:2.6  cached_buffer_... |
| Alloc_variable_bucket_104                                 | allocs:134888  alloc_MB:13.4  buffer_MB:14.2  cached_buff... |
| Alloc_variable_bucket_128                                 | allocs:43119  alloc_MB:5.3  buffer_MB:5.5  cached_buffer_... |
| Alloc_variable_bucket_160                                 | allocs:1073049  alloc_MB:163.7  buffer_MB:275.8  cached_b... |
| Alloc_variable_bucket_200                                 | allocs:184550  alloc_MB:35.2  buffer_MB:36.1  cached_buff... |
| Alloc_variable_bucket_248                                 | allocs:347557  alloc_MB:82.2  buffer_MB:468.2  cached_buf... |
| Alloc_variable_bucket_312                                 | allocs:31341  alloc_MB:9.3  buffer_MB:23.5  cached_buffer... |
| Alloc_variable_bucket_384                                 | allocs:29237  alloc_MB:10.7  buffer_MB:12.6  cached_buffe... |
| Alloc_variable_bucket_480                                 | allocs:707  alloc_MB:0.3  buffer_MB:2.2  cached_buffer_MB... |
| Alloc_variable_bucket_600                                 | allocs:1188  alloc_MB:0.7  buffer_MB:5.0  cached_buffer_M... |
| Alloc_variable_bucket_752                                 | allocs:14572  alloc_MB:10.5  buffer_MB:12.5  cached_buffe... |
| Alloc_variable_bucket_936                                 | allocs:2766  alloc_MB:2.5  buffer_MB:5.6  cached_buffer_M... |
| Alloc_variable_bucket_1168                                | allocs:1714  alloc_MB:1.9  buffer_MB:4.0  cached_buffer_M... |
| Alloc_variable_bucket_1480                                | allocs:1129  alloc_MB:1.6  buffer_MB:4.1  cached_buffer_M... |
| Alloc_variable_bucket_1832                                | allocs:701  alloc_MB:1.2  buffer_MB:3.5  cached_buffer_MB... |
| Alloc_variable_bucket_2288                                | allocs:577  alloc_MB:1.3  buffer_MB:3.2  cached_buffer_MB... |
| Alloc_variable_bucket_2832                                | allocs:424  alloc_MB:1.1  buffer_MB:4.8  cached_buffer_MB... |
| Alloc_variable_bucket_3528                                | allocs:552  alloc_MB:1.9  buffer_MB:6.5  cached_buffer_MB... |
| Alloc_variable_bucket_4504                                | allocs:893  alloc_MB:3.8  buffer_MB:7.6  cached_buffer_MB... |
| Alloc_variable_bucket_5680                                | allocs:1002  alloc_MB:5.4  buffer_MB:6.8  cached_buffer_M... |
| Alloc_variable_bucket_6224                                | allocs:262  alloc_MB:1.6  buffer_MB:2.6  cached_buffer_MB... |
| Alloc_variable_bucket_7264                                | allocs:234  alloc_MB:1.6  buffer_MB:3.0  cached_buffer_MB... |
| Alloc_variable_bucket_9344                                | allocs:126  alloc_MB:1.1  buffer_MB:1.9  cached_buffer_MB... |
| Alloc_variable_bucket_11896                               | allocs:46  alloc_MB:0.5  buffer_MB:1.6  cached_buffer_MB:0.6 |
| Alloc_variable_bucket_14544                               | allocs:41  alloc_MB:0.6  buffer_MB:1.6  cached_buffer_MB:0.4 |
| Alloc_variable_bucket_18696                               | allocs:40  alloc_MB:0.7  buffer_MB:2.4  cached_buffer_MB:0.9 |
| Alloc_variable_bucket_21816                               | allocs:21  alloc_MB:0.4  buffer_MB:2.8  cached_buffer_MB:1.6 |
| Alloc_variable_bucket_26184                               | allocs:31  alloc_MB:0.8  buffer_MB:2.2  cached_buffer_MB:0.9 |
| Alloc_variable_bucket_32728                               | allocs:36  alloc_MB:1.1  buffer_MB:2.8  cached_buffer_MB:1.2 |
| Alloc_variable_bucket_43648                               | allocs:27  alloc_MB:1.1  buffer_MB:3.4  cached_buffer_MB:1.8 |
| Alloc_variable_bucket_65472                               | allocs:3221  alloc_MB:201.1  buffer_MB:241.4  cached_buff... |
| Alloc_variable_bucket_130960                              | allocs:787  alloc_MB:98.3  buffer_MB:100.2  cached_buffer... |
| Alloc_variable_cached_buffers                             | 35.2 (+0.6) MB                                               |
| Alloc_variable_allocated                                  | 755.6 MB                                                     |
| Successful_read_queries                                   | 1925989179                                                   |
| Successful_write_queries                                  | 352672744                                                    |
| Failed_read_queries                                       | 3420                                                         |
| Failed_write_queries                                      | 3558972                                                      |
| Rows_returned_by_reads                                    | 1213155914                                                   |
| Rows_affected_by_writes                                   | 217704556                                                    |
| Execution_time_of_reads                                   | 266196603 ms                                                 |
| Execution_time_of_write                                   | 262665970 ms                                                 |
| Transaction_buffer_wait_time                              | 0 ms                                                         |
| Transaction_log_flush_wait_time                           | 0 ms                                                         |
| Row_lock_wait_time                                        | 718482 ms                                                    |
| Ssl_accept_renegotiates                                   | 0                                                            |
| Ssl_accepts                                               | 0                                                            |
| Ssl_callback_cache_hits                                   | 0                                                            |
| Ssl_client_connects                                       | 0                                                            |
| Ssl_connect_renegotiates                                  | 0                                                            |
| Ssl_ctx_verify_depth                                      | 18446744073709551615                                         |
| Ssl_ctx_verify_mode                                       | 0                                                            |
| Ssl_default_timeout                                       | 0                                                            |
| Ssl_finished_accepts                                      | 0                                                            |
| Ssl_finished_connects                                     | 0                                                            |
| Ssl_session_cache_hits                                    | 0                                                            |
| Ssl_session_cache_misses                                  | 0                                                            |
| Ssl_session_cache_overflows                               | 0                                                            |
| Ssl_session_cache_size                                    | 20480                                                        |
| Ssl_session_cache_timeouts                                | 0                                                            |
| Ssl_sessions_reused                                       | 0                                                            |
| Ssl_used_session_cache_entries                            | 0                                                            |
| Ssl_verify_depth                                          | 0                                                            |
| Ssl_verify_mode                                           | 0                                                            |
| Ssl_cipher                                                |                                                              |
| Ssl_cipher_list                                           |                                                              |
| Ssl_version                                               |                                                              |
| Ssl_session_cache_mode                                    | SERVER                                                       |
+-----------------------------------------------------------+--------------------------------------------------------------+
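To see which components dominate in output like the table above, a small script can rank the `Alloc_*` rows by size. This is only a rough sketch: the function names are made up for illustration, it assumes the `| name | value |` layout shown here, and it skips rows whose value is not a plain MB figure (such as the `Alloc_variable_bucket_*` rows). The counters may overlap, so the percentages should not be expected to sum to 100.

```python
import re

def parse_status_table(text):
    """Parse '| name | value |' rows into a dict of name -> raw value string."""
    rows = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.strip().split("|")]
        # A data row splits into ['', name, value, '']; border rows do not.
        if len(parts) == 4 and parts[1] and parts[1] != "Variable_name":
            rows[parts[1]] = parts[2]
    return rows

def megabytes(value):
    """Return the leading number of values like '51659.9 (+5390.4) MB', else None."""
    m = re.match(r"^(\d+(?:\.\d+)?)\s*(?:\([+-][\d.]+\)\s*)?MB$", value)
    return float(m.group(1)) if m else None

def rank_allocators(rows, top=5):
    """Return (name, MB, percent of Total_server_memory) for the largest Alloc_* rows."""
    total = megabytes(rows["Total_server_memory"])
    sized = []
    for name, value in rows.items():
        mb = megabytes(value)
        if name.startswith("Alloc_") and mb is not None:
            sized.append((name, mb, 100.0 * mb / total))
    return sorted(sized, key=lambda item: item[1], reverse=True)[:top]

# A few rows copied from the output in this thread.
SAMPLE = """
| Total_server_memory      | 51659.9 (+5390.4) MB    |
| Alloc_durability_large   | 22101.626 MB            |
| Alloc_mmap_memory        | 7168.000 (+5120.000) MB |
| Alloc_table_memory       | 7213.696 (-1.000) MB    |
| Alloc_replication_large  | 3096.000 MB             |
| Alloc_variable_bucket_16 | allocs:2473390  alloc_MB:37.7  buffer_MB:38.5  cached_buf... |
"""

for name, mb, pct in rank_allocators(parse_status_table(SAMPLE)):
    print(f"{name:28s} {mb:10.1f} MB  {pct:5.1f}% of total")
```

On the rows above, Alloc_durability_large comes out first, at a bit over 40% of Total_server_memory.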

Hi Sunil,

I would definitely lower transaction_buffer as mentioned above, to 8 MB or 16 MB:

memsql-admin update-config --all --key "transaction_buffer" --value "16m"
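For a back-of-the-envelope idea of what that could free up, here is a minimal sketch. It assumes Alloc_durability_large scales linearly with transaction_buffer, which this thread does not confirm, and the current buffer size of 128 MB is a placeholder, not a value taken from this cluster; check the real setting with `SHOW VARIABLES LIKE 'transaction_buffer'` before trusting the numbers.

```python
def estimated_durability_mb(observed_mb, current_buffer_mb, new_buffer_mb):
    """Scale the observed allocation linearly with the buffer size.

    Linear scaling is an assumption made for illustration only.
    """
    if current_buffer_mb <= 0:
        raise ValueError("current_buffer_mb must be positive")
    return observed_mb * new_buffer_mb / current_buffer_mb

OBSERVED_MB = 22101.626   # Alloc_durability_large reported in this thread
CURRENT_BUFFER_MB = 128   # placeholder: replace with the cluster's actual setting

for new_buffer_mb in (8, 16):
    estimate = estimated_durability_mb(OBSERVED_MB, CURRENT_BUFFER_MB, new_buffer_mb)
    print(f"transaction_buffer={new_buffer_mb}m -> roughly {estimate:.0f} MB")
```

Treat the result as an order-of-magnitude guide; the actual figure after the restart is what matters.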

I don’t see anything else you could tune to lower memory use right now.

Does your cluster have a lot of tables? There is a small memory overhead per table (a few MB), but it can add up when you have 1000s of tables.

-Adam

Hi Adam,
Thanks a lot for the tip.
Our tables run into 100s, definitely not 1000s.
Would there be any negative impact of lowering the transaction buffer, especially the MemSQL responsiveness?

Thanks,
Sunil

It may reduce the throughput of bursty rowstore write workloads a bit (workloads that write a few rows per transaction but, in aggregate, run hundreds of thousands to millions of those per second). It’s hard to say how much.

I forgot to mention that the cluster needs to be restarted for the change to take effect.

Another, bigger hammer is to upgrade to MemSQL 7.X, which doesn’t have a static transaction buffer; it’s dynamically sized.

-Adam

Sure Adam. Understood.
Thanks a lot for detailing all the options.