Memory Issue after scaling down/up with MemSQL K8s Operator

Hi Team,

We have deployed MemSQL version 7.1.3 on AKS with 2 aggregators and 8 nodes, and we plan to scale the cluster down to 4 nodes at a set time during the week.
We’ve noticed that after scaling back up to 8 nodes, memory and disk usage differ between the first 4 nodes and the last 4 nodes.
We checked for data skew before and after resizing the cluster, and the data was evenly distributed across all nodes.
Now some queries are failing because they hit maximum_memory, always on the same 2 nodes (2 and 3), and we can also see that these 2 nodes have higher memory and disk usage.
What could be the reason for that?

Deployment details:
Kubernetes version - v1.18.10
Memsql operator image version - memsql/operator:1.2.1-e8970d4d
Memsql node image version - memsql/node:centos-7.1.13-11ddea2a3a
2 Aggregators - 3 CPU 12GB RAM - Standard_D4ds_v4
8 Nodes - 28 CPU 115GB RAM - Standard_D32ds_v4 (1 numa node)

Thanks and Regards
Chen.

Hi Chen,

I think we need a bit more information to debug.

To start with, can I see the SHOW STATUS EXTENDED output from one of the leaves with higher memory use and one of the leaves with lower memory use?

-Adam

Hi Adam,

Leaf with higher memory:
+-----------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-----------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Aborted_clients | 954341 |
| Aborted_connects | 5 |
| Bytes_received | 1830584355796 |
| Bytes_sent | 1593436857799 |
| Connections | 963369 |
| Max_used_connections | 2745 |
| Queries | 136959305 |
| Questions | 136959305 |
| Threads_cached | 557 |
| Threads_connected | 2306 |
| Threads_created | 624 |
| Threads_running | 1 |
| Threads_background | 1 |
| Threads_shutdown | 30070 |
| Threads_idle | 2239 |
| Ready_queue | 0 |
| Idle_queue | 0 |
| Context_switches | 9825068 |
| Context_switch_misses | 207758 |
| Active_dedicated_admin_connections | 0 |
| Total_dedicated_admin_connections | 0 |
| Workload_management_queued_queries | 0 |
| Workload_management_active_queries | 0 |
| Workload_management_active_threads | 0 |
| Workload_management_active_connections | 0 |
| Columnstore_ingest_management_queued_queries | 0 |
| Columnstore_ingest_management_active_queries | 0 |
| Columnstore_ingest_management_max_concurrency | 0 |
| Columnstore_ingest_management_estimated_segments_to_flush | 0 |
| Columnstore_ingest_management_estimated_memory | 0.000 MB |
| Uptime | 2065048 |
| Prepared_stmt_count | 0 |
| Auto_attach_remaining_seconds | 0 |
| Data_directory | /var/lib/memsql/instance/data |
| Plancache_directory | /var/lib/memsql/instance/plancache |
| Transaction_logs_directory | /var/lib/memsql/instance/data/logs |
| Segments_directory | /var/lib/memsql/instance/data/blobs |
| Snapshots_directory | /var/lib/memsql/instance/data/snapshots |
| Disk_space_reserved_for_secondary_index | 0 |
| Threads_waiting_for_disk_space | 0 |
| License | xxx |
| License_version | 4 |
| License_capacity | 50 units |
| Used_instance_license_units | 4 |
| License_expiration | 1636617600 |
| Seconds_until_expiration | 24341174 |
| License_key | xxx |
| License_type | enterprise |
| Maximum_cluster_capacity | 50 units |
| Query_compilations | 18315 |
| Query_compilation_failures | 0 |
| Inflight_async_compilations | 2 |
| GCed_versions_last_sweep | 0 |
| Average_garbage_collection_duration | 254 ms |
| Total_server_memory | 52714.8 (-53085.0) MB |
| Total_io_pool_memory | 242.5 MB |
| Free_io_pool_memory | 197.9 (+0.5) MB |
| Alloc_thread_stacks | 625.000 (+53.000) MB |
| Malloc_active_memory | 2987.303 (-8997.518) MB |
| Malloc_transaction_cached_memory | 717.843 MB |
| Linux_resident_memory | 55534.872 (-53098.585) MB |
| Linux_resident_shared_memory | 4107.594 (-652.234) MB |
| Buffer_manager_memory | 29858.6 (-40343.1) MB |
| Buffer_manager_cached_memory | 26436.4 (+26435.5) MB |
| Buffer_manager_unrecycled_memory | 1.9 (+1.9) MB |
| Alloc_skiplist_tower | 1191.000 (+5.000) MB |
| Alloc_variable | 253.625 (+6.625) MB |
| Alloc_table_primary | 499.250 (+0.500) MB |
| Alloc_deleted_version | 458.625 (+165.625) MB |
| Alloc_internal_key_node | 238.125 MB |
| Alloc_hash_buckets | 1165.659 MB |
| Alloc_table_metadata_cache | 14.875 (+0.125) MB |
| Alloc_unit_images | 10014.628 (-2340.152) MB |
| Alloc_unit_ifn_thunks | 222.743 (-64.179) MB |
| Alloc_object_code_images | 4182.079 (-889.451) MB |
| Alloc_compiled_unit_sections | 2506.598 (-503.558) MB |
| Alloc_databases_list_entry | 10.625 MB |
| Alloc_plan_cache | 7.000 (+0.875) MB |
| Alloc_query_execution | 0.000 (-75974.501) MB |
| Alloc_warnings | 373.250 (+4.000) MB |
| Alloc_replication | 29.375 (-0.375) MB |
| Alloc_sharding_partitions | 0.375 MB |
| Alloc_mmap_file | 848.000 MB |
| Alloc_client_connection | 56.000 (+56.000) MB |
| Alloc_protocol_packet | 288.125 (-6.250) MB |
| Alloc_large_incremental | 0.000 (-8972.501) MB |
| Alloc_distributed_transaction | 0.000 (-10.500) MB |
| Alloc_profile_stats | 0.125 MB |
| Alloc_table_autostats | 909.611 MB |
| Alloc_system_tasks | 0.000 (-0.250) MB |
| Alloc_table_memory | 4715.894 (+177.750) MB |
| Alloc_variable_bucket_16 | allocs:356168 alloc_MB:5.4 buffer_MB:58.8 cached_buffer_MB:1.9 |
| Alloc_variable_bucket_24 | allocs:261821 alloc_MB:6.0 buffer_MB:42.8 cached_buffer_MB:1.5 |
| Alloc_variable_bucket_32 | allocs:45964 alloc_MB:1.4 buffer_MB:18.0 cached_buffer_MB:1.8 |
| Alloc_variable_bucket_40 | allocs:22492 alloc_MB:0.9 buffer_MB:9.1 cached_buffer_MB:0.1 |
| Alloc_variable_bucket_48 | allocs:19443 alloc_MB:0.9 buffer_MB:8.4 cached_buffer_MB:3.2 |
| Alloc_variable_bucket_56 | allocs:11570 alloc_MB:0.6 buffer_MB:9.5 cached_buffer_MB:1.8 |
| Alloc_variable_bucket_64 | allocs:7956 alloc_MB:0.5 buffer_MB:8.2 cached_buffer_MB:1.9 |
| Alloc_variable_bucket_72 | allocs:13161 alloc_MB:0.9 buffer_MB:6.4 cached_buffer_MB:0.1 |
| Alloc_variable_bucket_80 | allocs:5872 alloc_MB:0.4 buffer_MB:11.0 cached_buffer_MB:3.6 |
| Alloc_variable_bucket_88 | allocs:3208 alloc_MB:0.3 buffer_MB:7.4 cached_buffer_MB:1.8 |
| Alloc_variable_bucket_104 | allocs:4153 alloc_MB:0.4 buffer_MB:11.6 cached_buffer_MB:1.9 |
| Alloc_variable_bucket_128 | allocs:12570 alloc_MB:1.5 buffer_MB:10.2 cached_buffer_MB:0.6 |
| Alloc_variable_bucket_160 | allocs:19090 alloc_MB:2.9 buffer_MB:5.5 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_200 | allocs:1020 alloc_MB:0.2 buffer_MB:0.5 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_248 | allocs:377 alloc_MB:0.1 buffer_MB:0.5 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_312 | allocs:7044 alloc_MB:2.1 buffer_MB:7.0 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_384 | allocs:45 alloc_MB:0.0 buffer_MB:1.0 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_480 | allocs:512 alloc_MB:0.2 buffer_MB:3.8 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_600 | allocs:292 alloc_MB:0.2 buffer_MB:2.9 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_752 | allocs:71 alloc_MB:0.1 buffer_MB:0.9 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_936 | allocs:60 alloc_MB:0.1 buffer_MB:1.0 cached_buffer_MB:0.1 |
| Alloc_variable_bucket_1168 | allocs:52 alloc_MB:0.1 buffer_MB:0.8 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_1480 | allocs:48 alloc_MB:0.1 buffer_MB:0.4 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_1832 | allocs:53 alloc_MB:0.1 buffer_MB:0.5 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_2288 | allocs:22 alloc_MB:0.0 buffer_MB:0.9 cached_buffer_MB:0.5 |
| Alloc_variable_bucket_2832 | allocs:289 alloc_MB:0.8 buffer_MB:1.2 cached_buffer_MB:0.1 |
| Alloc_variable_bucket_3528 | allocs:35 alloc_MB:0.1 buffer_MB:0.4 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_4504 | allocs:108 alloc_MB:0.5 buffer_MB:2.8 cached_buffer_MB:0.9 |
| Alloc_variable_bucket_5680 | allocs:140 alloc_MB:0.8 buffer_MB:0.9 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_6224 | allocs:15 alloc_MB:0.1 buffer_MB:0.2 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_7264 | allocs:14 alloc_MB:0.1 buffer_MB:0.1 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_9344 | allocs:796 alloc_MB:7.1 buffer_MB:13.2 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_11896 | allocs:0 alloc_MB:0.0 buffer_MB:0.1 cached_buffer_MB:0.1 |
| Alloc_variable_bucket_14544 | allocs:1 alloc_MB:0.0 buffer_MB:0.4 cached_buffer_MB:0.2 |
| Alloc_variable_bucket_18696 | allocs:2 alloc_MB:0.0 buffer_MB:0.5 cached_buffer_MB:0.2 |
| Alloc_variable_bucket_21816 | allocs:0 alloc_MB:0.0 buffer_MB:0.2 cached_buffer_MB:0.2 |
| Alloc_variable_bucket_26184 | allocs:0 alloc_MB:0.0 buffer_MB:0.5 cached_buffer_MB:0.5 |
| Alloc_variable_bucket_32728 | allocs:1 alloc_MB:0.0 buffer_MB:1.0 cached_buffer_MB:0.9 |
| Alloc_variable_bucket_43648 | allocs:0 alloc_MB:0.0 buffer_MB:1.6 cached_buffer_MB:1.6 |
| Alloc_variable_bucket_65472 | allocs:0 alloc_MB:0.0 buffer_MB:0.8 cached_buffer_MB:0.8 |
| Alloc_variable_bucket_130960 | allocs:7 alloc_MB:0.9 buffer_MB:2.8 cached_buffer_MB:1.9 |
| Alloc_variable_cached_buffers | 28.2 (+9.6) MB |
| Alloc_variable_allocated | 35.7 MB |
| Successful_read_queries | 17153670 |
| Successful_write_queries | 51573231 |
| Failed_read_queries | 19448 |
| Failed_write_queries | 10111 |
| Rows_returned_by_reads | 814793799 |
| Rows_affected_by_writes | 17235195761 |
| Execution_time_of_reads | 8095953108 ms |
| Execution_time_of_write | 3802106080 ms |
| Transaction_buffer_wait_time | 0 ms |
| Transaction_log_flush_wait_time | 8 ms |
| Row_lock_wait_time | 258790 ms |
| Ingest_errors_disk_space_use | 819 Bytes |
| Total_blobs_submitted_for_fsync | 584078 |
| Total_blobs_processed_for_fsync | 584078 |
| Ssl_ctx_verify_depth | 18446744073709551615 |
| Ssl_session_cache_size | 20480 |

Leaf with lower memory use:

| Aborted_clients | 84920 |
| Aborted_connects | 1 |
| Bytes_received | 254645346137 |
| Bytes_sent | 108015427390 |
| Connections | 87261 |
| Max_used_connections | 937 |
| Queries | 3164116 |
| Questions | 3164116 |
| Threads_cached | 543 |
| Threads_connected | 1411 |
| Threads_created | 610 |
| Threads_running | 1 |
| Threads_background | 1 |
| Threads_shutdown | 1579 |
| Threads_idle | 1344 |
| Ready_queue | 0 |
| Idle_queue | 0 |
| Context_switches | 952026 |
| Context_switch_misses | 5954 |
| Active_dedicated_admin_connections | 0 |
| Total_dedicated_admin_connections | 0 |
| Workload_management_queued_queries | 0 |
| Workload_management_active_queries | 0 |
| Workload_management_active_threads | 0 |
| Workload_management_active_connections | 0 |
| Columnstore_ingest_management_queued_queries | 0 |
| Columnstore_ingest_management_active_queries | 0 |
| Columnstore_ingest_management_max_concurrency | 0 |
| Columnstore_ingest_management_estimated_segments_to_flush | 0 |
| Columnstore_ingest_management_estimated_memory | 0.000 MB |
| Uptime | 162496 |
| Prepared_stmt_count | 0 |
| Auto_attach_remaining_seconds | 0 |
| Data_directory | /var/lib/memsql/instance/data |
| Plancache_directory | /var/lib/memsql/instance/plancache |
| Transaction_logs_directory | /var/lib/memsql/instance/data/logs |
| Segments_directory | /var/lib/memsql/instance/data/blobs |
| Snapshots_directory | /var/lib/memsql/instance/data/snapshots |
| Disk_space_reserved_for_secondary_index | 0 |
| Threads_waiting_for_disk_space | 0 |
| License | xxx |
| License_version | 4 |
| License_capacity | 50 units |
| Used_instance_license_units | 4 |
| License_expiration | 1636617600 |
| Seconds_until_expiration | 24341027 |
| License_key | xxx |
| License_type | enterprise |
| Maximum_cluster_capacity | 50 units |
| Query_compilations | 781 |
| Query_compilation_failures | 0 |
| Inflight_async_compilations | 0 |
| GCed_versions_last_sweep | 0 |
| Average_garbage_collection_duration | 236 ms |
| Total_server_memory | 35886.1 (-69913.9) MB |
| Total_io_pool_memory | 54.6 MB |
| Free_io_pool_memory | 6.8 (+1.1) MB |
| Alloc_thread_stacks | 611.000 (-4.000) MB |
| Malloc_active_memory | 2497.953 (-9070.292) MB |
| Malloc_transaction_cached_memory | 521.639 (+141.267) MB |
| Linux_resident_memory | 36961.122 (-69447.992) MB |
| Linux_resident_shared_memory | 684.950 (+38.727) MB |
| Buffer_manager_memory | 29071.4 (-60923.2) MB |
| Buffer_manager_cached_memory | 26405.8 (+26405.8) MB |
| Buffer_manager_unrecycled_memory | 2.2 (+2.2) MB |
| Alloc_skiplist_tower | 775.375 (+16.750) MB |
| Alloc_variable | 185.125 (+15.625) MB |
| Alloc_table_primary | 517.125 (+0.250) MB |
| Alloc_deleted_version | 458.500 (+157.750) MB |
| Alloc_internal_key_node | 239.000 MB |
| Alloc_hash_buckets | 1165.659 MB |
| Alloc_table_metadata_cache | 8.750 (+0.250) MB |
| Alloc_unit_images | 971.872 (+55.763) MB |
| Alloc_unit_ifn_thunks | 22.954 (+1.825) MB |
| Alloc_object_code_images | 365.387 (+17.421) MB |
| Alloc_compiled_unit_sections | 215.684 (+8.622) MB |
| Alloc_databases_list_entry | 5.875 MB |
| Alloc_plan_cache | 6.750 (+0.625) MB |
| Alloc_query_execution | 0.000 (-96783.626) MB |
| Alloc_warnings | 194.125 (-0.125) MB |
| Alloc_replication | 12.000 (-0.875) MB |
| Alloc_sharding_partitions | 0.375 MB |
| Alloc_mmap_file | 848.000 MB |
| Alloc_client_connection | 84.000 (+84.000) MB |
| Alloc_protocol_packet | 176.250 (-44.125) MB |
| Alloc_large_incremental | 0.000 (-9233.001) MB |
| Alloc_distributed_transaction | 0.000 (-10.500) MB |
| Alloc_profile_stats | 0.125 MB |
| Alloc_table_autostats | 909.611 MB |
| Alloc_system_tasks | 0.000 (-0.250) MB |
| Alloc_table_memory | 4250.394 (+190.375) MB |
| Alloc_variable_bucket_16 | allocs:354410 alloc_MB:5.4 buffer_MB:35.5 cached_buffer_MB:2.1 |
| Alloc_variable_bucket_24 | allocs:260485 alloc_MB:6.0 buffer_MB:24.2 cached_buffer_MB:1.0 |
| Alloc_variable_bucket_32 | allocs:45715 alloc_MB:1.4 buffer_MB:11.5 cached_buffer_MB:1.4 |
| Alloc_variable_bucket_40 | allocs:22498 alloc_MB:0.9 buffer_MB:11.2 cached_buffer_MB:0.2 |
| Alloc_variable_bucket_48 | allocs:19375 alloc_MB:0.9 buffer_MB:8.4 cached_buffer_MB:3.0 |
| Alloc_variable_bucket_56 | allocs:11479 alloc_MB:0.6 buffer_MB:8.9 cached_buffer_MB:1.8 |
| Alloc_variable_bucket_64 | allocs:7945 alloc_MB:0.5 buffer_MB:7.5 cached_buffer_MB:2.2 |
| Alloc_variable_bucket_72 | allocs:9569 alloc_MB:0.7 buffer_MB:4.0 cached_buffer_MB:0.1 |
| Alloc_variable_bucket_80 | allocs:5815 alloc_MB:0.4 buffer_MB:7.2 cached_buffer_MB:1.8 |
| Alloc_variable_bucket_88 | allocs:3204 alloc_MB:0.3 buffer_MB:7.2 cached_buffer_MB:1.8 |
| Alloc_variable_bucket_104 | allocs:4188 alloc_MB:0.4 buffer_MB:7.1 cached_buffer_MB:1.8 |
| Alloc_variable_bucket_128 | allocs:12524 alloc_MB:1.5 buffer_MB:9.9 cached_buffer_MB:0.1 |
| Alloc_variable_bucket_160 | allocs:17977 alloc_MB:2.7 buffer_MB:2.9 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_200 | allocs:1018 alloc_MB:0.2 buffer_MB:0.4 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_248 | allocs:377 alloc_MB:0.1 buffer_MB:0.4 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_312 | allocs:7044 alloc_MB:2.1 buffer_MB:7.5 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_384 | allocs:48 alloc_MB:0.0 buffer_MB:0.2 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_480 | allocs:515 alloc_MB:0.2 buffer_MB:6.2 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_600 | allocs:292 alloc_MB:0.2 buffer_MB:1.1 cached_buffer_MB:0.8 |
| Alloc_variable_bucket_752 | allocs:71 alloc_MB:0.1 buffer_MB:0.2 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_936 | allocs:60 alloc_MB:0.1 buffer_MB:0.2 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_1168 | allocs:52 alloc_MB:0.1 buffer_MB:0.5 cached_buffer_MB:0.4 |
| Alloc_variable_bucket_1480 | allocs:48 alloc_MB:0.1 buffer_MB:0.4 cached_buffer_MB:0.2 |
| Alloc_variable_bucket_1832 | allocs:53 alloc_MB:0.1 buffer_MB:1.0 cached_buffer_MB:0.6 |
| Alloc_variable_bucket_2288 | allocs:22 alloc_MB:0.0 buffer_MB:0.5 cached_buffer_MB:0.2 |
| Alloc_variable_bucket_2832 | allocs:289 alloc_MB:0.8 buffer_MB:1.2 cached_buffer_MB:0.1 |
| Alloc_variable_bucket_3528 | allocs:35 alloc_MB:0.1 buffer_MB:0.4 cached_buffer_MB:0.1 |
| Alloc_variable_bucket_4504 | allocs:108 alloc_MB:0.5 buffer_MB:2.8 cached_buffer_MB:1.6 |
| Alloc_variable_bucket_5680 | allocs:140 alloc_MB:0.8 buffer_MB:1.2 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_6224 | allocs:15 alloc_MB:0.1 buffer_MB:0.1 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_7264 | allocs:14 alloc_MB:0.1 buffer_MB:0.2 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_9344 | allocs:796 alloc_MB:7.1 buffer_MB:7.6 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_11896 | allocs:0 alloc_MB:0.0 buffer_MB:0.1 cached_buffer_MB:0.1 |
| Alloc_variable_bucket_14544 | allocs:0 alloc_MB:0.0 buffer_MB:0.2 cached_buffer_MB:0.2 |
| Alloc_variable_bucket_18696 | allocs:1 alloc_MB:0.0 buffer_MB:0.4 cached_buffer_MB:0.2 |
| Alloc_variable_bucket_21816 | allocs:1 alloc_MB:0.0 buffer_MB:0.1 cached_buffer_MB:0.0 |
| Alloc_variable_bucket_26184 | allocs:0 alloc_MB:0.0 buffer_MB:0.5 cached_buffer_MB:0.5 |
| Alloc_variable_bucket_32728 | allocs:1 alloc_MB:0.0 buffer_MB:1.0 cached_buffer_MB:0.9 |
| Alloc_variable_bucket_43648 | allocs:0 alloc_MB:0.0 buffer_MB:1.6 cached_buffer_MB:1.6 |
| Alloc_variable_bucket_65472 | allocs:1 alloc_MB:0.1 buffer_MB:0.5 cached_buffer_MB:0.4 |
| Alloc_variable_bucket_130960 | allocs:7 alloc_MB:0.9 buffer_MB:2.6 cached_buffer_MB:1.8 |
| Alloc_variable_cached_buffers | 27.1 (+7.1) MB |
| Alloc_variable_allocated | 35.2 MB |
| Successful_read_queries | 1544121 |
| Successful_write_queries | 236801 |
| Failed_read_queries | 384 |
| Failed_write_queries | 661 |
| Rows_returned_by_reads | 37701936 |
| Rows_affected_by_writes | 1274585266 |
| Execution_time_of_reads | 609920804 ms |
| Execution_time_of_write | 341479392 ms |
| Transaction_buffer_wait_time | 0 ms |
| Transaction_log_flush_wait_time | 0 ms |
| Row_lock_wait_time | 24265 ms |
| Ingest_errors_disk_space_use | 0 Bytes |
| Total_blobs_submitted_for_fsync | 68762 |
| Total_blobs_processed_for_fsync | 68762 |
| Ssl_ctx_verify_depth | 18446744073709551615 |
| Ssl_session_cache_size | 20480 |

Hi Chen,

It looks like the plancache is the main difference in memory use.

On the leaf with higher memory use:

| Alloc_unit_images | 10014.628 (-2340.152) MB |
| Alloc_unit_ifn_thunks | 222.743 (-64.179) MB |
| Alloc_object_code_images | 4182.079 (-889.451) MB |
| Alloc_compiled_unit_sections | 2506.598 (-503.558) MB |

On the leaf with lower memory use:

| Alloc_unit_images | 971.872 (+55.763) MB |
| Alloc_unit_ifn_thunks | 22.954 (+1.825) MB |
| Alloc_object_code_images | 365.387 (+17.421) MB |
| Alloc_compiled_unit_sections | 215.684 (+8.622) MB |

This is memory for query plans stored in the plancache. It’s likely the new nodes haven’t compiled as many query shapes yet. Does your application generate a lot of unique query shapes (does it vary filters or join patterns from query to query)?
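
For anyone who wants to repeat this comparison on their own leaves, here is a rough Python sketch. It assumes the two status dumps are available as plain text in the same pipe-delimited layout as above; the helper names (parse_mb, biggest_gaps) are made up for illustration and are not part of any MemSQL tooling. Only a handful of rows from the two dumps in this thread are included to keep it short.

```python
import re

# Matches rows such as "| Alloc_unit_images | 10014.628 (-2340.152) MB |".
# Rows whose value is not a plain number followed by "MB" are skipped.
ROW = re.compile(r"^\|\s*(\w+)\s*\|\s*([0-9.]+)\s*(?:\([^)]*\)\s*)?MB\s*\|?\s*$")

def parse_mb(status_text):
    """Return {variable_name: megabytes} for the MB-valued rows of a status dump."""
    values = {}
    for line in status_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            values[match.group(1)] = float(match.group(2))
    return values

def biggest_gaps(high, low, top=5):
    """Rank variables by how much larger they are on the `high` leaf."""
    gaps = {name: high[name] - low[name] for name in high if name in low}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)[:top]

# A few rows copied from the two dumps posted in this thread.
HIGH = """
| Total_server_memory | 52714.8 (-53085.0) MB |
| Buffer_manager_memory | 29858.6 (-40343.1) MB |
| Alloc_unit_images | 10014.628 (-2340.152) MB |
| Alloc_unit_ifn_thunks | 222.743 (-64.179) MB |
| Alloc_object_code_images | 4182.079 (-889.451) MB |
| Alloc_compiled_unit_sections | 2506.598 (-503.558) MB |
| Alloc_table_memory | 4715.894 (+177.750) MB |
"""
LOW = """
| Total_server_memory | 35886.1 (-69913.9) MB |
| Buffer_manager_memory | 29071.4 (-60923.2) MB |
| Alloc_unit_images | 971.872 (+55.763) MB |
| Alloc_unit_ifn_thunks | 22.954 (+1.825) MB |
| Alloc_object_code_images | 365.387 (+17.421) MB |
| Alloc_compiled_unit_sections | 215.684 (+8.622) MB |
| Alloc_table_memory | 4250.394 (+190.375) MB |
"""

# The four allocators quoted in this reply as plancache memory.
PLAN_ALLOCATORS = (
    "Alloc_unit_images",
    "Alloc_unit_ifn_thunks",
    "Alloc_object_code_images",
    "Alloc_compiled_unit_sections",
)

high, low = parse_mb(HIGH), parse_mb(LOW)
plan_gap = sum(high[name] - low[name] for name in PLAN_ALLOCATORS)
total_gap = high["Total_server_memory"] - low["Total_server_memory"]

for name, gap in biggest_gaps(high, low):
    print(f"{name:32s} {gap:10.1f} MB")
print(f"plan-related gap: {plan_gap:.1f} MB of {total_gap:.1f} MB total")
```

With the numbers from this thread, those four allocators account for roughly 15.3 GB of the 16.8 GB gap in Total_server_memory between the two leaves.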

Some things to look into:

  • Check the types of queries you see in the plancache (select * from information_schema.plancache). Do you see anything unexpected? SingleStore caches a single plan when only the values of constants vary (say where c = 1 and where c = 2), but otherwise we run the optimizer and generate a new plan.
  • plan_expiration_minutes is the knob that controls how aggressively plans are removed from the plancache (a plan that is not re-used within plan_expiration_minutes is evicted). You could try lowering this knob (it’s 12 hours by default, I believe). If you want to flush the plancache entirely, you can set it to 0, wait a bit, and then set it back to a sensible value.
  • SingleStore 7.3 uses less memory for plans; you would probably save 10 GB+ just by upgrading.
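
To make the "query shape" idea from the first bullet concrete, here is a toy Python sketch that replaces literals with placeholders and counts how many distinct shapes a workload produces. This is only an approximation for illustration: the regexes and the query_shape helper are hypothetical and are not the engine's actual parameterization logic. It can still help spot an application that generates many structurally different queries.

```python
import re
from collections import Counter

# Hypothetical, simplified normalizer: NOT the engine's real parameterization.
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")     # 'abc', 'it''s'
NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")  # 1, 42, 3.14 (not the 1 in t1)

def query_shape(sql):
    """Replace constants with '?' and collapse whitespace."""
    shape = STRING_LITERAL.sub("?", sql)
    shape = NUMBER_LITERAL.sub("?", shape)
    return " ".join(shape.split())

def count_shapes(queries):
    """Count how many queries fall into each normalized shape."""
    return Counter(query_shape(q) for q in queries)

queries = [
    "SELECT * FROM t WHERE c = 1",
    "SELECT * FROM t WHERE c = 2",              # same shape, different constant
    "SELECT * FROM t WHERE c = 1 AND d = 'x'",  # extra filter: new shape
    "SELECT * FROM t WHERE d = 'y' AND c = 3",  # reordered filters: new shape
]
shapes = count_shapes(queries)
for shape, n in shapes.most_common():
    print(n, shape)
```

In this sample the first two queries collapse into one shape, while adding or reordering filters yields separate shapes, each of which would need its own compiled plan.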

-Adam