Kubernetes leaf node licensing

I have spun up a basic Kubernetes installation on a single physical host machine with the following specifications:

CPUs: 32
RAM: 128 GB

However, when trying to spin up more than 1 leaf node, I hit a licensing error saying that the second leaf node would take me from 4 to 8 licensing units.

I have my leaf height set to 1, and can see the expected limits (8 CPUs, 32 GB RAM) being applied to the Kubernetes pods, so it was my understanding that each of these leaf nodes should only be using a single licensing unit.
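
For reference, the requests/limits can be read straight off the pod spec, e.g. (pod name is from my cluster):

kubectl get pod node-sdb-cluster-leaf-ag1-0 -o jsonpath='{.spec.containers[*].resources}'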

Have I misunderstood something here? Happy to share any logs/configs that might be helpful.

You are probably hitting the issue that on-prem licensing is 8 cores and 32 GB per unit, while S2MS “height 1” is, I believe, 8 cores and 64 GB of RAM.

And the k8s operator is set up for S2MS.

So I would check that your pods actually have 32 GB. Height 1 should have 64 GB.

(which is 64 GB / 32 GB per unit = 2 license units).

Are you using the free license?

If you use toolbox instead of k8s it should just work.

Hi Hanson,

Yes, I am using the free license.

The documentation for the operator states that a leaf height of 1 is 32 GB of RAM, and I can confirm that this is what I am seeing set as the requests/limits on my pods. Additionally, if I set the leaf height to 0.5 then I can see the pod limits being set at 16 GB of RAM, but am still hitting the same licensing issue.

Also, I am not seeing it try to use 2 units per leaf, but 4 (when it should be using 1), so I am not sure how this could be the issue.

I’m not sure if this is still accurate, but the information I’m using for reference on how the licensing calculation should work on Kubernetes comes from a forum post from a couple of years back: Using the Free License on a Kubernetes Cluster with more memory than allowed - #2 by hanson

Also, we are changing our licensing code so it will track container hard memory and CPU limits instead of the underlying host limits. This change should be released in a 7.1 patch release sometime this month. This will make running MemSQL free edition in k8s a lot easier than it is today.

I have run some additional tests (using a trial Enterprise license), and everywhere I have been able to check, I can see the limits being applied to the pods, but the license calculation is still returning incorrect information.

This morning I tried setting the leaf height to 0.5 and running it under heavy load, and by monitoring the CPU of the machine as a whole I can confirm that it is definitely working within the limits set on the Kubernetes pods.

With 3 leaves at height 1, the machine is bursting up to 24 cores. With 3 leaves at height 0.5, it is bursting up to 12 cores. But each leaf is using 4 license units no matter the height (32 host cores / 8 cores per unit = 4), i.e. it's calculating based on the full resources of the machine, not the pod limits.

I don’t have a solution to offer you for the license check in the multi-node configuration. I’ve asked somebody who may know better to take a look.

Consider just running one leaf node and using lots of partitions, like one partition per core, to keep under the licensing limit but still get the benefit of all your hardware. Since you are on one host anyway, HA is not significant; you don't really need a multi-leaf configuration.
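
For example (database name here is just a placeholder), the partition count can be set explicitly when the database is created:

CREATE DATABASE mydb PARTITIONS 32;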


Our internal expert says:
the engine license checks are cgroup aware… so he has a config issue of some kind
(as long as he is using a relatively new engine version… this was changed ~3 years ago)

me:
You mean if he puts each leaf in a different cgroup with the right caps, it should work?

answer:
yep
k8s will do that for him… or should
one thing that may be a confounding factor
is if he has a very old memory-based license in use
he needs to have a newer unit license
“newer” as in 4+ years ago

However, if you are getting an error about license units, you probably have the right license type.

If you do

show status extended like '%license%';

it should say if the limit is in MB or Units.


Hi Hanson, thanks for escalating this one. The license should be correct, as it is reporting ‘4 units’ to me, and I am getting the same behaviour with our free license and a trial Enterprise license.

I appreciate it probably seems like a misconfiguration, but the cgroup configuration does appear to be applied correctly.

The cgroups themselves, which I believe should be the source of truth for the applied limits, are reporting the specified limits for both memory (32 GB) and CPU (8 cores) when queried via either the node filesystem or the Kubernetes cAdvisor endpoint:

│ └─kubepods-burstable-pod60cdb7b8_0384_4b9b_b550_e9732e93388c.slice
│   ├─cri-containerd-2fad473233e86d93d261d0acf36fbefe8ae7beca953685dd89755cf89a703468.scope
│   │ ├─3147876 bash /etc/memsql/scripts/exporter-startup-script
│   │ ├─3147939 bash /etc/memsql/scripts/exporter-startup-script
│   │ └─3147941 /bin/memsql_exporter
│   ├─cri-containerd-4be5077585d9c8fcd5d65463963d1b2d6043de797d0dbe7d120285fe63adb571.scope
│   │ └─3147569 /pause
│   └─cri-containerd-17b0218bb66421af2e31f6208d4a29764dd691470616d12da14e11fc926d042e.scope
│     ├─3147671 bash /assets/startup-node
│     ├─3148292 /opt/memsql-server-8.0.17-0553658f69/memsqld_safe --auto-restart disable --defaults-file /var/lib/memsql/instance/memsql.cnf --memsqld /opt/memsql-server-8.0.17-0553658f69/memsqld --user 999
│     ├─3148370 /opt/memsql-server-8.0.17-0553658f69/memsqld --defaults-file /var/lib/memsql/instance/memsql.cnf --user 999
│     └─3148470 /opt/memsql-server-8.0.17-0553658f69/memsqld --defaults-file /var/lib/memsql/instance/memsql.cnf --user 999


ben@server$ cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod60cdb7b8_0384_4b9b_b550_e9732e93388c.slice/cpu.max
810000 100000
ben@server$ cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod60cdb7b8_0384_4b9b_b550_e9732e93388c.slice/memory.max
34464595968

ben@server:~$ cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod60cdb7b8_0384_4b9b_b550_e9732e93388c.slice/cri-containerd-17b0218bb66421af2e31f6208d4a29764dd691470616d12da14e11fc926d042e.scope/cpu.max
800000 100000
ben@server:~$ cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod60cdb7b8_0384_4b9b_b550_e9732e93388c.slice/cri-containerd-17b0218bb66421af2e31f6208d4a29764dd691470616d12da14e11fc926d042e.scope/memory.max
34359738368

ben@server:~$ curl http://localhost:8001/api/v1/nodes/server/proxy/metrics/cadvisor | grep "container_spec_cpu_quota" | grep "node-sdb-cluster-leaf-"
container_spec_cpu_quota{container="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod56219a10_fc11_41bd_b38f_55c17052eba6.slice",image="",name="",namespace="default",pod="node-sdb-cluster-leaf-ag1-1"} 810000
container_spec_cpu_quota{container="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod60cdb7b8_0384_4b9b_b550_e9732e93388c.slice",image="",name="",namespace="default",pod="node-sdb-cluster-leaf-ag1-0"} 810000
container_spec_cpu_quota{container="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod80e9b598_efc4_4cc9_a994_4c71c97fdf12.slice",image="",name="",namespace="default",pod="node-sdb-cluster-leaf-ag1-2"} 810000
container_spec_cpu_quota{container="exporter",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod56219a10_fc11_41bd_b38f_55c17052eba6.slice/cri-containerd-b348d5a3d45b20fab66856850564fab6ed8a5dffb1efab771ec2c62f61cec61c.scope",image="docker.io/singlestore/node:latest",name="b348d5a3d45b20fab66856850564fab6ed8a5dffb1efab771ec2c62f61cec61c",namespace="default",pod="node-sdb-cluster-leaf-ag1-1"} 10000
container_spec_cpu_quota{container="exporter",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod60cdb7b8_0384_4b9b_b550_e9732e93388c.slice/cri-containerd-2fad473233e86d93d261d0acf36fbefe8ae7beca953685dd89755cf89a703468.scope",image="docker.io/singlestore/node:latest",name="2fad473233e86d93d261d0acf36fbefe8ae7beca953685dd89755cf89a703468",namespace="default",pod="node-sdb-cluster-leaf-ag1-0"} 10000
container_spec_cpu_quota{container="exporter",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod80e9b598_efc4_4cc9_a994_4c71c97fdf12.slice/cri-containerd-6c5bf1f3a444cacdc156c911249554df44997438afdb8c112c84492e7ce0fc59.scope",image="docker.io/singlestore/node:latest",name="6c5bf1f3a444cacdc156c911249554df44997438afdb8c112c84492e7ce0fc59",namespace="default",pod="node-sdb-cluster-leaf-ag1-2"} 10000
container_spec_cpu_quota{container="node",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod56219a10_fc11_41bd_b38f_55c17052eba6.slice/cri-containerd-a63c83b26f97159d9f04e052ec682ab47b4486ebead03cf93d1b72b51e08bf1e.scope",image="docker.io/singlestore/node:latest",name="a63c83b26f97159d9f04e052ec682ab47b4486ebead03cf93d1b72b51e08bf1e",namespace="default",pod="node-sdb-cluster-leaf-ag1-1"} 800000
container_spec_cpu_quota{container="node",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod60cdb7b8_0384_4b9b_b550_e9732e93388c.slice/cri-containerd-17b0218bb66421af2e31f6208d4a29764dd691470616d12da14e11fc926d042e.scope",image="docker.io/singlestore/node:latest",name="17b0218bb66421af2e31f6208d4a29764dd691470616d12da14e11fc926d042e",namespace="default",pod="node-sdb-cluster-leaf-ag1-0"} 800000
container_spec_cpu_quota{container="node",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod80e9b598_efc4_4cc9_a994_4c71c97fdf12.slice/cri-containerd-a71d39d49d070f71e2b3daef9cd5b21d613fc5af2190af397f9f34985cd1ab8d.scope",image="docker.io/singlestore/node:latest",name="a71d39d49d070f71e2b3daef9cd5b21d613fc5af2190af397f9f34985cd1ab8d",namespace="default",pod="node-sdb-cluster-leaf-ag1-2"} 800000
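
To interpret the numbers above: cpu.max is "<quota> <period>" in microseconds, so the node container's "800000 100000" works out to 800000 / 100000 = 8 CPUs (the pod slice's 810000 includes the exporter sidecar's extra 0.1 CPU), and the container's memory.max of 34359738368 bytes is exactly 32 GiB. For example, as a quick sanity check:

awk '{ print $1 / $2, "CPUs" }' /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod60cdb7b8_0384_4b9b_b550_e9732e93388c.slice/cri-containerd-17b0218bb66421af2e31f6208d4a29764dd691470616d12da14e11fc926d042e.scope/cpu.max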

However, when I query the database for its limits, it appears that the memory limit is being recognised (MAX_MEMORY_MB on the leaves is 29491, in line with the 32 GB pod limit rather than the host's 128 GB), but the CPU limit is not (NUM_CPUS still reports the host's 32 cores):

MySQL [information_schema]> SELECT * FROM mv_nodes;
+----+---------------------------------------------+------+---------------+---------------+------+--------+--------------------+----------+---------------+---------------------+----------------+----------------------+--------------------+------------------------+--------+---------+
| ID | IP_ADDR                                     | PORT | EXTERNAL_HOST | EXTERNAL_PORT | TYPE | STATE  | AVAILABILITY_GROUP | NUM_CPUS | MAX_MEMORY_MB | MAX_TABLE_MEMORY_MB | MEMORY_USED_MB | TABLE_MEMORY_USED_MB | TOTAL_DATA_DISK_MB | AVAILABLE_DATA_DISK_MB | UPTIME | VERSION |
+----+---------------------------------------------+------+---------------+---------------+------+--------+--------------------+----------+---------------+---------------------+----------------+----------------------+--------------------+------------------------+--------+---------+
|  4 | node-sdb-cluster-leaf-ag1-2.svc-sdb-cluster | 3306 | NULL          |          NULL | LEAF | online |                  1 |       32 |         29491 |               26541 |           6495 |                  737 |            3752978 |                3424411 |  69913 | 8.0.17  |
|  3 | node-sdb-cluster-leaf-ag1-1.svc-sdb-cluster | 3306 | NULL          |          NULL | LEAF | online |                  1 |       32 |         29491 |               26541 |           6464 |                  758 |            3752978 |                3424411 |  69913 | 8.0.17  |
|  2 | node-sdb-cluster-leaf-ag1-0.svc-sdb-cluster | 3306 | NULL          |          NULL | LEAF | online |                  1 |       32 |         29491 |               26541 |           6671 |                  753 |            3752978 |                3424411 |  69913 | 8.0.17  |
|  1 | node-sdb-cluster-master-0.svc-sdb-cluster   | 3306 | NULL          |          NULL | MA   | online |               NULL |       32 |         14745 |               13270 |            484 |                   56 |            3752978 |                3424411 |  69990 | 8.0.17  |
+----+---------------------------------------------+------+---------------+---------------+------+--------+--------------------+----------+---------------+---------------------+----------------+----------------------+--------------------+------------------------+--------+---------+
4 rows in set (0.026 sec)

(For what it's worth, it also seems to be getting the disk calculation wrong here, reading the host's OS disk rather than the pods' assigned PVs.)

As an additional test, I have tried limiting the number of CPUs assigned and running it under high load, and can confirm that the engine is being limited by the cgroup configuration and is not able to exceed the limits set on the pods.
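
One way to see this throttling directly is the pod cgroup's cpu.stat counters (under cgroups v2, nr_throttled and throttled_usec increase whenever the quota is hit), e.g.:

cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod60cdb7b8_0384_4b9b_b550_e9732e93388c.slice/cpu.stat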

The fact that the CPU usage is actually being limited, that the memory limits are correctly recognised, and that querying the cgroups via the filesystem or via Kubernetes shows the correct limits all leads me to think that the cgroups are correctly defined and configured.

We will be looking to expand the Kubernetes cluster to multiple hosts in future, which is why we need to get this right now, and due to the limitations on the way the leaf height variable works, it's important that we can run more than one leaf per physical host.

Are you able to shed any light on how/where it is checking these CPU limits, and why it might be recognising the applied memory limit but not the CPU limit?

I think I have resolved this now. I tried downgrading my cgroups version from V2 to V1, and the license calculation is now working correctly.

It looks like SingleStore is just not compatible with the newer cgroups version.
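
For anyone else hitting this: the downgrade is a host-level change rather than anything in SingleStore. On a systemd-based host it generally means booting with systemd.unified_cgroup_hierarchy=0 on the kernel command line; the exact steps depend on your distro/boot loader, but on Ubuntu with GRUB it is roughly:

# append to the existing kernel parameters in /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="... systemd.unified_cgroup_hierarchy=0"
# then regenerate the grub config and reboot
sudo update-grub
sudo reboot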

Thanks for reporting this, Ben. You're right, we don't support cgroups v2. The engine is not able to figure out the number of “logical” cores it's allowed to use for licensing purposes (among other things, it uses the core count internally), so it defaults to the host's core count. We have a task open to fix this.
