Reconciler error : failed to get service endpoint (svc-sdb-cluster-ddl): no ingress endpoint found

Hi,

I am deploying a SingleStore (SS) cluster on Kubernetes, but I am unable to spin up the Aggregator and Leaf nodes.
The sdb-operator pods are created and running.

Error: Reconciler error: failed to get service endpoint (svc-sdb-cluster-ddl): no ingress endpoint found

Another log line: attempting to acquire leader lease default/memsql-operator-lock-sdb-cluster...

Logs:

[root@learning-1 ss_kubernetese]# kubectl logs deployment/sdb-operator
2022/10/13 07:35:19 deleg.go:121        {controller.memsql}     reconciliation cause: statefulset       namespace: "default"  clusterName: "sdb-cluster"  serviceName: "svc-sdb-cluster-ddl"  namespace: "default"
2022/10/13 07:35:19 deleg.go:121        {controller.memsql}     reconciliation cause: statefulset       namespace: "default"  clusterName: "sdb-cluster"  serviceName: "svc-sdb-cluster-ddl"  namespace: "default"
2022/10/13 07:35:19 deleg.go:121        {controller.memsql}     reconciliation cause: statefulset       namespace: "default"  clusterName: "sdb-cluster"  serviceName: "svc-sdb-cluster"  namespace: "default"
2022/10/13 07:35:19 deleg.go:121        {controller.memsql}     reconciliation cause: statefulset       namespace: "default"  clusterName: "sdb-cluster"  serviceName: "svc-sdb-cluster"  namespace: "default"

2022/10/13 08:52:06 logr.go:249 {controller.memsql}     Reconciling MemSQL Cluster.     Request.Name: "sdb-cluster"  Request.Namespace: "default"
2022/10/13 08:52:06 deleg.go:121        {memsql}        can't find operator deployment, trying uncached client  key: "default/operator-sdb-cluster"
2022/10/13 08:52:06 deleg.go:135        {memsql}        can't find operator deployment, metrics service will not be created     error: "deployments.apps "operator-sdb-cluster" not found"
2022/10/13 08:52:06 deleg.go:135        {controller.memsql}     Reconciler error, will retry after      10m0s: "error"  failed to get service endpoint (svc-sdb-cluster-ddl): no ingress endpoint found



[root@learning-1 ss_kubernetese]# kubectl logs deployment/sdb-operator
Found 2 pods, using pod/sdb-operator-564b9d7d97-l6x22
2022/10/13 09:03:20 deleg.go:121        {cmd}   Go Version: go1.18.2
2022/10/13 09:03:20 deleg.go:121        {cmd}   Go OS/Arch: linux/amd64
2022/10/13 09:03:20 deleg.go:121        {cmd}   Operator Version: 3.0.33
2022/10/13 09:03:20 deleg.go:121        {cmd}   Commit Hash: db8f5aff
2022/10/13 09:03:20 deleg.go:121        {cmd}   Build Time: 2022-09-08T14:43:05Z
2022/10/13 09:03:20 deleg.go:121        {cmd}   Options:
2022/10/13 09:03:20 deleg.go:121        {cmd}   --cores-per-unit: 8.000000
2022/10/13 09:03:20 deleg.go:121        {cmd}   --memory-per-unit: 32.000000
2022/10/13 09:03:20 deleg.go:121        {cmd}   --overpack-factor: 0.000000
2022/10/13 09:03:20 deleg.go:121        {cmd}   --extra-cidrs: []
2022/10/13 09:03:20 deleg.go:121        {cmd}   --external-dns-domain-name: {false }
2022/10/13 09:03:20 deleg.go:121        {cmd}   --external-dns-ttl: {false 0}
2022/10/13 09:03:20 deleg.go:121        {cmd}   --ssl-secret-name:
2022/10/13 09:03:20 deleg.go:121        {cmd}   --merge-service-annotations: true
2022/10/13 09:03:20 deleg.go:121        {cmd}   --backup-default-deadline-seconds: 3600
2022/10/13 09:03:20 deleg.go:121        {cmd}   --backup-incremental-default-deadline-seconds: 3600
2022/10/13 09:03:20 deleg.go:121        {cmd}   --cluster-id: sdb-cluster
2022/10/13 09:03:20 deleg.go:121        {cmd}   --fs-group-id: 5555
2022/10/13 09:03:20 deleg.go:121        {controller-runtime.metrics}    Metrics server is starting to listen    addr: "0.0.0.0:9090"
2022/10/13 09:03:21 deleg.go:121        {cmd}   starting manager
2022/10/13 09:03:21 logr.go:249 Starting server kind: "metrics"  addr: "[::]:9090"  path: "/metrics"
I1013 09:03:21.196052       1 leaderelection.go:248] attempting to acquire leader lease default/memsql-operator-lock-sdb-cluster...



[root@learning-1 4px]# kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
sdb-operator-9565d987-5rg8l   1/1     Running   0          96m
sdb-operator-9565d987-hjt5j   1/1     Running   0          96m
[root@learning-1 4px]# kubectl get  memsqlclusters.memsql.com/sdb-cluster
NAME          AGGREGATORS   LEAVES   REDUNDANCY LEVEL   AGE
sdb-cluster   0             0        1                  124m
[root@learning-1 4px]#  kubectl get services
NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kubernetes            ClusterIP      10.96.0.1       <none>        443/TCP          135m
svc-sdb-cluster       ClusterIP      None            <none>        3306/TCP         125m
svc-sdb-cluster-ddl   LoadBalancer   10.102.231.87   <pending>     3306:30052/TCP   125m

[root@learning-1 ss_kubernetese]# kubectl describe pod
Name:             sdb-operator-564b9d7d97-6xs8d
Namespace:        default
Priority:         0
Service Account:  sdb-operator
Node:             learning-2/10.138.0.3
Start Time:       Thu, 13 Oct 2022 09:03:18 +0000
Labels:           name=sdb-operator
                  pod-template-hash=564b9d7d97
Annotations:      <none>
Status:           Running
IP:               10.244.1.32
IPs:
  IP:           10.244.1.32
Controlled By:  ReplicaSet/sdb-operator-564b9d7d97
Containers:
  sdb-operator:
    Container ID:  containerd://0586b50eef3d95b561ee335de7678ca2826a3958dda2ba5a45976e510b62744f
    Image:         singlestore/operator:3.0.32-db8f5aff
    Image ID:      docker.io/memsql/operator@sha256:cd39e13744e57142eff3fe8e3e55dbb4526778b5331cd0bf4d26c9d2f3526031
    Port:          <none>
    Host Port:     <none>
    Args:
      --merge-service-annotations
      --fs-group-id
      5555
      --cluster-id
      sdb-cluster
    State:          Running
      Started:      Thu, 13 Oct 2022 09:03:21 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      WATCH_NAMESPACE:  default (v1:metadata.namespace)
      POD_NAME:         sdb-operator-564b9d7d97-6xs8d (v1:metadata.name)
      OPERATOR_NAME:    sdb-operator
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4c9pf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-4c9pf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  6m15s  default-scheduler  Successfully assigned default/sdb-operator-564b9d7d97-6xs8d to learning-2
  Normal  Pulling    6m13s  kubelet            Pulling image "singlestore/operator:3.0.32-db8f5aff"
  Normal  Pulled     6m12s  kubelet            Successfully pulled image "singlestore/operator:3.0.32-db8f5aff" in 775.624181ms
  Normal  Created    6m12s  kubelet            Created container sdb-operator
  Normal  Started    6m12s  kubelet            Started container sdb-operator

Can someone please help resolve this issue? I have attached the logs.

sdb-operator.yaml

[root@learning-1 ss_kubernetese]# cat sdb-operator.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sdb-operator
  labels:
    app.kubernetes.io/component: operator
spec:
  replicas: 2
  selector:
    matchLabels:
      name: sdb-operator
  template:
    metadata:
      labels:
        name: sdb-operator
    spec:
      serviceAccountName: sdb-operator
      containers:
        - name: sdb-operator
          image: singlestore/operator:3.0.32-db8f5aff
          imagePullPolicy: Always
          args: [
            # Cause the operator to merge rather than replace annotations on services
            "--merge-service-annotations",
            # Allow the process inside the container to have read/write access to the `/var/lib/memsql` volume.
            "--fs-group-id", "5555",
            "--cluster-id", "sdb-cluster"
          ]
          env:
            - name: WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OPERATOR_NAME
              value: "sdb-operator"

sdb-cluster.yaml:

[root@learning-1 ss_kubernetese]# cat sdb-cluster.yaml
apiVersion: memsql.com/v1alpha1
kind: MemsqlCluster
metadata:
  name: sdb-cluster
spec:
  license: <>
  adminHashedPassword: "*9177CC8207174BDBB5ED66B2140C75171283F15D"
  nodeImage:
    repository: singlestore/node
    tag: alma-7.8.17-69cee1f1a3

  redundancyLevel: 1

  serviceSpec:
    objectMetaOverrides:
      labels:
        custom: label
      annotations:
        custom: annotations

  aggregatorSpec:
    count: 1
    height: 0.5
    storageGB: 512
    storageClass: standard

    objectMetaOverrides:
      annotations:
        optional: annotation
      labels:
        optional: label

  leafSpec:
    count: 1
    height: 0.5
    storageGB: 1024
    storageClass: standard

    objectMetaOverrides:
      annotations:
        optional: annotation
      labels:
        optional: label
  usersSpec:
    rootServiceUser: true

This issue has been pending for a long time; I have tried to find a solution but have not been able to.

Hello,

It looks like there is an issue with the ddl service. If the ddl service is a LoadBalancer, the operator will wait until an ingress endpoint is created and populated in the service’s .Status.LoadBalancer.Ingress. This usually happens when running without an ingress controller set up to create an externally accessible IP for the service. You can find more details here: Ingress Controllers | Kubernetes

Alternatively, if you are just connecting from within the k8s cluster, you can set the services to be ClusterIP:

serviceSpec:
  type: "ClusterIP"

Let me know if this helps. On my side, I’ll make this error message more verbose.

Regards,
Brooks


Hi @bremy ,

I tried adding type: "ClusterIP" in sdb-cluster.yaml, but the aggregator and leaf nodes are still not spinning up.

[root@learning-1 ss_kubernetese]# kubectl get all
NAME                                READY   STATUS    RESTARTS   AGE
pod/node-sdb-cluster-leaf-ag1-0     0/2     Pending   0          34m
pod/node-sdb-cluster-master-0       0/2     Pending   0          34m
pod/sdb-operator-84bf4b74dd-6jv9z   1/1     Running   0          34m

NAME                          TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/kubernetes            ClusterIP      10.96.0.1       <none>        443/TCP          44m
service/svc-sdb-cluster       ClusterIP      None            <none>        3306/TCP         34m
service/svc-sdb-cluster-ddl   ClusterIP      10.103.60.18    <none>        3306/TCP         34m
service/svc-sdb-cluster-dml   LoadBalancer   10.109.72.127   <pending>     3306:32497/TCP   8m24s

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sdb-operator   1/1     1            1           34m

NAME                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/sdb-operator-84bf4b74dd   1         1         1       34m

NAME                                         READY   AGE
statefulset.apps/node-sdb-cluster-leaf-ag1   0/1     34m
statefulset.apps/node-sdb-cluster-master     0/1     34m

Error:

2022-10-28T11:01:42.990Z        INFO    controller/configmaps_secrets.go:59     reconciliation cause: memsqlcluster     {"clusterName": "sdb-cluster", "namespace": "default"}
2022-10-28T11:01:42.990Z        INFO    controller/controller.go:114    Reconciling MemSQL Cluster.     {"Request.Namespace": "default", "Request.Name": "sdb-cluster"}
2022-10-28T11:01:42.990Z        INFO    memsql/metrics.go:58    can't find operator deployment, trying uncached client  {"key": "default/operator-sdb-cluster"}
2022-10-28T11:01:43.034Z        ERROR   memsql/metrics.go:61    can't find operator deployment, metrics service will not be created     {"error": "deployments.apps \"operator-sdb-cluster\" not found"}
freya/kube/memsql.NewMetricsServiceAction.func1
        freya/kube/memsql/metrics.go:61
freya/kube/memsql.ComposeActions.func1
        freya/kube/memsql/action.go:22
freya/kube/controller.(*Reconciler).Reconcile
        freya/kube/controller/controller.go:296
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227
2022-10-28T11:01:43.034Z        INFO    memsql/util.go:60       creating object {"type": "*v1.Secret", "name": "sdb-cluster"}
2022-10-28T11:01:43.074Z        INFO    memsql/util.go:60       creating object {"type": "*v1.Service", "name": "svc-sdb-cluster"}
2022-10-28T11:01:43.118Z        INFO    memsql/util.go:60       creating object {"type": "*v1.Service", "name": "svc-sdb-cluster-ddl"}
2022-10-28T11:01:43.119Z        INFO    controller/configmaps_secrets.go:94     reconciliation cause: statefulset       {"namespace": "default", "clusterName": "sdb-cluster", "serviceName": "svc-sdb-cluster", "namespace": "default"}
2022-10-28T11:01:43.161Z        INFO    controller/configmaps_secrets.go:94     reconciliation cause: statefulset       {"namespace": "default", "clusterName": "sdb-cluster", "serviceName": "svc-sdb-cluster-ddl", "namespace": "default"}
2022-10-28T11:01:43.161Z        INFO    memsql/util.go:60       creating object {"type": "*v1.Service", "name": "svc-sdb-cluster-dml"}
2022-10-28T11:01:43.203Z        INFO    controller/configmaps_secrets.go:94     reconciliation cause: statefulset       {"namespace": "default", "clusterName": "sdb-cluster", "serviceName": "svc-sdb-cluster-dml", "namespace": "default"}
2022-10-28T11:01:43.203Z        INFO    memsql/util.go:90       creating object {"type": "*v1beta1.PodDisruptionBudget", "name": "agg-sdb-cluster"}
2022-10-28T11:01:43.243Z        INFO    memsql/util.go:60       creating object {"type": "*v1.ConfigMap", "name": "ms-pusher-sdb-cluster"}
2022-10-28T11:01:43.281Z        INFO    memsql/util.go:60       creating object {"type": "*v1.ConfigMap", "name": "node-sdb-cluster-master"}
2022-10-28T11:01:43.319Z        INFO    memsql/nodes.go:134     Creating a New STS      {"name": "node-sdb-cluster-master"}
2022-10-28T11:01:43.319Z        INFO    memsql/util.go:171      creating object {"type": "*v1.StatefulSet", "name": "node-sdb-cluster-master"}
2022-10-28T11:01:43.359Z        INFO    controller/configmaps_secrets.go:94     reconciliation cause: statefulset       {"namespace": "default", "clusterName": "sdb-cluster", "statefulsetName": "node-sdb-cluster-master", "namespace": "default"}
2022-10-28T11:01:43.360Z        INFO    memsql/nodes.go:153     Wait for first STS pod  {"STS name": "node-sdb-cluster-master"}
2022-10-28T11:01:43.360Z        INFO    controller/controller.go:300    Transition to phase pending on missing phase value
2022-10-28T11:01:43.360Z        INFO    controller/controller.go:321    Updating operator version       {"previous version": "", "new version": "3.0.60"}
2022-10-28T11:01:43.360Z        INFO    controller/controller.go:328    Updating observed generation    {"previous value": 0, "new value": 1}
2022-10-28T11:01:43.391Z        INFO    controller/configmaps_secrets.go:94     reconciliation cause: statefulset       {"namespace": "default", "clusterName": "sdb-cluster", "statefulsetName": "node-sdb-cluster-master", "namespace": "default"}
2022-10-28T11:01:43.391Z        INFO    controller/configmaps_secrets.go:94     reconciliation cause: statefulset       {"namespace": "default", "clusterName": "sdb-cluster", "statefulsetName": "node-sdb-cluster-master", "namespace": "default"}
2022-10-28T11:01:43.453Z        INFO    controller/errors.go:78 RetryError: will retry after 1s: did not find pod for statefulset node-sdb-cluster-master
2022-10-28T11:01:43.453Z        INFO    controller/controller.go:114    Reconciling MemSQL Cluster.     {"Request.Namespace": "default", "Request.Name": "sdb-cluster"}
2022-10-28T11:01:43.453Z        INFO    memsql/metrics.go:58    can't find operator deployment, trying uncached client  {"key": "default/operator-sdb-cluster"}
2022-10-28T11:01:43.453Z        INFO    controller/configmaps_secrets.go:55     skipping reconcile request because cluster spec has not changed
2022-10-28T11:01:43.453Z        INFO    controller/configmaps_secrets.go:55     skipping reconcile request because cluster spec has not changed
2022-10-28T11:01:43.490Z        ERROR   memsql/metrics.go:61    can't find operator deployment, metrics service will not be created     {"error": "deployments.apps \"operator-sdb-cluster\" not found"}
freya/kube/memsql.NewMetricsServiceAction.func1
        freya/kube/memsql/metrics.go:61
freya/kube/memsql.ComposeActions.func1
        freya/kube/memsql/action.go:22
freya/kube/controller.(*Reconciler).Reconcile
        freya/kube/controller/controller.go:296
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227
2022-10-28T11:01:43.493Z        INFO    memsql/nodes.go:153     Wait for first STS pod  {"STS name": "node-sdb-cluster-master"}
2022-10-28T11:01:43.493Z        INFO    memsql/connection.go:24 Connect to the Master Aggregator
2022-10-28T11:01:43.941Z        ERROR   memsql/connection.go:31 Failed to connect to voting member      {"index": 0, "error": "dial tcp: lookup node-sdb-cluster-master-0.svc-sdb-cluster on 10.96.0.10:53: no such host"}

The svc-sdb-cluster service has ClusterIP None.

I also checked with an ingress controller: it works fine when I point it at another deployment.

But when checking the SS cluster I am still getting the error.

Error: no ingress endpoint found

2022-10-31T11:38:09.817Z        INFO    memsql/util.go:60       creating object {"type": "*v1.Secret", "name": "sdb-cluster"}
2022-10-31T11:38:09.858Z        INFO    memsql/util.go:60       creating object {"type": "*v1.Service", "name": "svc-sdb-cluster"}
2022-10-31T11:38:09.900Z        INFO    memsql/util.go:60       creating object {"type": "*v1.Service", "name": "svc-sdb-cluster-ddl"}
2022-10-31T11:38:09.900Z        INFO    controller/configmaps_secrets.go:94     reconciliation cause: statefulset       {"namespace": "ingress-nginx", "clusterName": "sdb-cluster", "serviceName": "svc-sdb-cluster", "namespace": "ingress-nginx"}
2022-10-31T11:38:09.947Z        INFO    controller/configmaps_secrets.go:94     reconciliation cause: statefulset       {"namespace": "ingress-nginx", "clusterName": "sdb-cluster", "serviceName": "svc-sdb-cluster-ddl", "namespace": "ingress-nginx"}
2022-10-31T11:38:09.947Z        INFO    controller/controller.go:300    Transition to phase pending on missing phase value
2022-10-31T11:38:09.947Z        INFO    controller/controller.go:321    Updating operator version       {"previous version": "", "new version": "3.0.60"}
2022-10-31T11:38:09.947Z        INFO    controller/controller.go:328    Updating observed generation    {"previous value": 0, "new value": 1}
2022-10-31T11:38:10.055Z        ERROR   controller/errors.go:95 Reconciler error        {"will retry after": "1s", "error": "failed to get service endpoint (svc-sdb-cluster-ddl): no ingress endpoint found"}
2022-10-31T11:38:10.055Z        INFO    controller/controller.go:114    Reconciling MemSQL Cluster.     {"Request.Namespace": "ingress-nginx", "Request.Name": "sdb-cluster"}
2022-10-31T11:38:10.055Z        INFO    memsql/metrics.go:58    can't find operator deployment, trying uncached client  {"key": "ingress-nginx/operator-sdb-cluster"}
2022-10-31T11:38:10.056Z        INFO    controller/configmaps_secrets.go:55     skipping reconcile request because cluster spec has not changed
2022-10-31T11:38:10.056Z        INFO    controller/configmaps_secrets.go:55     skipping reconcile request because cluster spec has not changed
2022-10-31T11:38:10.096Z        ERROR   memsql/metrics.go:61    can't find operator deployment, metrics service will not be created     {"error": "deployments.apps \"operator-sdb-cluster\" not found"}

ingress.yaml:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-world-ing
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  tls:
  - secretName: tls-secret
  rules:
  - http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: svc-sdb-cluster-ddl
              port:
                number: 3306

Logs:

[root@learning-1 ss_kubernetese]# kubectl get all --all-namespaces
NAMESPACE       NAME                                            READY   STATUS      RESTARTS         AGE
ingress-nginx   pod/ingress-nginx-controller-69fbbf9f9c-n8zkf   1/1     Running     0                19m
ingress-nginx   pod/sdb-operator-84bf4b74dd-b7h2k               1/1     Running     0                2m53s
NAMESPACE       NAME                                         TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
default         service/kubernetes                           ClusterIP      10.96.0.1        <none>          443/TCP                      3d1h
ingress-nginx   service/ingress-nginx-controller             LoadBalancer   10.98.172.132    172.31.71.xxx  80:30709/TCP,443:32160/TCP   19m
ingress-nginx   service/ingress-nginx-controller-admission   ClusterIP      10.105.227.128   <none>          443/TCP                      19m
ingress-nginx   service/svc-sdb-cluster                      ClusterIP      None             <none>          3306/TCP                     2m44s
ingress-nginx   service/svc-sdb-cluster-ddl                  LoadBalancer   10.96.35.64      <pending>       3306:31037/TCP               2m44s
kube-system     service/kube-dns                             ClusterIP      10.96.0.10       <none>          53/UDP,53/TCP,9153/TCP       3d1h

NAMESPACE      NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-flannel   daemonset.apps/kube-flannel-ds   2         2         2       2            2           <none>                   3d1h
kube-system    daemonset.apps/kube-proxy        2         2         2       2            2           kubernetes.io/os=linux   3d1h

NAMESPACE       NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
ingress-nginx   deployment.apps/ingress-nginx-controller   1/1     1            1           19m
ingress-nginx   deployment.apps/sdb-operator               1/1     1            1           2m53s
kube-system     deployment.apps/coredns                    2/2     2            2           3d1h

NAMESPACE       NAME                                                  DESIRED   CURRENT   READY   AGE
ingress-nginx   replicaset.apps/ingress-nginx-controller-69fbbf9f9c   1         1         1       19m
ingress-nginx   replicaset.apps/sdb-operator-84bf4b74dd               1         1         1       2m53s
kube-system     replicaset.apps/coredns-64897985d                     2         2         2       3d1h

NAMESPACE       NAME                                       COMPLETIONS   DURATION   AGE
ingress-nginx   job.batch/ingress-nginx-admission-create   1/1           5s         19m
ingress-nginx   job.batch/ingress-nginx-admission-patch    1/1           6s         19m

[root@learning-1 ss_kubernetese]# kubectl get ingress -n ingress-nginx
NAME      CLASS    HOSTS   ADDRESS         PORTS     AGE
ingress   <none>   *       172.31.71.xxx   80, 443   10m

Sorry you are still hitting issues.
In your first case using ClusterIP, the StatefulSets have been created but the pods aren’t spinning up and are stuck in the Pending state. Could you run kubectl describe pod node-sdb-cluster-leaf-ag1-0? That should provide information on why they are stuck in the Pending state.

In the second case it looks like service/svc-sdb-cluster-ddl is stuck in the <pending> state. Could you confirm the IP is not actually populated in the service by running kubectl get service svc-sdb-cluster-ddl -o json and looking at the IP in the status?
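
For a quicker check, something along these lines should work (add -n <namespace> if the service is not in the default namespace):

kubectl get service svc-sdb-cluster-ddl -o jsonpath='{.status.loadBalancer}'

If that prints an empty object ({}), no external endpoint has been assigned to the service yet.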

Hi @bremy,
Please find the logs attached below.

[root@learning-1 ss_kubernetese]# kubectl describe pod node-sdb-cluster-leaf-ag1-0
Name:           node-sdb-cluster-leaf-ag1-0
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app.kubernetes.io/component=leaf
                app.kubernetes.io/instance=sdb-cluster
                app.kubernetes.io/name=memsql-cluster
                controller-revision-hash=node-sdb-cluster-leaf-ag1-75cc4b4859
                memsql.com/availability-group=1
                memsql.com/role-tier=leaf
                memsql.com/workspace=singlestore-central
                optional=label
                statefulset.kubernetes.io/pod-name=node-sdb-cluster-leaf-ag1-0
Annotations:    hash.configmap.memsql.com/node-sdb-cluster-leaf-ag1: f4cfbfdb3a04575fbd206cca24efe0875a308e76f87255f77ca90e4748b33e6a
                hash.secret.memsql.com/sdb-cluster: 3acdd3d2f8d36bcf98873780e4cdcf8ca8433cce72cdf49aed69a53616ff1278
                optional: annotation
                prometheus.io/port: 91xx
                prometheus.io/scrape: true
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/node-sdb-cluster-leaf-ag1
Containers:
  node:
    Image:      singlestore/node:alma-7.8.17-69cee1f1a3
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:     4
      memory:  16Gi
    Requests:
      cpu:      4
      memory:   16Gi
    Liveness:   exec [/etc/memsql/scripts/liveness-probe] delay=10s timeout=10s period=10s #success=1 #failure=3
    Readiness:  exec [/etc/memsql/scripts/readiness-probe] delay=10s timeout=10s period=10s #success=1 #failure=3
    Startup:    exec [/etc/memsql/scripts/startup-probe] delay=0s timeout=120s period=5s #success=1 #failure=2147483647
    Environment:
      RELEASE_ID:
      ROOT_PASSWORD:     <set to the key 'ROOT_PASSWORD' in secret 'sdb-cluster'>  Optional: false
      PRE_START_SCRIPT:  /etc/memsql/scripts/update-config-script
      MALLOC_ARENA_MAX:  4
    Mounts:
      /etc/memsql/extra from additional-files (rw)
      /etc/memsql/extra-secret from additional-secrets (rw)
      /etc/memsql/scripts from scripts (rw)
      /etc/memsql/share from global-additional-files (rw)
      /var/lib/memsql from pv-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2rc7h (ro)
  exporter:
    Image:      singlestore/node:alma-7.8.17-69cee1f1a3
    Port:       9104/TCP
    Host Port:  0/TCP
    Command:
      /etc/memsql/scripts/exporter-startup-script
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:     100m
      memory:  90Mi
    Environment:
      RELEASE_ID:
      DATA_SOURCE_NAME:  <set to the key 'DATA_SOURCE_NAME' in secret 'sdb-cluster'>  Optional: false
    Mounts:
      /etc/memsql/extra from additional-files (rw)
      /etc/memsql/extra-secret from additional-secrets (rw)
      /etc/memsql/scripts from scripts (rw)
      /etc/memsql/share from global-additional-files (rw)
      /var/lib/memsql from pv-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2rc7h (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  pv-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pv-storage-node-sdb-cluster-leaf-ag1-0
    ReadOnly:   false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      node-sdb-cluster-leaf-ag1
    Optional:  false
  additional-files:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      sdb-cluster-additional-files
    Optional:  true
  additional-secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  sdb-cluster-additional-secrets
    Optional:    true
  global-additional-files:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      global-additional-files
    Optional:  true
  kube-api-access-2rc7h:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  53s (x24 over 23m)  default-scheduler  0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims.
[root@learning-1 ss_kubernetese]# kubectl get service svc-sdb-cluster-ddl -o json -n ingress-nginx
{
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "annotations": {
            "custom": "annotations",
            "service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout": "4000",
            "singlestore.com/recreate-annotations-count": "0"
        },
        "creationTimestamp": "2022-10-31T11:38:09Z",
        "labels": {
            "app.kubernetes.io/component": "master",
            "app.kubernetes.io/instance": "sdb-cluster",
            "app.kubernetes.io/name": "memsql-cluster",
            "custom": "label"
        },
        "name": "svc-sdb-cluster-ddl",
        "namespace": "ingress-nginx",
        "ownerReferences": [
            {
                "apiVersion": "memsql.com/v1alpha1",
                "controller": true,
                "kind": "MemsqlCluster",
                "name": "sdb-cluster",
                "uid": "79e760b9-fd6d-4681-b9c4-8ca66c9b27fd"
            }
        ],
        "resourceVersion": "66349",
        "uid": "7264bcfd-7528-4cc8-8439-1aecb7eb5460"
    },
    "spec": {
        "allocateLoadBalancerNodePorts": true,
        "clusterIP": "10.96.35.64",
        "clusterIPs": [
            "10.96.35.64"
        ],
        "externalTrafficPolicy": "Cluster",
        "internalTrafficPolicy": "Cluster",
        "ipFamilies": [
            "IPv4"
        ],
        "ipFamilyPolicy": "SingleStack",
        "ports": [
            {
                "name": "memsql",
                "nodePort": 31037,
                "port": 3306,
                "protocol": "TCP",
                "targetPort": 3306
            }
        ],
        "selector": {
            "app.kubernetes.io/component": "master",
            "app.kubernetes.io/instance": "sdb-cluster",
            "app.kubernetes.io/name": "memsql-cluster"
        },
        "sessionAffinity": "None",
        "type": "LoadBalancer"
    },
    "status": {
        "loadBalancer": {}
    }
}

I want to re-orient this from the beginning to provide some insights that may help you resolve these issues.

This error: Reconciler error : failed to get service endpoint (svc-sdb-cluster-ddl): no ingress endpoint found

…is referring to ingress into the service itself through an endpoint, not an Ingress resource. It’s easy to get these two confused since they are both referred to as “ingress”, but really the operator is just saying it needs an input endpoint for the service. Currently there is no configured endpoint for the service (outside of the internal service IP):

svc-sdb-cluster-ddl LoadBalancer 10.102.231.87 <pending> 3306:30052/TCP 125m

See where it says <pending>? That should be the external IP, which is given to the resource via an external load balancer (because the Service is of the LoadBalancer type).
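
One convenient way to watch for that assignment (not required, just a sketch):

kubectl get service svc-sdb-cluster-ddl --watch

If an external load balancer is working, the EXTERNAL-IP column will eventually change from <pending> to a real address, and the operator can then proceed.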

As an example, here I have a test Kube cluster running with our operator up, and when I check for Ingress resources there are none:

> kubectl get po -A
NAMESPACE     NAME                                        READY   STATUS      RESTARTS   AGE
default       node-memsql-cluster-aggregator-0            2/2     Running     0          8m42s
default       node-memsql-cluster-leaf-ag1-0              2/2     Running     0          8m42s
default       node-memsql-cluster-leaf-ag1-1              2/2     Running     0          8m42s
default       node-memsql-cluster-master-0                2/2     Running     0          8m42s
> kubectl get ing -A
No resources found

It looks as though you installed the ingress-nginx controller, which provides a controller for Ingress resources, but we’re actually not looking for an ingress controller or an Ingress resource. We’re looking for ingress into the service through an endpoint.

Did you manually change the namespaces to ingress-nginx? I would not recommend doing that for any of the resources associated with the cluster (they should be in their own namespace, or just in the default one). Typically, Ingress resources do not need to be located in the same namespace as the controller; they just refer to the controller (in this case through the ClassName). All in all, I would clear out the ingress controller and Ingress resources: those can be used for pointing to multiple services or to provide a DNS endpoint for multiple services, but they are not needed to get the cluster up and running (and can always be added later).
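
If it helps, here is a rough cleanup sketch. The resource names and namespace are taken from what you posted, but the file names are assumptions from your earlier output, so adjust them to your setup (you will also need the operator's ServiceAccount/RBAC and CRD manifests applied in the target namespace):

kubectl delete memsqlclusters.memsql.com sdb-cluster -n ingress-nginx
kubectl delete deployment sdb-operator -n ingress-nginx
kubectl delete -f ingress.yaml -n ingress-nginx   # only if you created the Ingress resource there
kubectl apply -f sdb-operator.yaml -n default
kubectl apply -f sdb-cluster.yaml -n default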

It appears this cluster likely does not have an external load balancer, or it is not configured properly. Usually these are provided by cloud providers, or on-prem/bare-metal with a solution like MetalLB. If it did (or it were configured), then when the Operator spawns the DDL endpoint it would assign an external IP to the service, allowing it to be reached.
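
For illustration only, a minimal MetalLB layer-2 configuration sketch (this assumes MetalLB 0.13+ with its CRD-based configuration; the pool name and address range below are placeholders you would replace with addresses actually routable on your network):

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: sdb-pool                  # placeholder name
  namespace: metallb-system
spec:
  addresses:
  - 172.31.71.200-172.31.71.210   # placeholder range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: sdb-l2                    # placeholder name
  namespace: metallb-system
spec:
  ipAddressPools:
  - sdb-pool

With something like that in place, a LoadBalancer service such as svc-sdb-cluster-ddl can receive an external IP, and the "no ingress endpoint found" error should clear.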

It’s likely that, because of these reconciliation errors, we were unable to spin anything up even after changing to ClusterIP. Kubernetes can be finicky with its controllers and operators if the initial deployment presents problems it is trying to fix (but has a fundamental misunderstanding of, due to the initial config settings). It appears that before the switch to ClusterIP the Pods were not spawned at all, while afterwards they were spawned but would not reach a Ready 2/2 state.

Before:

kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
sdb-operator-9565d987-5rg8l   1/1     Running   0          96m
sdb-operator-9565d987-hjt5j   1/1     Running   0          96m

After:

NAME                                READY   STATUS    RESTARTS   AGE
pod/node-sdb-cluster-leaf-ag1-0     0/2     Pending   0          34m
pod/node-sdb-cluster-master-0       0/2     Pending   0          34m
pod/sdb-operator-84bf4b74dd-6jv9z   1/1     Running   0          34m

To see more about why, we can run a describe on the pods with kubectl describe pod [POD NAME]. In the events there you should see why the containers are failing to start.

Here are a few suggestions I would make:

  • If you can, clear out this current cluster so you can start over. Get the Kubernetes cluster set up completely, with its CNI included and running (coredns shows Running when you run kubectl get po -A). Do not apply the SingleStore Operator manifests yet. We want the Kube cluster ready for SingleStore, but not to have it installed yet.
  • From there, if you want to use a LoadBalancer service, you will need to install or configure an external load balancer (for example MetalLB, as sketched above). More on this here: Service | Kubernetes

On cloud providers which support external load balancers, setting the type field to LoadBalancer provisions a load balancer for your Service. The actual creation of the load balancer happens asynchronously, and information about the provisioned balancer is published in the Service’s .status.loadBalancer field.

Example from their docs of the ingress status we’re looking for in the Service (this appears just below the text above):

...
status:
  loadBalancer:
    ingress:
    - ip: 192.0.2.127
  • You can change the service to ClusterIP in the sdb-cluster.yaml manifest and use that from the get-go to get beyond this (without an external load balancer). However, a LoadBalancer service will help with configuring cross-cluster replication if you plan to use cross-cluster DR later. A ClusterIP service will not have an external IP by default and would need to be configured with one manually if you so choose, but this change alone will allow the cluster to spin up (reaching it from a remote host will require more manual configuration). At that point you could, as an example, point an Ingress resource at the service as well to provide ingress through that, but again it is not required unless you determine that is right for your environment.

  • If the Pods are generated but do not go into a Ready 2/2 state, then there is likely some other issue this is manifesting from. Run kubectl describe pod [POD NAME] to check the events and see if you can determine why the containers are failing to start and come online; see the checks sketched below for the unbound PersistentVolumeClaims from your describe output.
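
For example, since the FailedScheduling event you posted says the pods have unbound immediate PersistentVolumeClaims, a few checks along these lines would be my first stop (the PVC name is taken from your describe output; storageClass: standard comes from your sdb-cluster.yaml):

kubectl get storageclass
kubectl get pvc
kubectl describe pvc pv-storage-node-sdb-cluster-leaf-ag1-0
kubectl get pv

If there is no StorageClass named standard, or it has no provisioner that works on this bare-metal setup, the claims will stay Pending and the scheduler will never place the pods. You would either need a dynamic provisioner (for example a local-path or NFS provisioner) or manually pre-created PersistentVolumes that satisfy the claims.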

All in all, Kubernetes deployments require a significant understanding of the environment and of Kubernetes itself. We typically advise customers to use our cloud offering, or, if they choose to use the Kubernetes Operator, to have an extensive understanding of their environment. There are so many disparate CNIs, load balancers, and general configurations that can be given to a Kubernetes cluster that, at a certain point, we can only provide so much troubleshooting (and a local Kube sysadmin has to take a look). Bringing the cluster online is one thing, but being able to troubleshoot it once it is online is another question entirely.