Troubleshoot Installation#

Check Deployment Logs#

If you run into an error during deployment (i.e., when running installer.sh ... -c install), you can safely retry the command, as the operation is idempotent.

installer.sh also prints a message similar to the following, with the specific deployment that errored:

You can probably find out more by running: kubectl -n <namespace> logs <deployment>
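To find which deployments are worth inspecting in the first place, you can filter the deployment listing on its AVAILABLE column. The following is a minimal sketch that runs against a saved sample of `kubectl get deployments` output (the sample data and deployment names are illustrative):

```shell
# Sample output from: kubectl -n ess get deployments
# (saved as a heredoc so the filter can be shown standalone)
deployments=$(cat <<'EOF'
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
ess-identity                0/1     1            0           4m33s
ess-ldp                     1/1     1            1           4m33s
EOF
)

# Print deployments whose AVAILABLE column is 0 -- these are the ones
# whose logs are worth checking with:
#   kubectl -n <namespace> logs deployment/<name>
echo "$deployments" | awk 'NR > 1 && $4 == 0 { print $1 }'
```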

Check Status of your ESS Services#

After you have deployed ESS into your Kubernetes cluster, you can check the status of the various ESS services:

kubectl get all -n ess

The command lists the various ESS services and their status:

NAME                                             READY   STATUS              RESTARTS   AGE
pod/ess-event-entity-operator-777894bd8b-8trdm   3/3     Running             0          6m41s
pod/ess-event-kafka-0                            2/2     Running             0          7m26s
pod/ess-event-zookeeper-0                        1/1     Running             0          8m16s
pod/ess-identity-5bccbb4dc5-trtx2                0/1     PodInitializing     0          4m33s
pod/ess-jwks-865bbf9d-pzk8k                      0/1     PodInitializing     0          4m33s
pod/ess-ldp-8d86f99c7-7lbwv                      0/1     PodInitializing     0          4m33s
pod/grafana-86fd4d596-s4bd6                      0/1     PodInitializing     0          4m33s
pod/postgres-6698d9f79b-zdzmk                    1/1     Running             0          4m34s
pod/postgres-metrics-6b6858f64c-tc9ll            1/1     Running             0          4m33s
pod/prometheus-7c99bb7467-fz2vl                  0/1     PodInitializing     0          4m31s
pod/proxy-deployment-65cdd8df8d-g66h8            0/1     ContainerCreating   0          4m33s
pod/strimzi-cluster-operator-7d6cd6bdf7-f5gxg    1/1     Running             0          9m

NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/ess-event-kafka-bootstrap    ClusterIP   10.104.219.3     <none>        9091/TCP,9092/TCP,9093/TCP   7m27s
service/ess-event-kafka-brokers      ClusterIP   None             <none>        9091/TCP,9092/TCP,9093/TCP   7m27s
service/ess-event-zookeeper-client   ClusterIP   10.102.144.249   <none>        2181/TCP                     8m17s
service/ess-event-zookeeper-nodes    ClusterIP   None             <none>        2181/TCP,2888/TCP,3888/TCP   8m17s
service/ess-identity                 ClusterIP   10.103.178.23    <none>        10000/TCP                    4m34s
service/ess-jwks                     ClusterIP   10.109.171.117   <none>        10200/TCP                    4m34s
service/ess-ldp                      ClusterIP   10.106.68.230    <none>        10100/TCP                    4m34s
service/grafana                      NodePort    10.107.50.19     <none>        3000:31197/TCP               4m33s
service/postgres                     ClusterIP   None             <none>        5432/TCP                     4m34s
service/postgres-metrics             ClusterIP   10.96.194.157    <none>        9187/TCP                     4m33s
service/prometheus                   ClusterIP   10.98.200.169    <none>        9090/TCP                     4m32s
service/proxy                        NodePort    10.97.196.97     <none>        443:31885/TCP                4m34s

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ess-event-entity-operator   1/1     1            1           6m41s
deployment.apps/ess-identity                0/1     1            0           4m33s
deployment.apps/ess-jwks                    0/1     1            0           4m33s
deployment.apps/ess-ldp                     0/1     1            0           4m33s
deployment.apps/grafana                     0/1     1            0           4m33s
deployment.apps/postgres                    1/1     1            1           4m34s
deployment.apps/postgres-metrics            1/1     1            1           4m33s
deployment.apps/prometheus                  0/1     1            0           4m32s
deployment.apps/proxy-deployment            0/1     1            0           4m33s
deployment.apps/strimzi-cluster-operator    1/1     1            1           9m

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/ess-event-entity-operator-777894bd8b   1         1         1       6m41s
replicaset.apps/ess-identity-5bccbb4dc5                1         1         0       4m33s
replicaset.apps/ess-jwks-865bbf9d                      1         1         0       4m33s
replicaset.apps/ess-ldp-8d86f99c7                      1         1         0       4m33s
replicaset.apps/grafana-86fd4d596                      1         1         0       4m33s
replicaset.apps/postgres-6698d9f79b                    1         1         1       4m34s
replicaset.apps/postgres-metrics-6b6858f64c            1         1         1       4m33s
replicaset.apps/prometheus-7c99bb7467                  1         1         0       4m32s
replicaset.apps/proxy-deployment-65cdd8df8d            1         1         0       4m33s
replicaset.apps/strimzi-cluster-operator-7d6cd6bdf7    1         1         1       9m

NAME                                   READY   AGE
statefulset.apps/ess-event-kafka       1/1     7m26s
statefulset.apps/ess-event-zookeeper   1/1     8m16s
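In output like the above, problem pods can be picked out by filtering on the STATUS column. The following is a minimal sketch that runs against a saved fragment of the pod listing (the sample rows are taken from the output above):

```shell
# Fragment of `kubectl get all -n ess` pod output, saved as sample data
pods=$(cat <<'EOF'
NAME                                             READY   STATUS              RESTARTS   AGE
pod/ess-event-kafka-0                            2/2     Running             0          7m26s
pod/ess-identity-5bccbb4dc5-trtx2                0/1     PodInitializing     0          4m33s
pod/proxy-deployment-65cdd8df8d-g66h8            0/1     ContainerCreating   0          4m33s
EOF
)

# List every pod that is not yet Running; on a live cluster the
# equivalent is:
#   kubectl get pods -n ess --field-selector=status.phase!=Running
echo "$pods" | awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
```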

Debug a Service in Pending State#

When a service shows a Pending status instead of Running, you can investigate by issuing the kubectl describe command:

kubectl describe <resource>/<resource name>

For example, consider the following pod statuses:

NAME                                             READY   STATUS    RESTARTS   AGE
pod/ess-event-entity-operator-777894bd8b-8trdm   3/3     Running   0          10m
pod/ess-event-kafka-0                            2/2     Running   0          10m
pod/ess-event-zookeeper-0                        1/1     Running   0          11m
pod/ess-identity-5bccbb4dc5-trtx2                1/1     Running   0          8m6s
pod/ess-jwks-865bbf9d-pzk8k                      1/1     Running   0          8m6s
pod/ess-ldp-8d86f99c7-7lbwv                      1/1     Running   0          8m6s
pod/grafana-86fd4d596-s4bd6                      1/1     Running   0          8m6s
pod/postgres-6698d9f79b-zdzmk                    0/1     Pending   0          8m7s
pod/postgres-metrics-6b6858f64c-tc9ll            1/1     Running   0          8m6s
pod/prometheus-7c99bb7467-fz2vl                  1/1     Running   0          8m4s
pod/proxy-deployment-65cdd8df8d-g66h8            1/1     Running   0          8m6s
pod/strimzi-cluster-operator-7d6cd6bdf7-f5gxg    1/1     Running   0          12m

The pod/postgres-6698d9f79b-zdzmk is in Pending status. To investigate, issue the kubectl describe command, where <resource> is pod and <resource name> is postgres-6698d9f79b-zdzmk:

kubectl describe pod/postgres-6698d9f79b-zdzmk

In the output, go to the Events section:

Name:           postgres-6698d9f79b-zdzmk
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=postgres
                pod-template-hash=6698d9f79b
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/postgres-6698d9f79b
Containers:
  postgres:
    Image:      postgres:12
    Port:       5432/TCP
    Host Port:  0/TCP
    Environment:
      POSTGRES_DB:        <set to the key 'ess.jdbc.db' in secret 'inrupt-ess-secret'>        Optional: false
      POSTGRES_USER:      <set to the key 'ess.jdbc.username' in secret 'inrupt-ess-secret'>  Optional: false
      POSTGRES_PASSWORD:  <set to the key 'ess.jdbc.password' in secret 'inrupt-ess-secret'>  Optional: false
    Mounts:
      /var/lib/postgresql/data from postgresdb-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-g4h6k (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  postgresdb-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  postgres-persistent-storage
    ReadOnly:   false
  default-token-g4h6k:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-g4h6k
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  8m20s (x2 over 8m20s)  default-scheduler  persistentvolumeclaim "postgres-persistent-storage" not found
  ...

The Events section lists the reason why the service did not start; namely, persistentvolumeclaim "postgres-persistent-storage" not found.
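Because kubectl describe output is long, it can help to jump straight to the Events section. The following is a minimal sketch that prints everything from Events: onward, shown here against an abbreviated sample of the describe output above:

```shell
# Abbreviated sample of `kubectl describe pod/...` output
describe_out=$(cat <<'EOF'
Name:           postgres-6698d9f79b-zdzmk
Status:         Pending
Events:
  Type     Reason            Age    From               Message
  Warning  FailedScheduling  8m20s  default-scheduler  persistentvolumeclaim "postgres-persistent-storage" not found
EOF
)

# Print the Events section and everything after it; on a live cluster:
#   kubectl describe pod/postgres-6698d9f79b-zdzmk | sed -n '/^Events:/,$p'
echo "$describe_out" | sed -n '/^Events:/,$p'
```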

Alternatively, you can access the events through the kubectl get events command:

kubectl get events --sort-by=.metadata.creationTimestamp --field-selector involvedObject.name=postgres-6698d9f79b-zdzmk

The output also includes the persistentvolumeclaim "postgres-persistent-storage" not found message:

LAST SEEN   TYPE      REASON              OBJECT                                              MESSAGE
2m8s        Warning   FailedScheduling    pod/postgres-6698d9f79b-zdzmk                       persistentvolumeclaim "postgres-persistent-storage" not found
60s         Warning   FailedScheduling    pod/postgres-6698d9f79b-zdzmk                       running "VolumeBinding" filter plugin for pod "postgres-6698d9f79b-l877j": pod has unbound immediate PersistentVolumeClaims
...

To investigate:

  1. Check the current Persistent Volume Claims:

    kubectl get pvc
    

    The output shows that the PV Claim for postgres-persistent-storage is Pending:

    NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    identity-persistent-storage     Bound    pvc-97927457-91c1-4f0e-93db-0cab3fd5eb72   20Gi       RWX            standard       22m
    postgres-persistent-storage     Pending                                                                                      11m
    prometheus-persistent-storage   Bound    pvc-3b19949c-fcf6-4bd9-a38a-0b76d2b7d990   10Gi       RWX            standard       22m
    
  2. Check the current Persistent Volumes:

    kubectl get pv
    

    postgres-persistent-storage does not appear in the output:

    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS   REASON   AGE
    pvc-3b19949c-fcf6-4bd9-a38a-0b76d2b7d990   10Gi       RWX            Delete           Bound    ess/prometheus-persistent-storage   standard                24m
    pvc-97927457-91c1-4f0e-93db-0cab3fd5eb72   20Gi       RWX            Delete           Bound    ess/identity-persistent-storage     standard                24m
    
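The two checks above can be sketched as a single filter that lists claims which are not yet Bound (the sample data mirrors the PVC output shown above):

```shell
# Sample output from: kubectl get pvc
pvcs=$(cat <<'EOF'
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
identity-persistent-storage     Bound    pvc-97927457-91c1-4f0e-93db-0cab3fd5eb72   20Gi       RWX            standard       22m
postgres-persistent-storage     Pending                                                                                      11m
EOF
)

# Claims not in the Bound state; on a live cluster the equivalent is:
#   kubectl get pvc | awk 'NR > 1 && $2 != "Bound"'
echo "$pvcs" | awk 'NR > 1 && $2 != "Bound" { print $1, $2 }'
```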

To fix the issue, try recreating the volume:

kubectl apply -f 04_storage/
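If the manifests under 04_storage/ do not define the missing claim, a PersistentVolumeClaim for Postgres might look like the following sketch. The 20Gi size, ReadWriteMany access mode, and standard storage class are assumptions taken from the listings above; adapt them to your cluster's storage provisioner:

```shell
# Hypothetical manifest -- the claim name matches the one the pod
# references; size, access mode, and storage class must match your
# environment.
kubectl -n ess apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-persistent-storage
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  storageClassName: standard
EOF
```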

Also check that you have enough disk space available to allocate.

For more information on Persistent Volume (PV) issues, see the Kubernetes Persistent Volumes documentation.