Troubleshoot Installation

Check Deployment Logs

If you run into an error during deployment (i.e., when running installer.sh ... -c install), you can safely retry the command (installer.sh ... -c install), as the operation is idempotent.

installer.sh also prints a message similar to the following, with the specific deployment that errored:

You can probably find out more by running: kubectl -n <namespace> logs <deployment>
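
For example, to inspect the logs of a specific deployment in the ess namespace (a hypothetical check; ess-ldp is one of the deployments listed in the status output below), you could run:

kubectl -n ess logs deployment/ess-ldp --tail=100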

Check Status of Your ESS Services

You can check the status of the various ESS services:

kubectl get all -n ess

The command returns the various ESS services and their status (the output has been abbreviated):

NAME                                               READY   STATUS             RESTARTS   AGE
pod/audit-elasticsearch-554f8584f8-8pcbb           1/1     Running            0          15m
pod/audit-elasticsearch-metrics-678bcd857c-nxj65   1/1     Running            0          15m
pod/ess-access-7d8977db8b-n7w59                    1/1     Running            0          15m
pod/ess-authz-59d7549fb4-t5ssx                     1/1     Running            0          15m
pod/ess-event-entity-operator-86b4f4b6fb-nwfqx     3/3     Running            0          5m40s
pod/ess-event-kafka-0                              1/1     Running            1          6m5s
pod/ess-event-zookeeper-0                          1/1     Running            1          13m
pod/ess-identity-static-7c7d468cdb-f9smb           1/1     Running            0          15m
pod/ess-index-6ddf787ff5-46pqn                     1/1     Running            0          15m
pod/ess-index-builder-6f795f8cff-b9s7l             1/1     Running            0          15m
pod/ess-ldp-f9bcf9b5-xcvws                         1/1     Running            0          15m
pod/ess-notification-cd8c49d88-qr798               1/1     Running            0          15m
pod/ess-signup-static-86bd6d687c-jtjth             1/1     Running            0          15m
pod/ess-websocket-6fd9d4897-2sclp                  1/1     Running            0          15m
pod/fluentd-auditor-79cf74df8b-rrwkh               1/1     Running            0          15m
pod/grafana-dc484589f-nh6tb                        1/1     Running            0          15m
pod/index-elasticsearch-85b859f7c4-xjd22           1/1     Running            0          15m
pod/index-elasticsearch-metrics-f8ff5ddb6-hvvfc    1/1     Running            0          15m
pod/mongo-5964cd47bc-t8r6b                         1/1     Running            0          15m
pod/openid-mongodb-7df5785b8-nqq5h                 1/1     Running            0          15m
pod/postgres-77cb8cb455-c4mmr                      1/1     Running            0          15m
pod/postgres-metrics-6f68c4b98d-mf5xq              1/1     Running            0          15m
pod/prometheus-868989df99-hc2vf                    1/1     Running            0          15m
pod/strimzi-cluster-operator-d4b769796-kzzww       1/1     Running            0          15m

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/audit-elasticsearch           ClusterIP   10.106.210.10    <none>        9200/TCP                     15m
service/audit-elasticsearch-metrics   ClusterIP   10.99.137.97     <none>        9114/TCP                     15m
service/ess-access                    ClusterIP   10.96.54.107     <none>        443/TCP                      15m
service/ess-authz                     ClusterIP   10.105.64.124    <none>        443/TCP,9000/TCP             15m
service/ess-event-kafka-bootstrap     ClusterIP   10.110.122.64    <none>        9091/TCP,9092/TCP,9093/TCP   6m6s

...
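
If you only want to see pods that are not yet running, you can narrow the listing with a field selector (a quick filter; note that it matches on the pod phase, so it may not surface every unhealthy pod):

kubectl -n ess get pods --field-selector=status.phase!=Running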

Debug a Service

When a service is not in Running status, you can investigate by issuing the kubectl describe command:

kubectl describe -n ess <resource>/<resource name>

For example, consider the following pod statuses (the status output has been abbreviated):

NAME                                               READY   STATUS       RESTARTS   AGE
...

pod/mongo-5964cd47bc-t8r6b                         1/1     Running      0          8m6s
pod/openid-mongodb-7df74d79b8-7nljx                0/1     Running      0          8m6s
pod/postgres-77cb8cb455-c4mmr                      0/1     Pending      0          8m7s
pod/postgres-metrics-6f68c4b98d-mf5xq              1/1     Running      0          8m6s
pod/prometheus-868989df99-hc2vf                    1/1     Running      0          8m6s
pod/strimzi-cluster-operator-d4b769796-kzzww       1/1     Running      0          8m6s

...

The pod/postgres-77cb8cb455-c4mmr is in Pending status. To investigate, issue the kubectl describe command, where <resource> is pod and <resource name> is postgres-77cb8cb455-c4mmr:

kubectl describe -n ess pod/postgres-77cb8cb455-c4mmr

In the output, go to the Events section (the output has been abbreviated):

Name:         postgres-77cb8cb455-c4mmr
Namespace:    ess
Priority:     0

...

QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  8m20s (x2 over 8m20s)  default-scheduler  persistentvolumeclaim "postgres-persistent-storage" not found
  ...

The Events section lists the reason why the service did not start; namely, persistentvolumeclaim "postgres-persistent-storage" not found.

Alternatively, you can access the events through the kubectl get events command:

kubectl -n ess get events --sort-by=.metadata.creationTimestamp --field-selector involvedObject.name=postgres-77cb8cb455-c4mmr

The output also includes the persistentvolumeclaim "postgres-persistent-storage" not found message:

LAST SEEN   TYPE      REASON              OBJECT                           MESSAGE
2m8s        Warning   FailedScheduling    pod/postgres-77cb8cb455-c4mmr    persistentvolumeclaim "postgres-persistent-storage" not found
60s         Warning   FailedScheduling    pod/postgres-77cb8cb455-c4mmr    running "VolumeBinding" filter plugin for pod "postgres-77cb8cb455-c4mmr": pod has unbound immediate PersistentVolumeClaims
...
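
If you do not yet know which object is failing, you can also list only the Warning events across the namespace (a variant of the same command, using a standard field selector):

kubectl -n ess get events --field-selector type=Warning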

To investigate:

  1. Check the current Persistent Volume Claims:

    kubectl -n ess get pvc
    

    The output shows that the Persistent Volume Claim for postgres-persistent-storage is Pending.

    NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    mongo-persistent-storage        Bound    pvc-1982526a-0ecd-4dd9-a214-326a3eb01578   10Gi       RWO            standard       41m
    postgres-persistent-storage     Pending                                                                                      41m
    prometheus-persistent-storage   Bound    pvc-bf6213b1-eab0-466f-a1fb-0eac97fac403   10Gi       RWO            standard       41m
    
  2. Check the current Persistent Volumes:

    kubectl -n ess get pv
    

    No Persistent Volume bound to the postgres-persistent-storage claim appears in the output:

    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS   REASON   AGE
    pvc-1982526a-0ecd-4dd9-a214-326a3eb01578   10Gi       RWO            Delete           Bound    ess/mongo-persistent-storage        standard                43m
    pvc-bf6213b1-eab0-466f-a1fb-0eac97fac403   10Gi       RWO            Delete           Bound    ess/prometheus-persistent-storage   standard                43m
    
  3. To fix the issue, ensure that sufficient disk space is available so that the Persistent Volume can be provisioned.
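
If the claim remains Pending, you can also describe the claim itself; its Events section typically shows why no volume was bound (a minimal example, assuming the ess namespace and the claim name shown above):

kubectl -n ess describe pvc postgres-persistent-storage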

For more information on Persistent Volume (PV) issues, see the Kubernetes Persistent Volumes documentation.