Troubleshoot Installation
Check Deployment Logs
If you run into an error during deployment (i.e., when running installer.sh ... -c install), you can safely retry the install (i.e., rerun installer.sh ... -c install), as the operation is idempotent.
installer.sh also prints out a message similar to the following, but with the specific deployment that errored:
You can probably find out more by running: kubectl -n <namespace> logs <deployment>
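For example, if the deployment that errored were ess-ldp (one of the ESS deployments shown in the listing below) and the ESS namespace is ess, you could fetch its logs with something like:
kubectl -n ess logs deployment/ess-ldp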
Check Status of Your ESS Services
You can check the status of the various ESS services:
kubectl get all -n ess
The command returns the various ESS services and their status (the output has been abbreviated):
NAME READY STATUS RESTARTS AGE
pod/audit-elasticsearch-554f8584f8-8pcbb 1/1 Running 0 15m
pod/audit-elasticsearch-metrics-678bcd857c-nxj65 1/1 Running 0 15m
pod/ess-access-7d8977db8b-n7w59 1/1 Running 0 15m
pod/ess-authz-59d7549fb4-t5ssx 1/1 Running 0 15m
pod/ess-event-entity-operator-86b4f4b6fb-nwfqx 3/3 Running 0 5m40s
pod/ess-event-kafka-0 1/1 Running 1 6m5s
pod/ess-event-zookeeper-0 1/1 Running 1 13m
pod/ess-identity-static-7c7d468cdb-f9smb 1/1 Running 0 15m
pod/ess-index-6ddf787ff5-46pqn 1/1 Running 0 15m
pod/ess-index-builder-6f795f8cff-b9s7l 1/1 Running 0 15m
pod/ess-ldp-f9bcf9b5-xcvws 1/1 Running 0 15m
pod/ess-notification-cd8c49d88-qr798 1/1 Running 0 15m
pod/ess-signup-static-86bd6d687c-jtjth 1/1 Running 0 15m
pod/ess-websocket-6fd9d4897-2sclp 1/1 Running 0 15m
pod/fluentd-auditor-79cf74df8b-rrwkh 1/1 Running 0 15m
pod/grafana-dc484589f-nh6tb 1/1 Running 0 15m
pod/index-elasticsearch-85b859f7c4-xjd22 1/1 Running 0 15m
pod/index-elasticsearch-metrics-f8ff5ddb6-hvvfc 1/1 Running 0 15m
pod/mongo-5964cd47bc-t8r6b 1/1 Running 0 15m
pod/openid-mongodb-7df5785b8-nqq5h 1/1 Running 0 15m
pod/postgres-77cb8cb455-c4mmr 1/1 Running 0 15m
pod/postgres-metrics-6f68c4b98d-mf5xq 1/1 Running 0 15m
pod/prometheus-868989df99-hc2vf 1/1 Running 0 15m
pod/strimzi-cluster-operator-d4b769796-kzzww 1/1 Running 0 15m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/audit-elasticsearch ClusterIP 10.106.210.10 <none> 9200/TCP 15m
service/audit-elasticsearch-metrics ClusterIP 10.99.137.97 <none> 9114/TCP 15m
service/ess-access ClusterIP 10.96.54.107 <none> 443/TCP 15m
service/ess-authz ClusterIP 10.105.64.124 <none> 443/TCP,9000/TCP 15m
service/ess-event-kafka-bootstrap ClusterIP 10.110.122.64 <none> 9091/TCP,9092/TCP,9093/TCP 6m6s
...
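To narrow the listing to pods whose phase is not Running, one option is to filter on the pod phase, for example:
kubectl -n ess get pods --field-selector=status.phase!=Running
Note that this filters on the pod phase only; a pod can be in the Running phase and still not be ready (for example, 0/1 in the READY column).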
Debug a Service
When a service is not in Running status, you can investigate by issuing the kubectl describe command:
kubectl describe -n ess <resource>/<resource name>
For example, consider the following pod statuses (the status output has been abbreviated):
NAME READY STATUS RESTARTS AGE
...
pod/mongo-5964cd47bc-t8r6b 1/1 Running 0 8m6s
pod/openid-mongodb-7df74d79b8-7nljx 0/1 Running 0 8m6s
pod/postgres-77cb8cb455-c4mmr 0/1 Pending 0 8m7s
pod/postgres-metrics-6f68c4b98d-mf5xq 1/1 Running 0 8m6s
pod/prometheus-868989df99-hc2vf 1/1 Running 0 8m6s
pod/strimzi-cluster-operator-d4b769796-kzzww 1/1 Running 0 8m6s
...
The pod/postgres-77cb8cb455-c4mmr is in Pending status. To investigate, issue the kubectl describe command, where <resource> is pod and <resource name> is postgres-77cb8cb455-c4mmr:
kubectl describe -n ess pod/postgres-77cb8cb455-c4mmr
In the output, go to the Events section (the output has been abbreviated):
Name: postgres-77cb8cb455-c4mmr
Namespace: ess
Priority: 0
...
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 8m20s (x2 over 8m20s) default-scheduler persistentvolumeclaim "postgres-persistent-storage" not found
The Events section lists the reason why the service did not start; namely, persistentvolumeclaim "postgres-persistent-storage" not found.
Alternatively, you can access the Events through the kubectl get events command:
kubectl -n ess get events --sort-by=.metadata.creationTimestamp --field-selector involvedObject.name=postgres-77cb8cb455-c4mmr
The output also includes the persistentvolumeclaim "postgres-persistent-storage" not found message:
LAST SEEN TYPE REASON OBJECT MESSAGE
2m8s Warning FailedScheduling pod/postgres-77cb8cb455-c4mmr persistentvolumeclaim "postgres-persistent-storage" not found
60s Warning FailedScheduling pod/postgres-77cb8cb455-c4mmr running "VolumeBinding" filter plugin for pod "postgres-77cb8cb455-c4mmr": pod has unbound immediate PersistentVolumeClaims
...
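If you are not sure which object to filter on, you can also list all Warning events in the namespace, for example:
kubectl -n ess get events --field-selector type=Warning --sort-by=.metadata.creationTimestamp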
To investigate:
1. Check the current Persistent Volume Claims:
kubectl -n ess get pvc
The output shows the PV Claim for postgres-persistent-storage is Pending (see the example after these steps for a way to inspect the claim directly):
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mongo-persistent-storage Bound pvc-1982526a-0ecd-4dd9-a214-326a3eb01578 10Gi RWO standard 41m
postgres-persistent-storage Pending 41m
prometheus-persistent-storage Bound pvc-bf6213b1-eab0-466f-a1fb-0eac97fac403 10Gi RWO standard 41m
2. Check the current Persistent Volumes:
kubectl -n ess get pv
postgres-persistent-storage does not appear in the output:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-1982526a-0ecd-4dd9-a214-326a3eb01578 10Gi RWO Delete Bound ess/mongo-persistent-storage standard 43m
pvc-bf6213b1-eab0-466f-a1fb-0eac97fac403 10Gi RWO Delete Bound ess/prometheus-persistent-storage standard 43m
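To dig further into why the claim remains unbound, one option is to describe the claim itself; its Events section typically shows the provisioning or binding error:
kubectl -n ess describe pvc postgres-persistent-storage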
To fix the issue, ensure that you have sufficient disk space available.
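For example, one quick way to check whether any cluster node is reporting disk pressure (just one possible check; your storage may be provisioned outside the cluster) is:
kubectl describe nodes | grep -i diskpressure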
For more information on Persistent Volume (PV) issues, see the Kubernetes Persistent Volumes documentation.