Troubleshoot Installation

Check Deployment Logs
If you run into an error during deployment (i.e., when running installer.sh ... -c install), you can safely retry the command (i.e., rerun installer.sh ... -c install), as the operation is idempotent.

installer.sh also prints out a message similar to the following, but with the specific deployment that errored:

You can probably find out more by running: kubectl -n <namespace> logs <deployment>
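For example, if the message points at the ess-ldp deployment in the ess namespace (both names are illustrative here; substitute the values from your own message):

# Namespace and deployment name are examples; use the ones printed by installer.sh.
kubectl -n ess logs deployment/ess-ldp

If the container has restarted, adding --previous shows the logs from its prior run.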
Check Status of your ESS Services
After you have deployed ESS into your Kubernetes cluster, you can check the status of the various ESS services:
kubectl get all -n ess
The command returns the ESS services and their status:
NAME READY STATUS RESTARTS AGE
pod/ess-event-entity-operator-777894bd8b-8trdm 3/3 Running 0 6m41s
pod/ess-event-kafka-0 2/2 Running 0 7m26s
pod/ess-event-zookeeper-0 1/1 Running 0 8m16s
pod/ess-identity-5bccbb4dc5-trtx2 0/1 PodInitializing 0 4m33s
pod/ess-jwks-865bbf9d-pzk8k 0/1 PodInitializing 0 4m33s
pod/ess-ldp-8d86f99c7-7lbwv 0/1 PodInitializing 0 4m33s
pod/grafana-86fd4d596-s4bd6 0/1 PodInitializing 0 4m33s
pod/postgres-6698d9f79b-zdzmk 1/1 Running 0 4m34s
pod/postgres-metrics-6b6858f64c-tc9ll 1/1 Running 0 4m33s
pod/prometheus-7c99bb7467-fz2vl 0/1 PodInitializing 0 4m31s
pod/proxy-deployment-65cdd8df8d-g66h8 0/1 ContainerCreating 0 4m33s
pod/strimzi-cluster-operator-7d6cd6bdf7-f5gxg 1/1 Running 0 9m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/ess-event-kafka-bootstrap ClusterIP 10.104.219.3 <none> 9091/TCP,9092/TCP,9093/TCP 7m27s
service/ess-event-kafka-brokers ClusterIP None <none> 9091/TCP,9092/TCP,9093/TCP 7m27s
service/ess-event-zookeeper-client ClusterIP 10.102.144.249 <none> 2181/TCP 8m17s
service/ess-event-zookeeper-nodes ClusterIP None <none> 2181/TCP,2888/TCP,3888/TCP 8m17s
service/ess-identity ClusterIP 10.103.178.23 <none> 10000/TCP 4m34s
service/ess-jwks ClusterIP 10.109.171.117 <none> 10200/TCP 4m34s
service/ess-ldp ClusterIP 10.106.68.230 <none> 10100/TCP 4m34s
service/grafana NodePort 10.107.50.19 <none> 3000:31197/TCP 4m33s
service/postgres ClusterIP None <none> 5432/TCP 4m34s
service/postgres-metrics ClusterIP 10.96.194.157 <none> 9187/TCP 4m33s
service/prometheus ClusterIP 10.98.200.169 <none> 9090/TCP 4m32s
service/proxy NodePort 10.97.196.97 <none> 443:31885/TCP 4m34s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/ess-event-entity-operator 1/1 1 1 6m41s
deployment.apps/ess-identity 0/1 1 0 4m33s
deployment.apps/ess-jwks 0/1 1 0 4m33s
deployment.apps/ess-ldp 0/1 1 0 4m33s
deployment.apps/grafana 0/1 1 0 4m33s
deployment.apps/postgres 1/1 1 1 4m34s
deployment.apps/postgres-metrics 1/1 1 1 4m33s
deployment.apps/prometheus 0/1 1 0 4m32s
deployment.apps/proxy-deployment 0/1 1 0 4m33s
deployment.apps/strimzi-cluster-operator 1/1 1 1 9m
NAME DESIRED CURRENT READY AGE
replicaset.apps/ess-event-entity-operator-777894bd8b 1 1 1 6m41s
replicaset.apps/ess-identity-5bccbb4dc5 1 1 0 4m33s
replicaset.apps/ess-jwks-865bbf9d 1 1 0 4m33s
replicaset.apps/ess-ldp-8d86f99c7 1 1 0 4m33s
replicaset.apps/grafana-86fd4d596 1 1 0 4m33s
replicaset.apps/postgres-6698d9f79b 1 1 1 4m34s
replicaset.apps/postgres-metrics-6b6858f64c 1 1 1 4m33s
replicaset.apps/prometheus-7c99bb7467 1 1 0 4m32s
replicaset.apps/proxy-deployment-65cdd8df8d 1 1 0 4m33s
replicaset.apps/strimzi-cluster-operator-7d6cd6bdf7 1 1 1 9m
NAME READY AGE
statefulset.apps/ess-event-kafka 1/1 7m26s
statefulset.apps/ess-event-zookeeper 1/1 8m16s
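Some services can take a few minutes to move out of PodInitializing or ContainerCreating. To monitor the pods until they settle, you can watch the list (a convenience sketch, assuming the ess namespace used above):

kubectl get pods -n ess --watch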
Debug a Service in Pending State
When a service is in Pending instead of Running status, you can investigate by issuing the kubectl describe command:
kubectl describe <resource>/<resource name>
For example, consider the following pod statuses:
NAME READY STATUS RESTARTS AGE
pod/ess-event-entity-operator-777894bd8b-8trdm 3/3 Running 0 10m
pod/ess-event-kafka-0 2/2 Running 0 10m
pod/ess-event-zookeeper-0 1/1 Running 0 11m
pod/ess-identity-5bccbb4dc5-trtx2 1/1 Running 0 8m6s
pod/ess-jwks-865bbf9d-pzk8k 1/1 Running 0 8m6s
pod/ess-ldp-8d86f99c7-7lbwv 1/1 Running 0 8m6s
pod/grafana-86fd4d596-s4bd6 1/1 Running 0 8m6s
pod/postgres-6698d9f79b-zdzmk 0/1 Pending 0 8m7s
pod/postgres-metrics-6b6858f64c-tc9ll 1/1 Running 0 8m6s
pod/prometheus-7c99bb7467-fz2vl 1/1 Running 0 8m4s
pod/proxy-deployment-65cdd8df8d-g66h8 1/1 Running 0 8m6s
pod/strimzi-cluster-operator-7d6cd6bdf7-f5gxg 1/1 Running 0 12m
The pod/postgres-6698d9f79b-zdzmk is in Pending status. To investigate, issue the kubectl describe command, where <resource> is pod and <resource name> is postgres-6698d9f79b-zdzmk:
kubectl describe pod/postgres-6698d9f79b-zdzmk
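If the pod does not live in your current context's default namespace, add the namespace flag; for example (the ess namespace here is an assumption carried over from the earlier commands):

# -n ess is an assumption; match the namespace from your kubectl get all output.
kubectl describe pod/postgres-6698d9f79b-zdzmk -n ess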
In the output, go to the Events section:
Name: postgres-6698d9f79b-zdzmk
Namespace: default
Priority: 0
Node: <none>
Labels: app=postgres
pod-template-hash=6698d9f79b
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/postgres-6698d9f79b
Containers:
postgres:
Image: postgres:12
Port: 5432/TCP
Host Port: 0/TCP
Environment:
POSTGRES_DB: <set to the key 'ess.jdbc.db' in secret 'inrupt-ess-secret'> Optional: false
POSTGRES_USER: <set to the key 'ess.jdbc.username' in secret 'inrupt-ess-secret'> Optional: false
POSTGRES_PASSWORD: <set to the key 'ess.jdbc.password' in secret 'inrupt-ess-secret'> Optional: false
Mounts:
/var/lib/postgresql/data from postgresdb-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-g4h6k (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
postgresdb-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: postgres-persistent-storage
ReadOnly: false
default-token-g4h6k:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-g4h6k
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 8m20s (x2 over 8m20s) default-scheduler persistentvolumeclaim "postgres-persistent-storage" not found
Warning FailedScheduling ...
The Events section lists the reason why the service did not start; namely, persistentvolumeclaim "postgres-persistent-storage" not found.
Alternatively, you can access the Events through the kubectl get events command:
kubectl get events --sort-by=.metadata.creationTimestamp --field-selector involvedObject.name=postgres-6698d9f79b-zdzmk
The output also includes the persistentvolumeclaim "postgres-persistent-storage" not found message:
LAST SEEN TYPE REASON OBJECT MESSAGE
2m8s Warning FailedScheduling pod/postgres-6698d9f79b-zdzmk persistentvolumeclaim "postgres-persistent-storage" not found
60s Warning FailedScheduling pod/postgres-6698d9f79b-zdzmk running "VolumeBinding" filter plugin for pod "postgres-6698d9f79b-l877j": pod has unbound immediate PersistentVolumeClaims
...
To investigate:

1. Check the current Persistent Volume Claims:

kubectl get pvc

The output shows the PV Claim for postgres-persistent-storage is Pending:

NAME                            STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
identity-persistent-storage     Bound     pvc-97927457-91c1-4f0e-93db-0cab3fd5eb72   20Gi       RWX            standard       22m
postgres-persistent-storage     Pending                                                                                       11m
prometheus-persistent-storage   Bound     pvc-3b19949c-fcf6-4bd9-a38a-0b76d2b7d990   10Gi       RWX            standard       22m
2. Check the current Persistent Volumes:

kubectl get pv

postgres-persistent-storage does not appear in the output:

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS   REASON   AGE
pvc-3b19949c-fcf6-4bd9-a38a-0b76d2b7d990   10Gi       RWX            Delete           Bound    ess/prometheus-persistent-storage   standard                24m
pvc-97927457-91c1-4f0e-93db-0cab3fd5eb72   20Gi       RWX            Delete           Bound    ess/identity-persistent-storage     standard                24m
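You can also describe the pending claim itself, which surfaces its binding events without going through the pod (a sketch; add a namespace flag such as -n ess if the claim is not in your current namespace):

kubectl describe pvc postgres-persistent-storage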
To fix, try to recreate the volume:
kubectl apply -f 04_storage/
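Once the volume objects are recreated, the claim should move from Pending to Bound and the stuck pod should be scheduled. You can watch for this (assuming the ess namespace):

# --watch keeps printing updates as the claim's status changes.
kubectl get pvc -n ess --watch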
Also check that you have enough disk space available to allocate to the volume.
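If your cluster provisions volumes dynamically rather than from pre-created manifests, you can also confirm that a suitable StorageClass exists; this is a general Kubernetes check, not an ESS installer step:

kubectl get storageclass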
For more information on Persistent Volume (PV) issues, see the Kubernetes Persistent Volumes documentation.