Citadel Health Checking
You can enable Citadel's health checking feature to detect failures of the Citadel CSR (Certificate Signing Request) service. When a failure is detected, Kubelet automatically restarts the Citadel container.
When the health checking feature is enabled, the prober client module in Citadel periodically checks the health status of Citadel's CSR gRPC server. It does this by sending CSRs to the gRPC server and verifying the responses. If Citadel is healthy, the prober client updates the modification time of the health status file; otherwise, it does nothing. Citadel relies on a Kubernetes liveness and readiness probe with a command line to check the modification time of the health status file on the pod. If the file has not been updated within the configured period, Kubelet restarts the Citadel container.
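The mtime-based liveness check described above can be sketched as follows. This is an illustrative stand-alone script, not Citadel's actual probe code; the file path and the 125-second threshold mirror the defaults shown later in this page.

```shell
#!/bin/sh
# Sketch of an mtime-based liveness check: healthy only if the status
# file was updated within the last MAX_GAP seconds.
STATUS_FILE=/tmp/ca.liveness
MAX_GAP=125

touch "$STATUS_FILE"   # the prober client updates the mtime when healthy

now=$(date +%s)
# GNU stat (-c %Y) with a BSD/macOS fallback (-f %m)
mtime=$(stat -c %Y "$STATUS_FILE" 2>/dev/null || stat -f %m "$STATUS_FILE")
gap=$(( now - mtime ))

if [ "$gap" -le "$MAX_GAP" ]; then
  echo "healthy (gap=${gap}s)"
else
  echo "unhealthy (gap=${gap}s)"
  exit 1
fi
```

When this exits non-zero, Kubelet treats the container as failed and restarts it, which is exactly the behavior the real probe relies on.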
Before you begin
Follow the Istio installation guide to install Istio with mutual TLS enabled.
Deploying Citadel with health checking
To enable health checking, redeploy Citadel:
$ istioctl manifest generate --set values.global.mtls.enabled=true,values.security.citadelHealthCheck=true > citadel-health-check.yaml
$ kubectl apply -f citadel-health-check.yaml
Verify that health checking works
Citadel logs the health checking results. Run the following command:
$ kubectl logs `kubectl get po -n istio-system | grep istio-citadel | awk '{print $1}'` -n istio-system | grep "CSR signing service"
You should see output similar to:
... CSR signing service is healthy (logged every 100 times).
The log above indicates that periodic health checking is working. The default health checking interval is 15 seconds, and the result is logged once every 100 checks.
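The defaults above imply the healthy log line appears only about every 25 minutes. A quick back-of-the-envelope check (values assumed from the defaults stated in this page, not read from a cluster):

```shell
# One health check every 15 seconds, one log line per 100 checks:
CHECK_INTERVAL=15   # seconds between health checks
LOG_EVERY=100       # checks per log line
echo "$(( CHECK_INTERVAL * LOG_EVERY / 60 )) minutes between healthy log lines"
# prints: 25 minutes between healthy log lines
```

So an empty grep result shortly after deployment does not necessarily indicate a problem; wait for at least one logging interval.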
(Optional) Configuring the health checking
This section describes how to modify the health checking configuration. Open the file citadel-health-check.yaml and locate the following lines.
...
  - --liveness-probe-path=/tmp/ca.liveness # path to the liveness health checking status file
  - --liveness-probe-interval=60s # interval for health checking file update
  - --probe-check-interval=15s # interval for health status check
livenessProbe:
  exec:
    command:
    - /usr/local/bin/istio_ca
    - probe
    - --probe-path=/tmp/ca.liveness # path to the liveness health checking status file
    - --interval=125s # the maximum time gap allowed between the file mtime and the current system clock
  initialDelaySeconds: 60
  periodSeconds: 60
...
The path to the health status file is set by liveness-probe-path and probe-path. You should update the path in Citadel and in the livenessProbe at the same time.
If Citadel is healthy, the value of the liveness-probe-interval entry determines the interval at which the health status file is updated.
The Citadel health checking controller uses the value of the probe-check-interval entry to determine the interval between calls to the Citadel CSR service.
The interval on the prober is the maximum time allowed to elapse since the last update of the health status file for the prober to still consider Citadel healthy.
The values of the initialDelaySeconds and periodSeconds entries determine the initial delay and the interval between each activation of the livenessProbe.
Increasing probe-check-interval reduces the health checking overhead, but increases the lag before the prober notices an unhealthy status.
To avoid the prober restarting Citadel due to temporary unavailability, configure the interval on the prober to be more than N times the liveness-probe-interval. This allows the prober to tolerate N-1 consecutive failed health checks.
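With the values from the configuration snippet above (--interval=125s on the prober and --liveness-probe-interval=60s on Citadel), the tolerance works out as follows:

```shell
# Tolerance arithmetic for the default configuration shown above:
INTERVAL=125                 # prober's --interval
LIVENESS_PROBE_INTERVAL=60   # Citadel's --liveness-probe-interval

N=$(( INTERVAL / LIVENESS_PROBE_INTERVAL ))   # 125 / 60 = 2 (integer division)
echo "N=$N, tolerates $(( N - 1 )) consecutive failed health check(s)"
# prints: N=2, tolerates 1 consecutive failed health check(s)
```

In other words, the defaults let Citadel miss one file update before the liveness probe fails and Kubelet restarts the container.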
Cleanup
To disable health checking on Citadel:
$ istioctl manifest apply --set values.global.mtls.enabled=true