Troubleshooting Multicluster

This page describes how to troubleshoot issues with Istio deployed to multiple clusters and/or networks. Before reading this, you should take the steps in Multicluster Installation and read the Deployment Models guide.

Cross-Cluster Load Balancing

The most common, and also the broadest, problem with multi-network installations is that cross-cluster load balancing doesn’t work. Usually this manifests as only seeing responses from the cluster-local instance of a Service:

$ for i in $(seq 10); do kubectl --context=$CTX_CLUSTER1 -n sample exec sleep-dd98b5f48-djwdw -c sleep -- curl -s helloworld:5000/hello; done
Hello version: v1, instance: helloworld-v1-578dd69f69-j69pf
Hello version: v1, instance: helloworld-v1-578dd69f69-j69pf
Hello version: v1, instance: helloworld-v1-578dd69f69-j69pf
...

When following the guide to verify multicluster installation, we would expect both v1 and v2 responses, indicating traffic is going to both clusters.

There are several possible causes of this problem:

Locality Load Balancing

Locality load balancing can be used to make clients prefer that traffic go to the nearest destination. If the clusters are in different localities (region/zone), locality load balancing will prefer the local cluster, and what you are seeing is working as intended. If locality load balancing is disabled, or the clusters are in the same locality, there may be another issue.
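As a quick check, you can look for an explicit localityLbSetting in the mesh configuration; if nothing is printed, locality load balancing has not been explicitly configured in that cluster. The ConfigMap is named istio by default, but may carry a revision suffix in revisioned installs:

$ kubectl --context="${CTX_CLUSTER1}" -n istio-system get configmap istio \
    -o jsonpath='{.data.mesh}' | grep -A 3 localityLbSetting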

Trust Configuration

Cross-cluster traffic, as with intra-cluster traffic, relies on a common root of trust between the proxies. By default, each Istio installation uses its own individually generated root certificate authority. For multicluster, we must manually configure a shared root of trust. Follow Plug-in Certs below or read Identity and Trust Models to learn more.

Plug-in Certs:

To verify certs are configured correctly, you can compare the root-cert in each cluster:

$ diff \
   <(kubectl --context="${CTX_CLUSTER1}" -n istio-system get secret cacerts -ojsonpath='{.data.root-cert\.pem}') \
   <(kubectl --context="${CTX_CLUSTER2}" -n istio-system get secret cacerts -ojsonpath='{.data.root-cert\.pem}')

You can follow the Plugin CA Certs guide, making sure to run the steps for every cluster.
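As an additional check, you can compare the root certificate that istiod has actually distributed to workload namespaces. Istio publishes it in an istio-ca-root-cert ConfigMap in each namespace; the command below assumes the sample namespace exists in both clusters. If the outputs differ, the proxies do not share a root of trust:

$ diff \
   <(kubectl --context="${CTX_CLUSTER1}" -n sample get configmap istio-ca-root-cert -ojsonpath='{.data.root-cert\.pem}') \
   <(kubectl --context="${CTX_CLUSTER2}" -n sample get configmap istio-ca-root-cert -ojsonpath='{.data.root-cert\.pem}')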

Step-by-step Diagnosis

If you’ve gone through the sections above and are still having issues, then it’s time to dig a little deeper.

The following steps assume you’re following the HelloWorld verification. Before continuing, make sure both helloworld and sleep are deployed in each cluster.
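For example, you can quickly confirm both deployments are present and ready in each cluster (the namespace here is sample, as used in the verification guide):

$ kubectl --context="${CTX_CLUSTER1}" -n sample get pods
$ kubectl --context="${CTX_CLUSTER2}" -n sample get pods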

From each cluster, find the endpoints the sleep service has for helloworld:

$ istioctl --context $CTX_CLUSTER1 proxy-config endpoint sleep-dd98b5f48-djwdw.sample | grep helloworld

Troubleshooting information differs based on the cluster that is the source of traffic:

$ istioctl --context $CTX_CLUSTER1 proxy-config endpoint sleep-dd98b5f48-djwdw.sample | grep helloworld
10.0.0.11:5000                   HEALTHY     OK                outbound|5000||helloworld.sample.svc.cluster.local

Only one endpoint is shown, indicating the control plane cannot read endpoints from the remote cluster.
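For comparison, in a working multi-network installation the same command would typically also list the remote cluster’s east-west gateway address on port 15443 (the addresses below are only illustrative):

$ istioctl --context $CTX_CLUSTER1 proxy-config endpoint sleep-dd98b5f48-djwdw.sample | grep helloworld
10.0.0.11:5000                   HEALTHY     OK                outbound|5000||helloworld.sample.svc.cluster.local
10.0.3.64:15443                  HEALTHY     OK                outbound|5000||helloworld.sample.svc.cluster.local

If the remote endpoint is missing, verify that remote secrets are configured properly: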

$ kubectl get secrets --context=$CTX_CLUSTER1 -n istio-system -l "istio/multiCluster=true"
  • If the secret is missing, create it (see the example after this list).
  • If the secret is present:
    • Look at the config in the secret. Make sure the cluster name is used as the data key for the remote kubeconfig.
    • If the secret looks correct, check the logs of istiod for connectivity or permission issues reaching the remote Kubernetes API server (see the example after this list). Log messages may include Failed to add remote cluster from secret along with an error reason.
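A missing remote secret can be created from the remote cluster’s credentials with istioctl and applied to the cluster running the control plane. This is a sketch: the cluster name (cluster2 here) must match the name used during installation, and on older Istio releases the command lives under istioctl x create-remote-secret:

$ istioctl create-remote-secret \
    --context="${CTX_CLUSTER2}" \
    --name=cluster2 | \
    kubectl apply -f - --context="${CTX_CLUSTER1}"

To look for connectivity or permission errors in the istiod logs (the deployment may be named istiod-<revision> in revisioned installs):

$ kubectl --context="${CTX_CLUSTER1}" -n istio-system logs deployment/istiod | grep -i "remote cluster"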