Configuring failover for external services

Learn how to configure locality load balancing and failover for endpoints that are outside of your mesh.

Jun 4, 2021 | By Ram Vennam - Solo.io

Istio’s powerful APIs can be used to solve a variety of service mesh use cases. Many users know about its strong ingress and east-west capabilities but it also offers many features for egress (outgoing) traffic. This is especially useful when your application needs to talk to an external service - such as a database endpoint provided by a cloud provider. There are often multiple endpoints to chose from depending on where your workload is running. For example, Amazon’s DynamoDB provides several endpoints across their regions. You typically want to choose the endpoint closest to your workload for latency reasons, but you may need to configure automatic failover to another endpoint in case things are not working as expected.

Similar to services running inside the service mesh, you can configure Istio to detect outliers and failover to a healthy endpoint, while still being completely transparent to your application. In this example, we’ll use Amazon DynamoDB endpoints and pick a primary region that is the same or close to workloads running in a Google Kubernetes Engine (GKE) cluster. We’ll also configure a failover region.

RoutingEndpoint
Primaryhttp://dynamodb.us-east-1.amazonaws.com
Failoverhttp://dynamodb.us-west-1.amazonaws.com

failover

Define external endpoints using a ServiceEntry

Locality load balancing works based on region or zone, which are usually inferred from labels set on the Kubernetes nodes. First, determine the location of your workloads:

$ kubectl describe node | grep failure-domain.beta.kubernetes.io/region
                    failure-domain.beta.kubernetes.io/region=us-east1
                    failure-domain.beta.kubernetes.io/region=us-east1

In this example, the GKE cluster nodes are running in us-east1.

Next, create a ServiceEntry which aggregates the endpoints you want to use. In this example, we have selected mydb.com as the host. This is the address your application should be configured to connect to. Set the locality of the primary endpoint to the same region as your workload:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
 name: external-svc-dns
spec:
 hosts:
 - mydb.com
 location: MESH_EXTERNAL
 ports:
 - number: 80
   name: http
   protocol: HTTP
 resolution: DNS
 endpoints:
 - address: dynamodb.us-east-1.amazonaws.com
   locality: us-east1
   ports:
     http: 80
 - address: dynamodb.us-west-1.amazonaws.com
   locality: us-west
   ports:
     http: 80

Let’s deploy a sleep container to use as a test source for sending requests.

Zip
$ kubectl apply -f @samples/sleep/sleep.yaml@

From the sleep container try going to http://mydb.com 5 times:

$ for i in {1..5}; do kubectl exec deploy/sleep -c sleep -- curl -sS http://mydb.com; echo; sleep 2; done
healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-west-1.amazonaws.com
healthy: dynamodb.us-west-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com

You will see that Istio is sending requests to both endpoints. We only want it to send to the endpoint marked with the same region as our nodes.

For that, we need to configure a DestinationRule.

Set failover conditions using a DestinationRule

Istio’s DestinationRule lets you configure load balancing, connection pool, and outlier detection settings. We can specify the conditions used to identify an endpoint as unhealthy and remove it from the load balancing pool.

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
 name: mydynamodb
spec:
 host: mydb.com
 trafficPolicy:
   outlierDetection:
     consecutive5xxErrors: 1
     interval: 15s
     baseEjectionTime: 1m

The above DestinationRule configures the endpoints to be scanned every 15 seconds, and if any endpoint fails with a 5xx error code, even once, it will be marked unhealthy for one minute. If this circuit breaker is not triggered, the traffic will route to the same region as the pod.

If we run our curl again, we should see that traffic is always going to the us-east1 endpoint.

$ for i in {1..5}; do kubectl exec deploy/sleep -c sleep -- curl -sS http://mydb.com; echo; sleep 2; done

healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com

Simulate a failure

Next, let’s see what happens if the us-east endpoint goes down. To simulate this, let’s modify the ServiceEntry and set the us-east endpoint to an invalid port:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
 name: external-svc-dns
spec:
 hosts:
 - mydb.com
 location: MESH_EXTERNAL
 ports:
 - number: 80
   name: http
   protocol: HTTP
 resolution: DNS
 endpoints:
 - address: dynamodb.us-east-1.amazonaws.com
   locality: us-east1
   ports:
     http: 81 # INVALID - This is purposefully wrong to trigger failover
 - address: dynamodb.us-west-1.amazonaws.com
   locality: us-west
   ports:
     http: 80

Running our curl again shows that traffic is automatically failed over to our us-west region after failing to connect to the us-east endpoint:

$ for i in {1..5}; do kubectl exec deploy/sleep -c sleep -- curl -sS http://mydb.com; echo; sleep 2; done
upstream connect error or disconnect/reset before headers. reset reason: connection failure
healthy: dynamodb.us-west-1.amazonaws.com
healthy: dynamodb.us-west-1.amazonaws.com
healthy: dynamodb.us-west-1.amazonaws.com
healthy: dynamodb.us-west-1.amazonaws.com

You can check the outlier status of the us-east endpoint by running:

$ istioctl pc endpoints <sleep-pod> | grep mydb
ENDPOINT                         STATUS      OUTLIER CHECK     CLUSTER
52.119.226.80:81                 HEALTHY     FAILED            outbound|80||mydb.com
52.94.12.144:80                  HEALTHY     OK                outbound|80||mydb.com

Failover for HTTPS

Configuring failover for external HTTPS services is just as easy. Your application can still continue to use plain HTTP, and you can let the Istio proxy perform the TLS origination to the HTTPS endpoint.

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
 name: external-svc-dns
spec:
 hosts:
 - mydb.com
 ports:
 - number: 80
   name: http-port
   protocol: HTTP
   targetPort: 443
 resolution: DNS
 endpoints:
 - address: dynamodb.us-east-1.amazonaws.com
   locality: us-east1
 - address: dynamodb.us-west-1.amazonaws.com
   locality: us-west

The above ServiceEntry defines the mydb.com service on port 80 and redirects traffic to the real DynamoDB endpoints on port 443.

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
 name: mydynamodb
spec:
 host: mydb.com
 trafficPolicy:
   tls:
     mode: SIMPLE
   loadBalancer:
     simple: ROUND_ROBIN
     localityLbSetting:
       enabled: true
       failover:
         - from: us-east1
           to: us-west
   outlierDetection:
     consecutive5xxErrors: 1
     interval: 15s
     baseEjectionTime: 1m

The DestinationRule now performs TLS origination and configures the outlier detection. The rule also has a failover field configured where you can specify exactly what regions are failover targets. This is useful when you have several regions defined.

Wrapping Up

Istio’s VirtualService and DestinationRule API’s provide traffic routing, failure recovery and fault injection features so that you can create resilient applications. The ServiceEntry API extends many of these features to external services that are not part of your service mesh.