Fault Injection
This task shows you how to inject faults to test the resiliency of your application.
Before you begin
Set up Istio by following the instructions in the Installation guide.
Deploy the Bookinfo sample application.
Review the fault injection discussion in the Traffic Management concepts doc.
Apply application version routing by either performing the request routing task or by running the following commands:
$ kubectl apply -f @samples/bookinfo/networking/virtual-service-all-v1.yaml@
$ kubectl apply -f @samples/bookinfo/networking/virtual-service-reviews-test-v2.yaml@
Injecting an HTTP delay fault
To test the Bookinfo application microservices for resiliency, inject a 7s delay between the reviews:v2 and ratings microservices for user jason. This test will uncover a bug that was intentionally introduced into the Bookinfo app.
Note that the reviews:v2 service has a 10s hard-coded connection timeout for calls to the ratings service. Even with the 7s delay that you introduced, you still expect the end-to-end flow to continue without any errors.
Create a fault injection rule to delay traffic coming from the test user jason.
$ kubectl apply -f @samples/bookinfo/networking/virtual-service-ratings-test-delay.yaml@
Confirm the rule was created:
$ kubectl get virtualservice ratings -o yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
  ...
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        fixedDelay: 7s
        percent: 100
    match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
Allow several seconds for the new rule to propagate to all pods.
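The way the two http routes in this rule are evaluated can be sketched as a toy model in Python. This is only an illustration of first-match routing under the values shown above, not how Istio or Envoy implement it; the names RouteDecision and route_ratings_request are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RouteDecision:
    subset: str           # destination subset of the ratings service
    delay_seconds: float  # injected delay before forwarding; 0.0 means none


def route_ratings_request(headers: Mapping[str, str]) -> RouteDecision:
    """Toy model of the two http routes in the ratings VirtualService.

    Assumes header names are already lowercased in the mapping.
    """
    # First route: exact (case-sensitive) match on the end-user header,
    # with a fixed 7s delay applied to 100 percent of matching requests.
    if headers.get("end-user") == "jason":
        return RouteDecision(subset="v1", delay_seconds=7.0)
    # Second route: no match clause, so it catches every other request.
    return RouteDecision(subset="v1", delay_seconds=0.0)
```

In this model only requests carrying end-user: jason are delayed; every other request reaches ratings v1 without any injected delay.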
Testing the delay configuration
Open the Bookinfo web application in your browser.
On the /productpage, log in as user jason.
You expect the Bookinfo home page to load without errors in approximately 7 seconds. However, there is a problem: the Reviews section displays an error message:
Error fetching product reviews! Sorry, product reviews are currently unavailable for this book.
View the web page response times:
- Open the Developer Tools menu in your web browser.
- Open the Network tab.
- Reload the productpage web page. You will see that the webpage actually loads in about 6 seconds.
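If you prefer to measure the response time from a script instead of the browser, a minimal sketch along the following lines may help. The helper names timed and fetch_status are hypothetical, and the URL and cookie value in the usage comment are placeholders you would have to fill in yourself (for example, by copying the session cookie from the browser in which you are logged in as jason).

```python
import time
import urllib.request
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(call: Callable[[], T]) -> Tuple[T, float]:
    """Run call() and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


def fetch_status(url: str, cookie: str = "") -> int:
    """GET url and return the HTTP status code, optionally sending a Cookie header."""
    request = urllib.request.Request(url)
    if cookie:
        request.add_header("Cookie", cookie)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


# Usage (placeholders, not real values):
# status, seconds = timed(
#     lambda: fetch_status("http://<GATEWAY_URL>/productpage", cookie="<session cookie>")
# )
# print(f"status={status} elapsed={seconds:.1f}s")
```

Without the cookie of a logged-in jason session, the request does not match the fault injection rule and should not be delayed.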
Understanding what happened
You've found a bug. There are hard-coded timeouts in the microservices that have caused the reviews service to fail.
The timeout between the productpage and the reviews service is 6 seconds, coded as a 3s timeout plus 1 retry, for 6s total. The timeout between the reviews and ratings services is hard-coded at 10 seconds. Because of the 7s delay we introduced, which is longer than the 6s budget, the /productpage times out prematurely and throws the error.
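The arithmetic behind this failure can be worked through with a small simulation. This is a simplified sketch that uses only the numbers given in the text (3s timeout, 1 retry, 10s reviews timeout, 7s injected delay) and ignores all other network and processing latency; the function names are hypothetical.

```python
PRODUCTPAGE_TIMEOUT_S = 3.0  # per-attempt timeout for productpage -> reviews
PRODUCTPAGE_RETRIES = 1      # one retry after the first attempt
REVIEWS_TIMEOUT_S = 10.0     # hard-coded timeout for reviews:v2 -> ratings
INJECTED_DELAY_S = 7.0       # delay added by the fault injection rule


def productpage_budget(timeout_s: float, retries: int) -> float:
    """Total time productpage waits for reviews across all attempts."""
    return timeout_s * (1 + retries)


def simulate_productpage(delay_s: float) -> tuple:
    """Return (reviews_shown, elapsed_seconds) for a given ratings delay."""
    # reviews answers once ratings responds, or gives up at its own timeout.
    reviews_latency = min(delay_s, REVIEWS_TIMEOUT_S)
    elapsed = 0.0
    for _ in range(1 + PRODUCTPAGE_RETRIES):
        if reviews_latency < PRODUCTPAGE_TIMEOUT_S:
            return True, elapsed + reviews_latency
        # This attempt hits the per-attempt timeout; productpage tries again.
        elapsed += PRODUCTPAGE_TIMEOUT_S
    return False, elapsed


print(productpage_budget(PRODUCTPAGE_TIMEOUT_S, PRODUCTPAGE_RETRIES))  # 6.0
print(simulate_productpage(INJECTED_DELAY_S))                          # (False, 6.0)
```

Under these assumptions the 7s delay fits within the 10s reviews timeout but exceeds the 6s productpage budget, which matches the roughly 6 second page load and the reviews error observed above.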
Bugs like this can occur in typical enterprise applications where different teams develop different microservices independently. Istio's fault injection rules help you identify such anomalies without impacting end users.
Notice that the fault injection test is restricted to when the logged-in user is jason. If you log in as any other user, you will not experience any delays.
Fixing the bug
You would normally fix the problem by:
- Either increasing the /productpage timeout or decreasing the reviews to ratings service timeout
- Stopping and restarting the fixed microservice
- Confirming that the /productpage returns its response without any errors.
However, you already have this fix running in v3 of the reviews service, so you can simply fix the problem by migrating all traffic to reviews:v3 as described in the traffic shifting task.
Exercise
Change the delay rule to use a 2.8 second delay and then run it against the v3 version of reviews.
Injecting an HTTP abort fault
Another way to test microservice resiliency is to introduce an HTTP abort fault.
In this task, you will introduce an HTTP abort to the ratings microservice for the test user jason.
In this case, you expect the page to load immediately and display the product ratings not available message.
Create a fault injection rule to send an HTTP abort for user jason:
$ kubectl apply -f @samples/bookinfo/networking/virtual-service-ratings-test-abort.yaml@
Confirm the rule was created:
$ kubectl get virtualservice ratings -o yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
  ...
spec:
  hosts:
  - ratings
  http:
  - fault:
      abort:
        httpStatus: 500
        percent: 100
    match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
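The effect of the abort rule on a caller can be sketched as follows. This is a toy model under the values shown in the rule, not the actual Bookinfo or Istio code; the function names are hypothetical, and the returned strings only stand in for what the page displays.

```python
from typing import Mapping, Optional

ABORT_STATUS = 500  # httpStatus from the abort rule


def injected_abort_status(headers: Mapping[str, str]) -> Optional[int]:
    """Return the status answered in place of ratings, or None to forward.

    Assumes header names are already lowercased in the mapping.
    """
    # The abort applies to 100 percent of requests whose end-user header
    # is exactly "jason"; no call reaches the ratings service for them.
    if headers.get("end-user") == "jason":
        return ABORT_STATUS
    return None


def ratings_section(headers: Mapping[str, str]) -> str:
    """Degrade gracefully: show a fallback message when ratings fails."""
    status = injected_abort_status(headers)
    if status is not None and status >= 500:
        return "product ratings not available"
    return "rating stars"
```

Because the abort is answered immediately rather than after a timeout, the caller can fall back to its message without any added wait, which is why the page is expected to load right away.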
Testing the abort configuration
Open the Bookinfo web application in your browser.
On the /productpage, log in as user jason.
If the rule propagated successfully to all pods, the page loads immediately and the product ratings not available message appears.
Log out from user jason and the rating stars show up successfully on the application's /productpage.
Cleanup
Remove the application routing rules:
$ kubectl delete -f @samples/bookinfo/networking/virtual-service-all-v1.yaml@
If you are not planning to explore any follow-on tasks, refer to the Bookinfo cleanup instructions to shut down the application.