Metrics and Logs FAQ
Can Istio metrics be accessed through REST?
You can collect telemetry data about Istio using Prometheus1. And then use Prometheus’s HTTP API2 to query that data.
What are the differences in telemetry reported by in-proxy telemetry (aka v2) and Mixer-based telemetry (aka v1)?
In-proxy telemetry (aka v2) reduces resource cost and improves proxy performance as compared to the Mixer-based telemetry (aka v1) approach, and is the preferred mechanism for surfacing telemetry in Istio. However, there are few differences in reported telemetry between v1 and v2 which are listed below:
Missing labels for out-of-mesh traffic In-proxy telemetry relies on metadata exchange between Envoy proxies to gather information like peer workload name, namespace and labels. In Mixer-based telemetry this functionality was performed by Mixer as part of combining request attributes with the platform data. This metadata exchange is performed by the Envoy proxies by adding a specific HTTP header for HTTP protocol or augmenting ALPN protocol for TCP protocol as described here. This requires Envoy proxies to be injected at both the client & server workloads, implying that the telemetry reported when one peer is not in the mesh will be missing peer attributes like workload name, namespace and labels. However, if both peers have proxies injected all the labels mentioned here3 are available in the generated metrics. When the server workload is out of the mesh, server workload metadata is still distributed to client sidecar, causing client side metrics to have server workload metadata labels filled.
TCP metadata exchange requires mTLS TCP metadata exchange relies on the Istio ALPN protocol which requires mutual TLS (mTLS) to be enabled for the Envoy proxies to exchange metadata successfully. This implies that if mTLS is not enabled in your cluster, telemetry for TCP protocol will not include peer information like workload name, namespace and labels.
No mechanism for configuring custom buckets for histogram metrics Mixer-based telemetry supported customizing buckets for histogram type metrics like request duration and TCP byte sizes. In-proxy telemetry has no such available mechanism. Additionally, the buckets available for latency metrics in in-proxy telemetry are in milliseconds as compared to seconds in Mixer-based telemetry. However, more buckets are available by default in in-proxy telemetry for latency metrics at the lower latency levels.
No metric expiration for short-lived metrics Mixer-based telemetry supported metric expiration whereby metrics which were not generated for a configurable amount of time were de-registered for collection by Prometheus. This is useful in scenarios, such as one-off jobs, that generate short-lived metrics. De-registering the metrics prevents reporting of metrics which would no longer change in the future, thereby reducing network traffic and storage in Prometheus. This expiration mechanism is not available in in-proxy telemetry. The workaround for this can be found here.
How can I manage short-lived metrics?
Short-lived metrics can hamper the performance of Prometheus, as they often are a large source of label cardinality. Cardinality is a measure of the number of unique values for a label. To manage the impact of your short-lived metrics on Prometheus, you must first identify the high cardinality metrics and labels. Prometheus provides cardinality information at its /status
page. Additional information can be retrieved via PromQL4.
There are several ways to reduce the cardinality of Istio metrics:
- Disable host header fallback.
The
destination_service
label is one potential source of high-cardinality. The values fordestination_service
default to the host header if the Istio proxy is not able to determine the destination service from other request metadata. If clients are using a variety of host headers, this could result in a large number of values for thedestination_service
. In this case, follow the metric customization5 guide to disable host header fallback mesh wide. To disable host header fallback for a particular workload or namespace, you need to copy the statsEnvoyFilter
configuration, update it to have host header fallback disabled, and apply it with a more specific selector. This issue6 has more detail on how to achieve this. - Drop unnecessary labels from collection. If the label with high cardinality is not needed, you can drop it from metric collection via metric customization5 using
tags_to_remove
. - Normalize label values, either through federation or classification. If the information provided by the label is desired, you can use Prometheus federation or request classification7 to normalize the label.
How do I migrate existing Mixer functionality?
Mixer was removed in the 1.8 Istio release. Migration is needed if you still rely on Mixer’s built-in adapters or any out-of-process adapters for mesh extension.
For built-in adapters, several alternatives are provided:
Prometheus
andStackdriver
integrations are implemented as proxy extensions8. Customization of telemetry generated by these two extensions can be achieved via request classification7 and Prometheus metrics customization5.- Global and Local Rate-Limiting (
memquota
andredisquota
adapters) functionality is provided through the Envoy-based rate-limiting solution9. OPA
adapter is replaced by the Envoy ext-authz based solution10, which supports integration with OPA policy agent11.
For custom out-of-process adapters, migration to Wasm-based extensions is strongly encouraged. Please refer to the guides on Wasm module development12 and extension distribution13. As a temporary solution, you can enable Envoy ext-authz and gRPC access log API support in Mixer14, which allows you to upgrade Istio to post 1.7 versions while still using 1.7 Mixer with out-of-process adapters. This will give you more time to migrate to Wasm-based extensions. Note this temporary solution is not battle-tested and will unlikely get patch fixes, since it is only available on the Istio 1.7 branch which is out of support window after Feb 2021.
Can the Prometheus adapter be used in non-Kubernetes environments?
You can use docker-compose to install Prometheus.
How to figure out what happened to a request in Istio?
You can enable tracing15 to determine the flow of a request in Istio.
Additionally, you can use the following commands to know more about the state of the mesh:
istioctl proxy-config
: Retrieve information about proxy configuration when running in Kubernetes:kubectl get
: Gets information about different resources in the mesh along with routing configuration:
Can I use Prometheus to scrape application metrics with Istio?
Yes. Prometheus16 is an open source monitoring system and time series database. You can use Prometheus with Istio to record metrics that track the health of Istio and of applications within the service mesh. You can visualize metrics using tools like Grafana17 and Kiali18. See Configuration for Prometheus to understand how to enable collection of metrics.
A few notes:
- If the Prometheus pod started before the istiod pod could generate the required certificates and distribute them to Prometheus, the Prometheus pod will need to be restarted in order to collect from mutual TLS-protected targets.
- If your application exposes Prometheus metrics on a dedicated port, that port should be added to the service and deployment specifications.