Metrics and Logs FAQ

Can Istio metrics be accessed through REST?

You can collect telemetry data about Istio using Prometheus. And then use Prometheus’s HTTP API to query that data.

What are the differences in telemetry reported by in-proxy telemetry (aka v2) and Mixer-based telemetry (aka v1)?

In-proxy telemetry (aka v2) reduces resource cost and improves proxy performance as compared to the Mixer-based telemetry (aka v1) approach, and is the preferred mechanism for surfacing telemetry in Istio. However, there are few differences in reported telemetry between v1 and v2 which are listed below:

Missing labels for out-of-mesh traffic In-proxy telemetry relies on metadata exchange between Envoy proxies to gather information like peer workload name, namespace and labels. In Mixer-based telemetry this functionality was performed by Mixer as part of combining request attributes with the platform data. This metadata exchange is performed by the Envoy proxies by adding a specific HTTP header for HTTP protocol or augmenting ALPN protocol for TCP protocol as described here. This requires Envoy proxies to be injected at both the client & server workloads, implying that the telemetry reported when one peer is not in the mesh will be missing peer attributes like workload name, namespace and labels. However, if both peers have proxies injected all the labels mentioned here are available in the generated metrics. When the server workload is out of the mesh, server workload metadata is still distributed to client sidecar, causing client side metrics to have server workload metadata labels filled.
TCP metadata exchange requires mTLS TCP metadata exchange relies on the Istio ALPN protocol which requires mutual TLS (mTLS) to be enabled for the Envoy proxies to exchange metadata successfully. This implies that if mTLS is not enabled in your cluster, telemetry for TCP protocol will not include peer information like workload name, namespace and labels.
No mechanism for configuring custom buckets for histogram metrics Mixer-based telemetry supported customizing buckets for histogram type metrics like request duration and TCP byte sizes. In-proxy telemetry has no such available mechanism. Additionally, the buckets available for latency metrics in in-proxy telemetry are in milliseconds as compared to seconds in Mixer-based telemetry. However, more buckets are available by default in in-proxy telemetry for latency metrics at the lower latency levels.
No metric expiration for short-lived metrics Mixer-based telemetry supported metric expiration whereby metrics which were not generated for a configurable amount of time were de-registered for collection by Prometheus. This is useful in scenarios, such as one-off jobs, that generate short-lived metrics. De-registering the metrics prevents reporting of metrics which would no longer change in the future, thereby reducing network traffic and storage in Prometheus. This expiration mechanism is not available in in-proxy telemetry. The workaround for this can be found here.

How can I manage short-lived metrics?

Short-lived metrics can hamper the performance of Prometheus, as they often are a large source of label cardinality. Cardinality is a measure of the number of unique values for a label. To manage the impact of your short-lived metrics on Prometheus, you must first identify the high cardinality metrics and labels. Prometheus provides cardinality information at its /status page. Additional information can be retrieved via PromQL. There are several ways to reduce the cardinality of Istio metrics:

Disable host header fallback. The destination_service label is one potential source of high-cardinality. The values for destination_service default to the host header if the Istio proxy is not able to determine the destination service from other request metadata. If clients are using a variety of host headers, this could result in a large number of values for the destination_service. In this case, follow the metric customization guide to disable host header fallback mesh wide. To disable host header fallback for a particular workload or namespace, you need to copy the stats EnvoyFilter configuration, update it to have host header fallback disabled, and apply it with a more specific selector. This issue has more detail on how to achieve this.
Drop unnecessary labels from collection. If the label with high cardinality is not needed, you can drop it from metric collection via metric customization using tags_to_remove.
Normalize label values, either through federation or classification. If the information provided by the label is desired, you can use Prometheus federation or request classification to normalize the label.

How do I migrate existing Mixer functionality?

Mixer was removed in the 1.8 Istio release. Migration is needed if you still rely on Mixer’s built-in adapters or any out-of-process adapters for mesh extension.

For built-in adapters, several alternatives are provided:

Prometheus and Stackdriver integrations are implemented as proxy extensions. Customization of telemetry generated by these two extensions can be achieved via request classification and Prometheus metrics customization.
Global and Local Rate-Limiting (memquota and redisquota adapters) functionality is provided through the Envoy-based rate-limiting solution.
OPA adapter is replaced by the Envoy ext-authz based solution, which supports integration with OPA policy agent.

For custom out-of-process adapters, migration to Wasm-based extensions is strongly encouraged. Please refer to the guides on Wasm module development and extension distribution. As a temporary solution, you can enable Envoy ext-authz and gRPC access log API support in Mixer, which allows you to upgrade Istio to post 1.7 versions while still using 1.7 Mixer with out-of-process adapters. This will give you more time to migrate to Wasm-based extensions. Note this temporary solution is not battle-tested and will unlikely get patch fixes, since it is only available on the Istio 1.7 branch which is out of support window after Feb 2021.

Can the Prometheus adapter be used in non-Kubernetes environments?

You can use docker-compose to install Prometheus.

How to figure out what happened to a request in Istio?

You can enable tracing to determine the flow of a request in Istio.

Additionally, you can use the following commands to know more about the state of the mesh:

istioctl proxy-config: Retrieve information about proxy configuration when running in Kubernetes:

# Retrieve information about bootstrap configuration for the Envoy instance in the specified pod.
$ istioctl proxy-config bootstrap productpage-v1-bb8d5cbc7-k7qbm

# Retrieve information about cluster configuration for the Envoy instance in the specified pod.
$ istioctl proxy-config cluster productpage-v1-bb8d5cbc7-k7qbm

# Retrieve information about listener configuration for the Envoy instance in the specified pod.
$ istioctl proxy-config listener productpage-v1-bb8d5cbc7-k7qbm

# Retrieve information about route configuration for the Envoy instance in the specified pod.
$ istioctl proxy-config route productpage-v1-bb8d5cbc7-k7qbm

# Retrieve information about endpoint configuration for the Envoy instance in the specified pod.
$ istioctl proxy-config endpoints productpage-v1-bb8d5cbc7-k7qbm

# Try the following to discover more proxy-config commands
$ istioctl proxy-config --help

kubectl get: Gets information about different resources in the mesh along with routing configuration:
```
# List all virtual services
$ kubectl get virtualservices
```

Can I use Prometheus to scrape application metrics with Istio?

Yes. Prometheus is an open source monitoring system and time series database. You can use Prometheus with Istio to record metrics that track the health of Istio and of applications within the service mesh. You can visualize metrics using tools like Grafana and Kiali. See Configuration for Prometheus to understand how to enable collection of metrics.

A few notes:

If the Prometheus pod started before the istiod pod could generate the required certificates and distribute them to Prometheus, the Prometheus pod will need to be restarted in order to collect from mutual TLS-protected targets.
If your application exposes Prometheus metrics on a dedicated port, that port should be added to the service and deployment specifications.