Observability Best Practices

Using Prometheus for production-scale monitoring

The recommended approach for production-scale monitoring of Istio meshes with Prometheus is to use hierarchical federation in combination with a collection of recording rules.

In default deployments of Istio, a deployment of Prometheus is provided for collecting metrics generated for all mesh traffic. This deployment of Prometheus is intentionally deployed with a very short retention window (6 hours). The default Prometheus deployment is also configured to collect metrics from each Envoy proxy running in the mesh, augmenting each metric with a set of labels about their origin (instance, pod_name, and namespace).

While the default configuration is well-suited for small clusters and monitoring for short time horizons, it is not suitable for large-scale meshes or monitoring over a period of days or weeks. In particular, the introduced labels can increase metrics cardinality, requiring a large amount of storage. And, when trying to identify trends and differences in traffic over time, access to historical data can be paramount.

Architecture for production monitoring of Istio using Prometheus.
Production-scale Istio monitoring with Istio

Workload-level aggregation via recording rules

In order to aggregate metrics across instances and pods, update the default Prometheus configuration with the following recording rules:

groups:
- name: "istio.recording-rules"
  interval: 5s
  rules:
  - record: "workload:istio_requests_total"
    expr: |
      sum without(instance, namespace, pod_name) (istio_requests_total)

  - record: "workload:istio_request_duration_milliseconds_count"
    expr: |
      sum without(instance, namespace, pod_name) (istio_request_duration_milliseconds_count)

  - record: "workload:istio_request_duration_milliseconds_sum"
    expr: |
      sum without(instance, namespace, pod_name) (istio_request_duration_milliseconds_sum)

  - record: "workload:istio_request_duration_milliseconds_bucket"
    expr: |
      sum without(instance, namespace, pod_name) (istio_request_duration_milliseconds_bucket)

  - record: "workload:istio_request_bytes_count"
    expr: |
      sum without(instance, namespace, pod_name) (istio_request_bytes_count)

  - record: "workload:istio_request_bytes_sum"
    expr: |
      sum without(instance, namespace, pod_name) (istio_request_bytes_sum)

  - record: "workload:istio_request_bytes_bucket"
    expr: |
      sum without(instance, namespace, pod_name) (istio_request_bytes_bucket)

  - record: "workload:istio_response_bytes_count"
    expr: |
      sum without(instance, namespace, pod_name) (istio_response_bytes_count)

  - record: "workload:istio_response_bytes_sum"
    expr: |
      sum without(instance, namespace, pod_name) (istio_response_bytes_sum)

  - record: "workload:istio_response_bytes_bucket"
    expr: |
      sum without(instance, namespace, pod_name) (istio_response_bytes_bucket)

  - record: "workload:istio_tcp_connections_opened_total"
    expr: |
      sum without(instance, namespace, pod_name) (istio_tcp_connections_opened_total)

  - record: "workload:istio_tcp_connections_closed_total"
    expr: |
      sum without(instance, namespace, pod_name) (istio_tcp_connections_opened_total)

  - record: "workload:istio_tcp_sent_bytes_total_count"
    expr: |
      sum without(instance, namespace, pod_name) (istio_tcp_sent_bytes_total_count)

  - record: "workload:istio_tcp_sent_bytes_total_sum"
    expr: |
      sum without(instance, namespace, pod_name) (istio_tcp_sent_bytes_total_sum)

  - record: "workload:istio_tcp_sent_bytes_total_bucket"
    expr: |
      sum without(instance, namespace, pod_name) (istio_tcp_sent_bytes_total_bucket)

  - record: "workload:istio_tcp_received_bytes_total_count"
    expr: |
      sum without(instance, namespace, pod_name) (istio_tcp_received_bytes_total_count)

  - record: "workload:istio_tcp_received_bytes_total_sum"
    expr: |
      sum without(instance, namespace, pod_name) (istio_tcp_received_bytes_total_sum)

  - record: "workload:istio_tcp_received_bytes_total_bucket"
    expr: |
      sum without(instance, namespace, pod_name) (istio_tcp_received_bytes_total_bucket)

Federation using workload-level aggregated metrics

To establish Prometheus federation, modify the configuration of your production-ready deployment of Prometheus to scrape the federation endpoint of the Istio Prometheus.

Add the following job to your configuration:

- job_name: 'istio-prometheus'
  honor_labels: true
  metrics_path: '/federate'
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names: ['istio-system']
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'workload:(.*)'
    target_label: __name__
    action: replace
  params:
    'match[]':
    - '{__name__=~"workload:(.*)"}'
    - '{__name__=~"pilot(.*)"}'

If you are using the Prometheus Operator, use the following configuration instead:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istio-federation
  labels:
    app.kubernetes.io/name: istio-prometheus
spec:
  namespaceSelector:
    matchNames:
    - istio-system
  selector:
    matchLabels:
      app: prometheus
  endpoints:
  - interval: 30s
    scrapeTimeout: 30s
    params:
      'match[]':
      - '{__name__=~"workload:(.*)"}'
      - '{__name__=~"pilot(.*)"}'
    path: /federate
    targetPort: 9090
    honorLabels: true
    metricRelabelings:
    - sourceLabels: ["__name__"]
      regex: 'workload:(.*)'
      targetLabel: "__name__"
      action: replace

Optimizing metrics collection with recording rules

Beyond just using recording rules to aggregate over pods and instances, you may want to use recording rules to generate aggregated metrics tailored specifically to your existing dashboards and alerts. Optimizing your collection in this manner can result in large savings in resource consumption in your production instance of Prometheus, in addition to faster query performance.

For example, imagine a custom monitoring dashboard that used the following Prometheus queries:

  • Total rate of requests averaged over the past minute by destination service name and namespace

    sum(irate(istio_requests_total{reporter="source"}[1m]))
    by (
        destination_canonical_service,
        destination_workload_namespace
    )
    
  • P95 client latency averaged over the past minute by source and destination service names and namespace

    histogram_quantile(0.95,
      sum(irate(istio_request_duration_milliseconds_bucket{reporter="source"}[1m]))
      by (
        destination_canonical_service,
        destination_workload_namespace,
        source_canonical_service,
        source_workload_namespace,
        le
      )
    )
    

The following set of recording rules could be added to the Istio Prometheus configuration, using the istio prefix to make identifying these metrics for federation simple.

groups:
- name: "istio.recording-rules"
  interval: 5s
  rules:
  - record: "istio:istio_requests:by_destination_service:rate1m"
    expr: |
      sum(irate(istio_requests_total{reporter="destination"}[1m]))
      by (
        destination_canonical_service,
        destination_workload_namespace
      )
  - record: "istio:istio_request_duration_milliseconds_bucket:p95:rate1m"
    expr: |
      histogram_quantile(0.95,
        sum(irate(istio_request_duration_milliseconds_bucket{reporter="source"}[1m]))
        by (
          destination_canonical_service,
          destination_workload_namespace,
          source_canonical_service,
          source_workload_namespace,
          le
        )
      )

The production instance of Prometheus would then be updated to federate from the Istio instance with:

  • match clause of {__name__=~"istio:(.*)"}

  • metric relabeling config with: regex: "istio:(.*)"

The original queries would then be replaced with:

  • istio_requests:by_destination_service:rate1m

  • avg(istio_request_duration_milliseconds_bucket:p95:rate1m)

Was this information useful?
Do you have any suggestions for improvement?

Thanks for your feedback!