Monitoring blocked and passthrough external service traffic
Understanding, controlling and securing your external service access is one of the key benefits that you get from a service mesh like Istio. From a security and operations point of view, it is critical to monitor what external service traffic is getting blocked as they might surface possible misconfigurations or a security vulnerability if an application is attempting to communicate with a service that it should not be allowed to. Similarly, if you currently have a policy of allowing any external service access, it is beneficial to monitor the traffic so you can incrementally add explicit Istio configuration to allow access and better security your cluster. In either case, having visibility into this traffic via telemetry is quite helpful as it enables you to create alerts and dashboards, and better reason about your security posture. This was a highly requested feature by production users of Istio and we are excited that the support for this was added in release 1.3.
To implement this, the Istio default metrics are augmented with explicit labels to capture blocked and passthrough external service traffic. This blog will cover how you can use these augmented metrics to monitor all external service traffic.
The Istio control plane configures the sidecar proxy with predefined clusters called BlackHoleCluster and Passthrough which block or allow all traffic respectively. To understand these clusters, let’s start with what external and internal services mean in the context of Istio service mesh.
External and internal services
Internal services are defined as services which are part of your platform and are considered to be in the mesh. For internal services, Istio control plane provides all the required configuration to the sidecars by default. For example, in Kubernetes clusters, Istio configures the sidecars for all Kubernetes services to preserve the default Kubernetes behavior of all services being able to communicate with other.
External services are services which are not part of your platform i.e. services
which are outside of the mesh. For external services, Istio provides two
options, first to block all external service access (enabled by setting
global.outboundTrafficPolicy.mode
to REGISTRY_ONLY
) and
second to allow all access to external service (enabled by setting
global.outboundTrafficPolicy.mode
to ALLOW_ANY
). The default option for this
setting (as of Istio 1.3) is to allow all external service access. This
option can be configured via mesh configuration.
This is where the BlackHole and Passthrough clusters are used.
What are BlackHole and Passthrough clusters?
- BlackHoleCluster - The BlackHoleCluster is a virtual cluster created
in the Envoy configuration when
global.outboundTrafficPolicy.mode
is set toREGISTRY_ONLY
. In this mode, all traffic to external service is blocked unless service entries are explicitly added for each service. To implement this, the default virtual outbound listener at0.0.0.0:15001
which uses original destination is setup as a TCP Proxy with the BlackHoleCluster as the static cluster. The configuration for the BlackHoleCluster looks like this:
{
"name": "BlackHoleCluster",
"type": "STATIC",
"connectTimeout": "10s"
}
As you can see, this cluster is static with no endpoints so all the traffic will be dropped. Additionally, Istio creates unique listeners for every port/protocol combination of platform services which gets hit instead of the virtual listener if the request is made to an external service on the same port. In that case, the route configuration of every virtual route in Envoy is augmented to add the BlackHoleCluster like this:
{
"name": "block_all",
"domains": [
"*"
],
"routes": [
{
"match": {
"prefix": "/"
},
"directResponse": {
"status": 502
}
}
]
}
The route is setup as direct response
with 502
response code which means if no other routes match the Envoy proxy
will directly return a 502
HTTP status code.
- PassthroughCluster - The PassthroughCluster is a virtual cluster created
in the Envoy configuration when
global.outboundTrafficPolicy.mode
is set toALLOW_ANY
. In this mode, all traffic to any external service external is allowed. To implement this, the default virtual outbound listener at0.0.0.0:15001
which usesSO_ORIGINAL_DST
, is setup as a TCP Proxy with the PassthroughCluster as the static cluster. The configuration for the PassthroughCluster looks like this:
{
"name": "PassthroughCluster",
"type": "ORIGINAL_DST",
"connectTimeout": "10s",
"lbPolicy": "ORIGINAL_DST_LB",
"circuitBreakers": {
"thresholds": [
{
"maxConnections": 102400,
"maxRetries": 1024
}
]
}
}
This cluster uses the original destination load balancing policy which configures Envoy to send the traffic to the original destination i.e. passthrough.
Similar to the BlackHoleCluster, for every port/protocol based listener the virtual route configuration is augmented to add the PassthroughCluster as the default route:
{
"name": "allow_any",
"domains": [
"*"
],
"routes": [
{
"match": {
"prefix": "/"
},
"route": {
"cluster": "PassthroughCluster"
}
}
]
}
Prior to Istio 1.3, there were no metrics reported or if metrics were reported there were no explicit labels set when traffic hit these clusters, resulting in lack of visibility in traffic flowing through the mesh.
The next section covers how to take advantage of this enhancement as the metrics and labels emitted are conditional on whether the virtual outbound or explicit port/protocol listener is being hit.
Using the augmented metrics
To capture all external service traffic in either of the cases (BlackHole or
Passthrough), you will need to monitor istio_requests_total
and
istio_tcp_connections_closed_total
metrics. Depending upon the Envoy listener
type i.e. TCP proxy or HTTP proxy that gets invoked, one of these metrics
will be incremented.
Additionally, in case of a TCP proxy listener in order to see the IP address of
the external service that is blocked or allowed via BlackHole or Passthrough
cluster, you will need to add the destination_ip
label to the
istio_tcp_connections_closed_total
metric. In this scenario, the host name of
the external service is not captured. This label is not added by default and can
be easily added by augmenting the Istio configuration for attribute generation
and Prometheus handler. You should be careful about cardinality explosion in
time series if you have many services with non-stable IP addresses.
PassthroughCluster metrics
This section explains the metrics and the labels emitted based on the listener type invoked in Envoy.
- HTTP proxy listener: This happens when the port of the external service is
same as one of the service ports defined in the cluster. In this scenario,
when the PassthroughCluster is hit,
istio_requests_total
will get increased like this:
{
"metric": {
"__name__": "istio_requests_total",
"connection_security_policy": "unknown",
"destination_app": "unknown",
"destination_principal": "unknown",
"destination_service": "httpbin.org",
"destination_service_name": "PassthroughCluster",
"destination_service_namespace": "unknown",
"destination_version": "unknown",
"destination_workload": "unknown",
"destination_workload_namespace": "unknown",
"instance": "100.96.2.183:42422",
"job": "istio-mesh",
"permissive_response_code": "none",
"permissive_response_policyid": "none",
"reporter": "source",
"request_protocol": "http",
"response_code": "200",
"response_flags": "-",
"source_app": "sleep",
"source_principal": "unknown",
"source_version": "unknown",
"source_workload": "sleep",
"source_workload_namespace": "default"
},
"value": [
1567033080.282,
"1"
]
}
Note that the destination_service_name
label is set to PassthroughCluster to
indicate that this cluster was hit and the destination_service
is set to the
host of the external service.
- TCP proxy virtual listener - If the external service port doesn’t map to any
HTTP based service ports within the cluster, this listener is invoked and
istio_tcp_connections_closed_total
is the metric that will be increased:
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "istio_tcp_connections_closed_total",
"connection_security_policy": "unknown",
"destination_app": "unknown",
"destination_ip": "52.22.188.80",
"destination_principal": "unknown",
"destination_service": "unknown",
"destination_service_name": "PassthroughCluster",
"destination_service_namespace": "unknown",
"destination_version": "unknown",
"destination_workload": "unknown",
"destination_workload_namespace": "unknown",
"instance": "100.96.2.183:42422",
"job": "istio-mesh",
"reporter": "source",
"response_flags": "-",
"source_app": "sleep",
"source_principal": "unknown",
"source_version": "unknown",
"source_workload": "sleep",
"source_workload_namespace": "default"
},
"value": [
1567033761.879,
"1"
]
}
]
}
}
In this case, destination_service_name
is set to PassthroughCluster and
the destination_ip
is set to the IP address of the external service.
The destination_ip
label can be used to do a reverse DNS lookup and
get the host name of the external service. As this cluster is passthrough,
other TCP related metrics like istio_tcp_connections_opened_total
,
istio_tcp_received_bytes_total
and istio_tcp_sent_bytes_total
are also
updated.
BlackHoleCluster metrics
Similar to the PassthroughCluster, this section explains the metrics and the labels emitted based on the listener type invoked in Envoy.
- HTTP proxy listener: This happens when the port of the external service is same
as one of the service ports defined in the cluster.
In this scenario, when the BlackHoleCluster is hit,
istio_requests_total
will get increased like this:
{
"metric": {
"__name__": "istio_requests_total",
"connection_security_policy": "unknown",
"destination_app": "unknown",
"destination_principal": "unknown",
"destination_service": "httpbin.org",
"destination_service_name": "BlackHoleCluster",
"destination_service_namespace": "unknown",
"destination_version": "unknown",
"destination_workload": "unknown",
"destination_workload_namespace": "unknown",
"instance": "100.96.2.183:42422",
"job": "istio-mesh",
"permissive_response_code": "none",
"permissive_response_policyid": "none",
"reporter": "source",
"request_protocol": "http",
"response_code": "502",
"response_flags": "-",
"source_app": "sleep",
"source_principal": "unknown",
"source_version": "unknown",
"source_workload": "sleep",
"source_workload_namespace": "default"
},
"value": [
1567034251.717,
"1"
]
}
Note the destination_service_name
label is set to BlackHoleCluster and the
destination_service
to the host name of the external service. The response
code should always be 502
in this case.
- TCP proxy virtual listener - If the external service port doesn’t map to any
HTTP based service ports within the cluster, this listener is invoked and
istio_tcp_connections_closed_total
is the metric that will be increased:
{
"metric": {
"__name__": "istio_tcp_connections_closed_total",
"connection_security_policy": "unknown",
"destination_app": "unknown",
"destination_ip": "52.22.188.80",
"destination_principal": "unknown",
"destination_service": "unknown",
"destination_service_name": "BlackHoleCluster",
"destination_service_namespace": "unknown",
"destination_version": "unknown",
"destination_workload": "unknown",
"destination_workload_namespace": "unknown",
"instance": "100.96.2.183:42422",
"job": "istio-mesh",
"reporter": "source",
"response_flags": "-",
"source_app": "sleep",
"source_principal": "unknown",
"source_version": "unknown",
"source_workload": "sleep",
"source_workload_namespace": "default"
},
"value": [
1567034481.03,
"1"
]
}
Note the destination_ip
label represents the IP address of the external
service and the destination_service_name
is set to BlackHoleCluster
to indicate that this traffic was blocked by the mesh. Is is interesting to
note that for the BlackHole cluster case, other TCP related metrics like
istio_tcp_connections_opened_total
are not increased as there’s no
connection that is ever established.
Monitoring these metrics can help operators easily understand all the external services consumed by the applications in their cluster.