Monitoring Kafka on Kubernetes with Prometheus

Agraj Mangal
Dec 9, 2019


This is the second blog in our series “Running Kafka on Kubernetes”. For context and the initial setup, readers are encouraged to read the first entry, which covers setting up Apache Kafka on Azure Kubernetes Service with end-to-end encryption enabled.

“Everything fails, all the time !” — Werner Vogels, Amazon CTO

While the probability of failure has decreased, what Werner said in 2008 still largely holds true: no matter whether you run your application in an on-prem environment or on a public cloud, the virtual machine running your software will eventually fail. This is why having proper monitoring and alerting mechanisms in place is critically important, and why we see the industry adopting concepts like Infrastructure as Code (IaC) and Immutable Infrastructure, and increasingly investing in building an SRE and/or DevOps culture in their teams.

Let’s get started!

In this blog post, we explore how to use Prometheus & Grafana for monitoring and alerting, and how to configure the Kafka cluster to expose not only pod-level metrics (memory, CPU, available disk space, JVM GC time and memory used, network metrics, etc.) but also Kafka-specific metrics such as consumer lag, messages produced and consumed per second, the number of replicas per topic/partition, and the number of in-sync replicas per topic/partition.

Prometheus

Prometheus joined the CNCF in 2016 and became its second hosted project (after Kubernetes); that alone says a lot about the active community behind the project and explains why it can reasonably be considered a standard for monitoring and alerting. It offers a multi-dimensional data model, with time series data identified by metric name and key/value pairs, and a flexible query language (PromQL) to select and aggregate time series data in real time. Some important notes about Prometheus before we proceed with the setup:

  1. Prometheus does not require you to install any agents on the services/jobs you want to monitor; instead, it works on a pull-based model: you expose metrics from your service, and Prometheus scrapes them at a configurable interval.
  2. You can configure alerting rules/expressions in Prometheus which, when they evaluate to true, result in an alert being pushed to another component known as the Alert Manager. The Alert Manager can then be configured with various notification channels like PagerDuty, email, or Slack to send these alert notifications to the concerned DevOps/SRE teams.
  3. Prometheus does provide a basic web UI for visualizing these metrics, but the community has largely converged on Grafana for visualization.
  4. Kubernetes provides an add-on agent, kube-state-metrics, which listens to the Kubernetes API server and generates and exposes cluster-level metrics about the health of various objects like Deployments, Pods, and Nodes. kube-state-metrics exposes a Prometheus-compatible endpoint which can easily be scraped.
  5. Prometheus also provides an exporter for hardware and OS-level metrics exposed by *NIX kernels: Node Exporter. You can deploy it as a DaemonSet on your cluster (one Node Exporter per node) and expose the metrics through a Service, which Prometheus will scrape automatically if you use the annotation prometheus.io/scrape: 'true' (a sketch of such a Service follows this list). An example manifest can be downloaded from here and applied to your cluster using kubectl; another way to install it is with a Helm chart.
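As a rough sketch (the names, labels, and port below are illustrative assumptions; adapt them to your Node Exporter DaemonSet), such an annotated Service could look like this:

apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    app: node-exporter
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9100'
spec:
  selector:
    app: node-exporter
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100

Note that these annotations only take effect if the Prometheus scrape configuration includes the usual Kubernetes service-discovery relabelling rules, which the additional scrape configs for pods and services added later in this post typically provide.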
Basic Monitoring Setup

Set it up!

Now that we have a basic understanding of what we want to achieve, let’s start by installing Prometheus & Grafana on our Kubernetes cluster. As always, there are multiple options, but we will stick with our friendly-neighborhood operators to help us out. Let’s first create the namespace for our monitoring setup:

kubectl create namespace monitoring

Step 1: Install the Prometheus Operator

kube-prometheus provides a quick way to deploy the Prometheus Operator, Prometheus server, Grafana, a highly available Alert Manager, Node Exporter, kube-state-metrics, and other helpful tools. I personally feel like I lose control when deploying everything under the kube-prometheus manifests, so we will take a different route to deploy these components; your mileage may vary, so feel free to deploy everything using kube-prometheus. Note, however, that if you deploy kube-prometheus on AKS, you must label your nodes correctly, because the deployment configuration in some of these manifests includes the following nodeSelector:

nodeSelector:
  kubernetes.io/os: linux

This will not work by default with your AKS Cluster, so you must label your nodes accordingly using kubectl

kubectl label nodes <node-name> kubernetes.io/os=linux

What we’ll do instead for our example is take things slowly and deploy the components one at a time, starting with the Prometheus Operator. The Prometheus Operator requires certain Kubernetes RBAC resources, such as a ServiceAccount, ClusterRole, and ClusterRoleBinding, in order to authenticate against the API server and watch the Prometheus custom resources. All the code for this blog post can be found on Github, so please clone the repository and follow along.


So let’s start by creating the ServiceAccount, ClusterRole, and ClusterRoleBinding for our Operator in the monitoring namespace, and then deploy the Operator itself:

kubectl apply -f monitoring/prometheus-setup/prometheus-operator-service-account.yaml -n monitoring
kubectl apply -f monitoring/prometheus-setup/prometheus-operator-cluster-role.yaml -n monitoring
kubectl apply -f monitoring/prometheus-setup/prometheus-operator-cluster-role-binding.yaml -n monitoring
kubectl apply -f monitoring/prometheus-setup/prometheus-operator-deployment.yaml -n monitoring
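For reference, the ServiceAccount and ClusterRoleBinding are the simplest of these manifests; roughly, they look like the following sketch (the actual files in the repo may differ in names and labels):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
  - kind: ServiceAccount
    name: prometheus-operator
    namespace: monitoring

The ClusterRole (not shown) grants the Operator permission to watch and manage the monitoring.coreos.com custom resources along with the workloads, Secrets, and Services it creates for them.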

Step 2: Create a Service Monitor

Before we go further, deploy the Prometheus server, and configure recording & alerting rules, let’s take a step back and understand a bit more about how the Operator works: what custom resources it exposes, how they interact with each other, and so on.

Source: Sysdig

Service Monitors are one of the custom resources offered by the Prometheus Operator and basically provide a way to tell the Operator to target new Kubernetes Services for scraping by Prometheus. A ServiceMonitor should be deployed in the same namespace where Prometheus is deployed, although it can also target services deployed in other namespaces using the namespaceSelector property. Here is a truncated version of the ServiceMonitor YAML used to scrape Kafka-related metrics from the Kafka cluster we deployed earlier.
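A minimal sketch, with the labels, namespace, and metrics port name as assumptions that depend on your Strimzi version and cluster setup:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: strimzi-service-monitor
  namespace: monitoring
  labels:
    app: strimzi
spec:
  selector:
    matchLabels:
      strimzi.io/kind: Kafka
  namespaceSelector:
    matchNames:
      - tls-kafka
  endpoints:
    - port: metrics
      interval: 30s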

Complete Code: Here

A couple of things to note here:

  1. spec.selector.matchLabels is used to target the service by matching labels.
  2. spec.namespaceSelector.matchNames is used to select the namespace in which the target service is running.

Apply it to the cluster:

kubectl apply -f monitoring/strimzi-service-monitor.yaml -n monitoring

Step 3: Set up some Prometheus Rules

Another custom resource, PrometheusRule, can be used to configure alert rules in Prometheus. Here is how one looks:
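A trimmed-down sketch in the spirit of the Strimzi example rules (the metric name and threshold below are assumptions and should be checked against your setup):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-rules
  namespace: monitoring
  labels:
    role: alert-rules
    app: strimzi
spec:
  groups:
    - name: kafka
      rules:
        - alert: UnderReplicatedPartitions
          expr: kafka_server_replicamanager_underreplicatedpartitions > 0
          for: 10s
          labels:
            severity: warning
          annotations:
            summary: 'Kafka under-replicated partitions'
            description: 'There are {{ $value }} under-replicated partitions in the Kafka cluster'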

Complete Code: here

kubectl apply -f prometheus-rules.yaml -n monitoring

Step 4: Edit the Kafka Cluster

Now it is time to edit the existing Kafka cluster (which we created in the earlier blog post) and add the following items:

Step 4 a) Add Metrics for Kafka Brokers & Zookeepers
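At the time of writing, Strimzi accepted an inline JMX Exporter configuration under spec.kafka.metrics and spec.zookeeper.metrics of the Kafka resource; here is a hedged sketch (the rules shown are illustrative, and newer Strimzi versions use metricsConfig with a ConfigMap instead):

spec:
  kafka:
    # ...existing broker configuration...
    metrics:
      lowercaseOutputName: true
      rules:
        # Export broker throughput MBeans, e.g. MessagesInPerSec, as Prometheus counters
        - pattern: 'kafka.server<type=(.+), name=(.+)PerSec\w*><>Count'
          name: kafka_server_$1_$2_total
          type: COUNTER
  zookeeper:
    # ...existing zookeeper configuration...
    metrics:
      lowercaseOutputName: true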

Complete code: here

Above is a snippet of how metrics are added to the Kafka brokers and ZooKeeper. Prometheus offers four core metric types: Counter, Gauge, Histogram, and Summary.

Step 4 b) Kafka Exporter

While Prometheus has a JMX Exporter that can be configured to scrape and expose the mBeans of a JMX target, Kafka Exporter is an open-source project that enhances monitoring of Apache Kafka brokers and clients by extracting additional metrics from the brokers related to offsets, consumer groups, consumer lag, and topics. Some useful examples of alerts that can be created from these metrics are:

  • Under Replicated Partitions of a Topic
  • Consumer group lag is too large, i.e. very slow consumers falling behind the producers
  • An alert can be created if there is no message on a topic for a configurable amount of time

Add the following snippet after spec.zookeeper in tls-kafka.yaml:
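A minimal sketch of that kafkaExporter section (the regexes below simply include all topics and consumer groups):

spec:
  # ...kafka and zookeeper sections as before...
  kafkaExporter:
    topicRegex: ".*"
    groupRegex: ".*"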

Complete tls-kafka.yaml code here

Time to update our existing Kafka Cluster:

kubectl apply -f monitoring/tls-kafka.yaml


Step 5: Create the Prometheus Server

Now that we have the Operator running, we will create the custom resource for Prometheus specifying some config and the Operator will spin up a Deployment to manage the Prometheus server instance.
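A rough sketch of such a Prometheus resource follows; the service account name, selectors, and Alertmanager reference are assumptions that should line up with what you deployed in the earlier steps:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  # ServiceAccount with cluster read permissions for service discovery; the name is illustrative
  serviceAccountName: prometheus-server
  replicas: 1
  # Pick up the ServiceMonitor and PrometheusRule resources created in the previous steps
  serviceMonitorSelector:
    matchLabels:
      app: strimzi
  ruleSelector:
    matchLabels:
      role: alert-rules
      app: strimzi
  # Send alerts to the Alertmanager we will create in a later section
  alerting:
    alertmanagers:
      - namespace: monitoring
        name: alertmanager-operated
        port: web
  resources:
    requests:
      memory: 400Mi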

Complete Code — here

The snippet above only shows the important piece of the puzzle: the Prometheus custom resource. The complete code is present here.

Before we deploy this, it is recommended to add some additional scrape targets for cAdvisor, the kubelet, pods, and services. These can be added to the Prometheus resource using the spec.additionalScrapeConfigs property. To do so, we first need to create the following Kubernetes Secret:

kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring
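Assuming the secret name and file name used in the command above, the Prometheus resource then references that secret like this:

spec:
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml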

Now let’s deploy the Prometheus Server

kubectl apply -f monitoring/prometheus.yaml -n monitoring

And that’s it. The Operator does the rest: it creates the Prometheus server instance and a couple of Services, and we can now simply port-forward and open up the basic web UI offered by Prometheus.

kubectl port-forward prometheus-prometheus-0 9090:9090 -n monitoring

Now point your browser to http://localhost:9090 and you should see something similar to the below:

Local Prometheus Server

This UI can also be used to see the configured alerts, how many of them are active, and details about when they were fired.

Alerts UI
Details of Alert being Fired

We will see in subsequent sections how to configure the Alert Manager to send out these alerts/notifications and how a much better visualization tool can be used to see these metrics in action!

Alert Manager

Alert Manager is responsible for passing on the generated alerts to the configured notification receivers. Examples of receivers could be chat clients like Microsoft Teams, Slack or on-call notification systems like OpsGenie or PagerDuty. It provides a way to integrate with your own notification/alerting systems via Webhooks as well.

Alert Manager Overview

So AlertManager is responsible for grouping related alerts, silencing and inhibiting alerts when related alerts have already fired, de-duplicating alerts, and finally distributing them to the configured receivers. Following the Prometheus Operator guidelines, to create and configure AlertManager we need:

  1. AlertManager Custom Resource specifying a name and the number of replicas needed for the StatefulSet it deploys
  2. A Kubernetes Secret holding the configuration for AlertManager. The Secret must be named alertmanager-{name}, where name is the name of the custom resource created in the first step. This Secret is created from a YAML file specifying the Alert Manager configuration, which may include the routes, receivers, inhibit_rules, and templates for the alerts.
  3. A Kubernetes Service which exposes the AlertManager so that it can interact with Prometheus and receive alerts.

Since the AlertManager pod will not start up until the secret alertmanager-{name} exists, we start by creating the secret from our config file:
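A minimal alertmanager.yaml with a single Slack receiver might look roughly like the following (the webhook URL and channel are placeholders you need to replace):

global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  receiver: slack-notifications
receivers:
  - name: slack-notifications
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ
        channel: '#kafka-alerts'
        send_resolved: true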

Slack Receiver Example for Alert Manager

kubectl create secret generic alertmanager-alertmanager --from-file=monitoring/alertmanager-setup/alertmanager.yaml -n monitoring

It is important to note that the config file for AlertManager must be named alertmanager.yaml; nothing else will work. The secret alertmanager-{name} must be created from a config file with exactly that name.

Next, we create the Custom Resource for AlertManager
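A sketch of that custom resource (the metadata.name must match the {name} part of the alertmanager-{name} secret created above, i.e. alertmanager in our case):

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 1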

kubectl apply -f monitoring/alertmanager-setup/alert-manager.yaml

That’s it; there is no need to manually connect Prometheus with AlertManager, as the Operator takes care of that for you. The following is an example of a very raw alert being posted to the configured Slack channel. Of course, you have the option to specify a template and a custom image to format the message body.

Raw alert messages being posted to Slack!

Grafana

Grafana is an open-source visualization framework that can be used with disparate data sources, including Prometheus. Since Grafana is not really part of the Prometheus stack, the Prometheus Operator does not offer a CRD to install or manage Grafana and its connection to Prometheus; instead, you can simply install Grafana on AKS using a Deployment and a Service.
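A bare-bones sketch of such a Deployment and Service (the image tag is just an example, and resource limits and persistence are omitted, so treat this as a starting point rather than the exact manifest from the repo):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:6.5.1   # example tag; pin whatever version you need
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000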

Sample Grafana Deployment

Install this in your AKS cluster using

kubectl apply -f monitoring/grafana.yaml -n monitoring

This deploys the Grafana server and exposes it via a Service — you can access the GUI using port-forwarding

kubectl port-forward grafana-85d856cf58-s5m4g 3000:3000 -n monitoring

Go ahead and access the Grafana UI at http://localhost:3000 and log in with the default admin/admin credentials. Change the admin password on first login, then configure the Prometheus data source using the prometheus-operated Kubernetes Service that the Operator created alongside the Prometheus custom resource. You can see the list of services using kubectl.

Select Add Data Source → Prometheus → enter the details as below
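The important detail here is the data source URL. Assuming Grafana runs in the same monitoring namespace, pointing it at the prometheus-operated Service on port 9090 is usually enough:

kubectl get svc -n monitoring

In the Add Data Source form, set the URL to http://prometheus-operated:9090 and leave the access mode as Server.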

Import Dashboards

The awesome community at Strimzi provides several quickstart/example dashboards for Grafana which can be used as a starting point to explore the metrics:

  1. Strimzi-Kafka — Shows basic metrics for Kafka Brokers
  2. Strimzi-Zookeeper — Shows metrics related to Zookeeper pods
  3. Kafka-Exporter — Metrics exported using Kafka Exporter.

Import these dashboards in Grafana and start playing around.

Strimzi Kafka Dashboard
Kafka Exporter Dashboard

Performance Testing

You can also run the Kafka performance test tools kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh. For example, exec into one of the configured Kafka client pods and execute:

bin/kafka-producer-perf-test.sh --topic test-one-rep --num-records 50000000 --record-size 100 --throughput 1 --producer-props acks=1 bootstrap.servers=my-tls-cluster-kafka-bootstrap.tls-kafka:9093 buffer.memory=67108864 batch.size=8196 --producer.config /opt/kafka/config/ssl-config.properties

Here are some benchmark commands that you might find helpful.
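For the consumer side, a rough counterpart to the producer command above might look like this (the topic, bootstrap address, and TLS config path are carried over from the producer example as assumptions for your setup, and flag names can vary between Kafka versions):

bin/kafka-consumer-perf-test.sh --topic test-one-rep --messages 50000000 --threads 1 --broker-list my-tls-cluster-kafka-bootstrap.tls-kafka:9093 --consumer.config /opt/kafka/config/ssl-config.properties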

Conclusion

I hope this article helps you set up your monitoring environment using Prometheus and enables you to be more proactive in dealing with the infrastructure you work with.
