Apache Flink on Kubernetes

Introduction

Apache Flink is a powerful open-source stream processing framework that allows developers to build real-time data pipelines and perform complex analytics on streaming data. With the rise of Kubernetes as a popular container orchestration platform, many organizations are looking to deploy their Flink clusters on Kubernetes for easier management and scalability.

Why deploy Apache Flink on Kubernetes?

  • Job image distribution
  • Continuous deployment
  • Portability: from localhost to production

Kubernetes is an open-source system for managing clusters of containers running across multiple machines. It was developed at Google and first announced in June 2014.

When deploying Flink on Kubernetes, there are two main approaches:

  • Standalone mode
  • Kubernetes Native

Basic Kubernetes Concepts

A Kubernetes cluster consists of a set of worker machines, called nodes, which run the containerized applications. Every cluster has at least one worker node.

Each node runs an agent process called the kubelet, which maintains all containers on the node and manages how they are created, started, and stopped. A node also runs kube-proxy, which handles service discovery, reverse proxying, and load balancing. Finally, the node hosts a container runtime, such as the Docker Engine, which creates and manages containers on the local machine.

We also have a master node (the control plane), which manages the cluster. It runs the API server, the Controller Manager, and the Scheduler.

Inside the worker node(s), we have the pods. A pod is a group of one or more containers that run together on a node. It is the smallest unit in Kubernetes for creating, scheduling, and managing resources.
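To make this concrete, a minimal pod running a single Flink container could be described with a manifest like the following (the pod name, labels, and image tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: flink-demo-pod        # illustrative name
  labels:
    app: flink
spec:
  containers:
    - name: main
      image: flink:1.17       # any available Flink image tag would work
      ports:
        - containerPort: 8081 # Flink's web UI / REST port
```

In practice, pods are rarely created directly like this; they are usually managed by a higher-level controller such as a Deployment, as described below.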

Kubernetes' Architecture

[Image showing Kubernetes Architecture]

The core concepts of Kubernetes:

  • Pod replicas are managed by the Replication Controller, which guarantees that a specified number of pod replicas are active at all times in the cluster. If fewer replicas than desired are running, it starts new pods; if there are too many, it terminates the extra pods to preserve the desired count.
  • Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) are used for persistent data storage.
  • A Service provides a central service access portal and implements service proxy and discovery.
  • ConfigMap stores the configuration files of user programs and uses etcd as its backend storage.
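As an example of the storage concepts above, a PersistentVolumeClaim for Flink checkpoint data might look like this sketch (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flink-checkpoints   # illustrative name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi          # illustrative size
```

A pod that mounts this claim keeps its data across restarts, because the claim binds to a PersistentVolume that outlives any individual pod.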

The Architecture of Flink on Kubernetes

[Image showing Flink on Kubernetes Architecture]

As the figure above shows, running a Flink job on Kubernetes proceeds as follows:

  1. Submit a resource description to the Kubernetes cluster.
  2. The master container and the worker containers are started immediately.
  3. The master container starts the Flink master process, which consists of the ResourceManager, the JobManager, and the Program Runner.
  4. The worker containers start the TaskManagers, which in turn register with the ResourceManager.
  5. The JobManager allocates tasks to the TaskManagers for execution.

The Basic Cluster

The JobManager:

  • The JobManager is described by a Deployment with a single replica, labeled flink-jobmanager.
  • A JobManager Service is then defined and exposed using the service name and port number. Its pods are selected based on the JobManager label.
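A sketch of the JobManager Deployment and Service described above might look like the following (names, labels, ports, and the image tag are illustrative; port 6123 is Flink's default RPC port and 8081 its default REST/UI port):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1                     # a single JobManager replica
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager     # the label the Service selects on
    spec:
      containers:
        - name: jobmanager
          image: flink:1.17       # illustrative image tag
          args: ["jobmanager"]
          ports:
            - containerPort: 6123 # RPC
            - containerPort: 8081 # REST API / web UI
---
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  selector:
    app: flink
    component: jobmanager         # selects the JobManager pod by label
  ports:
    - name: rpc
      port: 6123
    - name: ui
      port: 8081
```

The Service name (flink-jobmanager) is what the TaskManagers use to reach the JobManager via cluster DNS.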

The TaskManager:

  • The TaskManager is also described by a Deployment, which ensures that n replica containers are running. A label should also be defined for the TaskManager.
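The TaskManager Deployment could be sketched as follows (replica count, labels, and image tag are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 2                    # "n" worker replicas; scale as needed
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager   # the TaskManager label
    spec:
      containers:
        - name: taskmanager
          image: flink:1.17      # illustrative image tag
          args: ["taskmanager"]
```

Scaling the cluster up or down is then a matter of changing the replica count on this Deployment.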

Service:

  • The Service is used to expose the JobManager REST API and UI ports, as well as the JobManager and TaskManager metrics.
  • A ServiceMonitor is then used so that Prometheus scrapes the metrics exposed by the Service.

Ingress:

  • The ingress is used to access the UI service port.
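An Ingress exposing the Flink UI through the JobManager Service could look like this sketch (the hostname is illustrative, and the Service name and port match the JobManager Service described above):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flink-ui
spec:
  rules:
    - host: flink.example.com        # illustrative hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flink-jobmanager
                port:
                  number: 8081       # Flink UI port
```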

ConfigMaps:

  • ConfigMaps are used to pass and read configuration files such as flink-conf.yaml, hdfs-site.xml, and logback-console.xml, if required.
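A minimal ConfigMap carrying the Flink configuration might be sketched as follows (the keys shown are real Flink configuration options, but the values are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
data:
  flink-conf.yaml: |
    jobmanager.rpc.address: flink-jobmanager   # the JobManager Service name
    taskmanager.numberOfTaskSlots: 2           # illustrative slot count
  logback-console.xml: |
    <!-- logging configuration would go here -->
```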

The Interaction of Flink on Kubernetes

First, submit the defined resource description files, such as ConfigMap, service description files, and Deployment, to the Kubernetes cluster. Then Kubernetes will automatically complete the subsequent steps.

The Kubernetes cluster will start the Pods and run the programs as per the defined description files.

The following components take part in the interaction process within the Kubernetes cluster:

  • The Deployments, which ensure that n replica containers run the JobManager and the TaskManagers, and apply the upgrade policy.
  • The Service, which uses a label selector to find the JobManager's pod for service exposure.
  • The ConfigMap, which is used to mount the /etc/flink directory, containing the flink-conf.yaml file, into each pod.
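The ConfigMap mount described above could be expressed in a Deployment's pod template roughly like this (a fragment only; the volume name and image tag are illustrative, and the mount path follows the /etc/flink convention used in this article):

```yaml
# fragment of a Deployment pod template (illustrative)
spec:
  containers:
    - name: jobmanager
      image: flink:1.17
      volumeMounts:
        - name: flink-config-volume
          mountPath: /etc/flink      # flink-conf.yaml appears here
  volumes:
    - name: flink-config-volume
      configMap:
        name: flink-config           # the ConfigMap defined earlier
```

With this in place, every pod created from the template sees the same configuration files, and updating the ConfigMap updates the configuration for newly started pods.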

Conclusion

There are benefits to deploying Apache Flink on Kubernetes, such as portability and job image distribution. But for ease of deployment, it is imperative that you understand the architecture of Flink on Kubernetes and how the entire process works. In this article, we have discussed the building blocks for understanding the deployment of Flink on Kubernetes.
