Monitoring Hadoop with Prometheus and Grafana

Monitoring the performance and health of a Hadoop cluster is crucial for ensuring its efficient operation. Prometheus and Grafana are powerful open-source tools that, when combined, offer a comprehensive monitoring and visualization solution for Hadoop metrics.
In this blog post, we will explore how to set up Prometheus and Grafana to monitor Hadoop metrics, so you can gain valuable insight into your cluster’s operation.

Prerequisites:
  • A running Hadoop cluster (HDFS and YARN)
  • Prometheus installed and configured on the machine

Create a JMX directory on your system.

Inside that directory, download the jmx_prometheus_javaagent-0.13.0.jar file.

We will also keep all the YAML files in this directory.
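
As a rough sketch of this setup step, the snippet below creates the directory and defines a helper that fetches the agent jar. The directory location (JMX_DIR), the helper name fetch_agent, and the Maven Central URL are assumptions for illustration; adjust them to match your environment.

```shell
# Assumed location for the agent jar and the YAML files; override JMX_DIR as needed.
JMX_DIR="${JMX_DIR:-$PWD/jmx}"
AGENT_JAR="jmx_prometheus_javaagent-0.13.0.jar"
# Assumed download source: the Maven Central path for the 0.13.0 agent.
AGENT_URL="https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.13.0/${AGENT_JAR}"

# Create the JMX directory if it does not exist yet.
mkdir -p "$JMX_DIR"

# Hypothetical helper: download the agent only when it is not already present.
fetch_agent() {
  if [ -f "$JMX_DIR/$AGENT_JAR" ]; then
    echo "agent already present: $JMX_DIR/$AGENT_JAR"
  else
    wget -O "$JMX_DIR/$AGENT_JAR" "$AGENT_URL"
  fi
}
```

Running fetch_agent once on each Hadoop node should leave the jar next to the YAML files created in the following steps.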

 

Hadoop Metrics:

 

1. Namenode Metrics (the same pattern applies to the Datanode, NodeManager, and ResourceManager)
  • Make the changes below on each node where the component runs.
  • Create a namenode.yml file.
  • Similarly, create yml files for the other components, using the template below.
---
startDelaySeconds: 0
hostPort: <hostname>:<jmx-port>
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false

  • hostname: the hostname of the machine
  • jmx-port: any free port on the machine, used for the JMX connection
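
Since the four YAML files differ only in the JMX port, one way to produce them is a small loop. This is a minimal sketch: the JMX port numbers (8004 to 8007), the JMX_HOST default, and the JMX_DIR location are hypothetical placeholders, and on a real cluster you would keep only the files for the components that run on each node.

```shell
JMX_DIR="${JMX_DIR:-$PWD/jmx}"
JMX_HOST="${JMX_HOST:-localhost}"   # assumption: replace with the machine hostname
mkdir -p "$JMX_DIR"

# component:jmx-port pairs; the port numbers are hypothetical placeholders.
for pair in namenode:8004 datanode:8005 nodemanager:8006 resourcemanager:8007; do
  component="${pair%%:*}"   # text before the colon
  jmx_port="${pair##*:}"    # text after the colon
  # Write the same template as above, with the host and JMX port filled in.
  cat > "$JMX_DIR/$component.yml" <<EOF
---
startDelaySeconds: 0
hostPort: ${JMX_HOST}:${jmx_port}
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
EOF
done
```

Whatever JMX port you pick for a component must match the com.sun.management.jmxremote.port value set for that component later on.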

We will pass this YAML file to the hdfs command so that the agent can expose the name node metrics for scraping.

nano hadoop-2.10.2/bin/hdfs

Add the line below at the top of the hdfs script:

export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -javaagent:/path/to/jmx-folder/jmx/jmx_prometheus_javaagent-0.13.0.jar=<port>:/path/to/jmx-folder/jmx/namenode.yml"

 

  • port: the port on which the agent exposes metrics for Prometheus to scrape
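
To make the shape of the option easier to see, here is a small sketch that assembles it from its parts: the agent jar, the exporter port, and the YAML file. The helper name agent_flag and the port 6010 are hypothetical; the paths reuse the placeholders from this post.

```shell
JMX_DIR="${JMX_DIR:-/path/to/jmx-folder/jmx}"
AGENT_JAR="$JMX_DIR/jmx_prometheus_javaagent-0.13.0.jar"

# Hypothetical helper: print the -javaagent flag for one component.
# $1 = HTTP port that Prometheus will scrape, $2 = component name (matches the yml file name)
agent_flag() {
  printf '%s' "-javaagent:${AGENT_JAR}=$1:${JMX_DIR}/$2.yml"
}

# 6010 is a hypothetical exporter port for the name node.
export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS:-} $(agent_flag 6010 namenode)"
echo "$HADOOP_NAMENODE_OPTS"
```

The format is always -javaagent:<jar>=<port>:<yml>, so the same helper covers the other components by changing the port and the component name.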

Add the line below at the top of the hadoop-env.sh configuration file:

export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=<jmx-port> $HADOOP_NAMENODE_OPTS"
  • jmx-port: any free port on the machine. Make sure the same value is used in the namenode.yml file.

 

Now add the scrape target to the Prometheus configuration. Use the port you passed to the Java agent, not the JMX port.

  • Restart the name node so that Prometheus can start collecting its metrics.
  • If you started Hadoop with start-dfs.sh, stop it and start it again (stop-dfs.sh, then start-dfs.sh) so the new options take effect.
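
For the scrape target, a minimal sketch of a Prometheus job might look like the fragment generated below. The job name hadoop-namenode, the file name hadoop-namenode-scrape.yml, and the localhost:6010 target are assumptions; use the host of your name node and the exporter port you chose.

```shell
# Assumptions: the agent listens on localhost:6010; the fragment is written to the current directory.
EXPORTER_HOST="${EXPORTER_HOST:-localhost}"
EXPORTER_PORT="${EXPORTER_PORT:-6010}"

# Write a scrape job fragment that points at the exporter port, not the JMX port.
cat > hadoop-namenode-scrape.yml <<EOF
scrape_configs:
  - job_name: "hadoop-namenode"
    static_configs:
      - targets: ["${EXPORTER_HOST}:${EXPORTER_PORT}"]
EOF

cat hadoop-namenode-scrape.yml
```

The job entry would then be merged under the existing scrape_configs section of your prometheus.yml, followed by a Prometheus restart or reload. Once the name node is back up, the target should appear on the Prometheus targets page.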

 

2. Datanode Metrics
  • Make the changes below on each node where a data node runs.
  • Create a datanode.yml file as described in Namenode Metrics.
  • We will pass this YAML file to the hdfs command so that the agent can expose the data node metrics for scraping.

nano hadoop-2.10.2/bin/hdfs

Add the line below at the top of the hdfs script:

export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -javaagent:/path/to/jmx-folder/jmx/jmx_prometheus_javaagent-0.13.0.jar=<port>:/path/to/jmx-folder/jmx/datanode.yml"

Add the line below at the top of the hadoop-env.sh configuration file:

export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=<jmx-port> $HADOOP_DATANODE_OPTS"

Now add the scrape target to the Prometheus configuration. Use the port you passed to the Java agent, not the JMX port.

  • Restart the Hadoop data node so that Prometheus can start collecting its metrics.
  • If you started Hadoop with start-dfs.sh, stop it and start it again (stop-dfs.sh, then start-dfs.sh) so the new options take effect.

 

3. NodeManager Metrics
  • Make the changes below on each node where a node manager runs.
  • Create a nodemanager.yml file as described in Namenode Metrics.
  • We will pass this YAML file to the yarn command so that the agent can expose the node manager metrics for scraping.

nano hadoop-2.10.2/bin/yarn

Add the line below for the node manager to the yarn script of the Hadoop binary on each data node and task node:

export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -javaagent:/path/to/jmx-folder/jmx/jmx_prometheus_javaagent-0.13.0.jar=<port>:/path/to/jmx-folder/jmx/nodemanager.yml"

Add the line below at the top of the yarn-env.sh configuration file:

export YARN_NODEMANAGER_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=<jmx-port> $YARN_NODEMANAGER_OPTS"

Now add the scrape target to the Prometheus configuration. Use the port you passed to the Java agent, not the JMX port.

Restart the Hadoop node manager so that Prometheus can start collecting its metrics.

 

4. ResourceManager Metrics
  • Make the changes below on the node where the ResourceManager runs.
  • Create a resourcemanager.yml file as described in Namenode Metrics.
  • We will pass this YAML file to the yarn command so that the agent can expose the resource manager metrics for scraping.

nano hadoop-2.10.2/bin/yarn

Add the line below to the yarn script (6013 is the exporter port used in this example):

export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -javaagent:/path/to/jmx-folder/jmx/jmx_prometheus_javaagent-0.13.0.jar=6013:/path/to/jmx-folder/jmx/resourcemanager.yml"

Add the line below at the top of the yarn-env.sh configuration file:

export YARN_RESOURCEMANAGER_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=<jmx-port> $YARN_RESOURCEMANAGER_OPTS"

Now add the scrape target to the Prometheus configuration. Use the port you passed to the Java agent, not the JMX port.

Restart the YARN resource manager so that Prometheus can start collecting its metrics.
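
If you prefer to restart a single daemon rather than the whole cluster, the per-daemon scripts in the sbin directory of a Hadoop 2.x install can be used. The sketch below only prints the commands so you can review them first; the helper name restart_commands is hypothetical, and HADOOP_HOME is assumed to point at the hadoop-2.10.2 directory used in this post.

```shell
HADOOP_HOME="${HADOOP_HOME:-hadoop-2.10.2}"   # assumption: install directory used in this post

# Hypothetical helper: print the stop/start commands for one daemon.
# HDFS daemons use hadoop-daemon.sh, YARN daemons use yarn-daemon.sh (Hadoop 2.x layout).
restart_commands() {
  case "$1" in
    namenode|datanode)           script="hadoop-daemon.sh" ;;
    nodemanager|resourcemanager) script="yarn-daemon.sh" ;;
    *) echo "unknown daemon: $1" >&2; return 1 ;;
  esac
  echo "$HADOOP_HOME/sbin/$script stop $1"
  echo "$HADOOP_HOME/sbin/$script start $1"
}

restart_commands resourcemanager
```

After checking the output, the commands can be run by hand on the node that hosts the daemon.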

 

Grafana UI
  • Use Grafana’s visualization features to create panels that display specific Hadoop metrics. Metrics to consider include cluster memory usage, CPU utilization, disk I/O, network traffic, job success rates, and more.
  • Use PromQL queries in Grafana’s query editor to extract and aggregate data from Prometheus.
