How To Install Apache Hadoop on Ubuntu 14.04

r00t June 13, 2015

This tutorial will show you how to install Apache Hadoop on Ubuntu 14.04. For those of you who didn’t know, Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing. It handles very large data sets by distributing them across clusters of computers. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.

This article assumes you have at least basic knowledge of Linux, know how to use the shell, and, most importantly, run your own VPS. The installation is quite simple. I will walk you through the step-by-step installation of Apache Hadoop on Ubuntu 14.04.

Step 1. Install Java (OpenJDK).

Since Hadoop is written in Java, make sure a Java JDK is installed on the system. If Java is not installed yet, install it before continuing.
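A minimal sketch of that check, assuming the `openjdk-7-jdk` package from the Ubuntu 14.04 repositories is an acceptable JDK. The `java_status` variable is purely illustrative.

```shell
# Check whether a JDK compiler is already on the PATH.
if command -v javac >/dev/null 2>&1; then
    java_status="installed"
    javac -version 2>&1 || true
else
    java_status="missing"
    echo "No JDK found. One option on Ubuntu 14.04 (an assumption):"
    echo "  sudo apt-get update && sudo apt-get install -y openjdk-7-jdk"
fi
```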

Step 2. Disabling IPv6.

Hadoop does not currently support IPv6 and is tested only on IPv4 networks. If your hosts use IPv6, you need to switch the Hadoop machines to IPv4:

Add these 3 lines at the end of the file:
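The file is not named above. The sketch below assumes it is `/etc/sysctl.conf`, the usual place for kernel parameters on Ubuntu. It stages the three settings in a scratch file (the name `ipv6-disable.conf` is hypothetical) so they can be reviewed before they are applied as root.

```shell
# Stage the three IPv6 settings in a scratch file for review.
cat > ipv6-disable.conf <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF

# To apply them (requires root; /etc/sysctl.conf is an assumption):
#   sudo sh -c 'cat ipv6-disable.conf >> /etc/sysctl.conf'
#   sudo sysctl -p
# Afterwards this should print 1:
#   cat /proc/sys/net/ipv6/conf/all/disable_ipv6
```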

Step 3. Install Apache Hadoop.

To avoid security issues, we recommend setting up a new Hadoop user group and user account to handle all Hadoop-related activities, using the following command:
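One way this could look on Ubuntu, as a sketch. The group and user names (`hadoop`, `hduser`) are hypothetical placeholders, and the commands are wrapped in a function so nothing runs until you call it.

```shell
# Hypothetical names; substitute your own.
HADOOP_GROUP="hadoop"
HADOOP_USER="hduser"

create_hadoop_user() {
    # Create the group first, then a user whose primary group it is.
    sudo addgroup "$HADOOP_GROUP"
    sudo adduser --ingroup "$HADOOP_GROUP" "$HADOOP_USER"
}

# Run once from an account with sudo rights, then switch to the new user:
#   create_hadoop_user
#   su - "$HADOOP_USER"
```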

After creating the user, you also need to set up key-based SSH access to its own account. To do this, execute the following commands:
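A sketch of key-based SSH to the same account, assuming OpenSSH is installed and the default `~/.ssh/id_rsa` key path is used. The function name `setup_local_ssh` is illustrative.

```shell
setup_local_ssh() {
    mkdir -p "$HOME/.ssh"
    chmod 700 "$HOME/.ssh"
    # Create an RSA key pair with an empty passphrase unless one already exists.
    if [ ! -f "$HOME/.ssh/id_rsa" ]; then
        ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa"
    fi
    # Authorize the public key for logins to this same account.
    cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
    chmod 600 "$HOME/.ssh/authorized_keys"
}

# Run as the Hadoop user, then confirm that no password prompt appears:
#   setup_local_ssh
#   ssh localhost exit
```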

Download the latest stable version of Apache Hadoop. At the time of writing this article, it is version 2.7.0:
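A sketch of the download, assuming the Apache archive server as the mirror and `$HOME/hadoop` as the install location. Both are assumptions, so adjust them to your setup.

```shell
HADOOP_VERSION="2.7.0"
HADOOP_TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
# Assumed mirror: the Apache archive, which keeps older releases.
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${HADOOP_TARBALL}"

fetch_hadoop() {
    wget "$HADOOP_URL"
    tar xzf "$HADOOP_TARBALL"
    # Assumed install location, reused as HADOOP_HOME later on.
    mv "hadoop-${HADOOP_VERSION}" "$HOME/hadoop"
}

# Run as the Hadoop user:
#   fetch_hadoop
```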

Step 4. Configure Apache Hadoop.

Set up the Hadoop environment variables. Edit the ~/.bashrc file and append the following values to the end of the file:
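A typical set of values for a Hadoop 2.x install, as a sketch. It assumes Hadoop was unpacked to `$HOME/hadoop`, so change `HADOOP_HOME` if yours lives elsewhere.

```shell
# Assumed install location; every other variable derives from it.
export HADOOP_HOME="$HOME/hadoop"
export HADOOP_INSTALL="$HADOOP_HOME"
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export YARN_HOME="$HADOOP_HOME"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native"
# Make the hadoop/hdfs commands and the start/stop scripts available.
export PATH="$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin"
```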

Apply the environment variables to the current session by running source ~/.bashrc.

Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable:
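The right value depends on which JDK you installed. As a sketch, the snippet below derives it from the `javac` binary on the PATH and prints the line to put in hadoop-env.sh. The fallback path is an assumption: it is where Ubuntu 14.04's `openjdk-7-jdk` package installs on amd64.

```shell
# Derive JAVA_HOME from the javac binary on the PATH, if there is one.
if command -v javac >/dev/null 2>&1; then
    javac_path="$(readlink -f "$(command -v javac)")"
    # Strip the trailing /bin/javac to get the JDK root.
    detected_java_home="$(dirname "$(dirname "$javac_path")")"
else
    # Assumed fallback for openjdk-7-jdk on 64-bit Ubuntu 14.04.
    detected_java_home="/usr/lib/jvm/java-7-openjdk-amd64"
fi

# This is the line to set in hadoop-env.sh.
echo "export JAVA_HOME=$detected_java_home"
```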

Hadoop has many configuration files, which need to be configured according to the requirements of your Hadoop infrastructure. Let’s start with the configuration for a basic single-node Hadoop cluster:

Edit core-site.xml:
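A minimal single-node sketch. It assumes the NameNode listens on `localhost:9000`, a common choice rather than a requirement. The script overwrites core-site.xml in `$HADOOP_HOME/etc/hadoop`, and falls back to a scratch `./etc/hadoop` directory when `HADOOP_HOME` is unset.

```shell
conf_dir="${HADOOP_HOME:-.}/etc/hadoop"
mkdir -p "$conf_dir"

# fs.defaultFS tells clients where the default file system (HDFS) lives.
cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```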

Edit hdfs-site.xml:
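A sketch for a single node: the replication factor is 1 because there is only one DataNode. The storage root `$HOME/hadoopdata` is a hypothetical location. The script overwrites hdfs-site.xml in `$HADOOP_HOME/etc/hadoop`, or in a scratch `./etc/hadoop` when `HADOOP_HOME` is unset.

```shell
conf_dir="${HADOOP_HOME:-.}/etc/hadoop"
# Hypothetical storage root for NameNode metadata and DataNode blocks.
data_root="${HADOOP_DATA_DIR:-$HOME/hadoopdata}"
mkdir -p "$conf_dir"

cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://$data_root/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$data_root/hdfs/datanode</value>
  </property>
</configuration>
EOF
```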

Edit mapred-site.xml:
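A minimal sketch that tells MapReduce to run on YARN. Hadoop 2.7.0 ships only a mapred-site.xml.template, so the file is created here. It is written to `$HADOOP_HOME/etc/hadoop`, or to a scratch `./etc/hadoop` when `HADOOP_HOME` is unset.

```shell
conf_dir="${HADOOP_HOME:-.}/etc/hadoop"
mkdir -p "$conf_dir"

# Run MapReduce jobs on YARN rather than the local job runner.
cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```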

Edit yarn-site.xml:
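A minimal sketch that enables the shuffle service MapReduce needs on each NodeManager. It overwrites yarn-site.xml in `$HADOOP_HOME/etc/hadoop`, or in a scratch `./etc/hadoop` when `HADOOP_HOME` is unset.

```shell
conf_dir="${HADOOP_HOME:-.}/etc/hadoop"
mkdir -p "$conf_dir"

# The auxiliary shuffle service lets reducers fetch map output.
cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
```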

Now format the NameNode using the following command. Do not forget to check the storage directory first:
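Formatting wipes existing HDFS metadata, so the storage directory check matters. The sketch below wraps `hdfs namenode -format` in a guard that refuses to run when the directory already contains a `current` subdirectory, which is where the NameNode keeps its metadata. The function name and the example path are hypothetical.

```shell
# Format the NameNode only if its storage directory holds no metadata yet.
# $1: the directory configured as dfs.namenode.name.dir (without file://).
format_namenode_once() {
    name_dir="$1"
    if [ -d "$name_dir/current" ]; then
        echo "Refusing to format: $name_dir/current already exists." >&2
        return 1
    fi
    hdfs namenode -format
}

# Usage, with a hypothetical storage path:
#   format_namenode_once "$HOME/hadoopdata/hdfs/namenode"
```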

Start all Hadoop services using the following command:
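In Hadoop 2.x the daemons are usually started with two scripts from `$HADOOP_HOME/sbin`. The sketch below assumes that directory is on the PATH. The wrapper name `start_hadoop` is illustrative.

```shell
start_hadoop() {
    # HDFS first (NameNode, DataNode, SecondaryNameNode), then YARN
    # (ResourceManager, NodeManager). YARN is skipped if HDFS fails to start.
    start-dfs.sh && start-yarn.sh
}

# Run as the Hadoop user:
#   start_hadoop
```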

Watch the output to confirm that the DataNode is started on each slave node in turn. To check that all services started correctly, use the ‘jps’ command:
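On a single-node Hadoop 2.x setup, `jps` normally lists five daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. As a sketch, the hypothetical helper below takes `jps` output and prints whichever of those are missing.

```shell
# Print the expected Hadoop daemons that do not appear in jps-style output.
# $1: the text printed by `jps`.
missing_daemons() {
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -w matches whole words, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$1" | grep -qw "$daemon" || echo "$daemon"
    done
}

# Usage (prints nothing when every daemon is running):
#   missing_daemons "$(jps)"
```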

Step 5. Accessing Apache Hadoop.

By default, the NameNode web interface is available on HTTP port 50070 and the ResourceManager web interface on port 8088. Open your favorite browser and navigate to http://yourdomain.com:50070 or http://server-ip:50070. If you are using a firewall, open ports 8088 and 50070 to allow access to these interfaces.

Browse the web interface for the ResourceManager, which by default is available at http://yourdomain.com:8088 or http://server-ip:8088.

Congratulations! You have successfully installed Apache Hadoop. Thanks for using this tutorial to install Apache Hadoop on an Ubuntu 14.04 system. For additional help or useful information, we recommend checking the official Apache Hadoop web site.
