Hadoop and HBase are designed to run on Linux, but they can also be installed on Windows. This post explains how to install Hadoop and HBase on Windows.
SOFTWARE INSTALLATION
1. Prerequisite
1. Hadoop-0.19.1
2. HBase-0.19.0
3. Cygwin
4. Java 1.6
5. Eclipse Europa 3.3.2
2. Downloads
Software Links
a. Hadoop-0.19.1: http://hadoop.apache.org/core/releases.htm
b. Hbase-0.19.0: http://hadoop.apache.org/hbase/releases.html
c. Cygwin: http://www.cygwin.com/
d. Java 1.6: http://java.sun.com/javase/downloads/index.jsp
e. Eclipse Europa 3.3.2: http://www.eclipse.org/downloads/packages/release/europa/winter, http://archive.eclipse.org/eclipse/downloads/
3. Creating a Local Account (optional; it is better to have a separate local account for Hadoop)
1. Right-click on My Computer, click Manage, and create a new user account named ‘HadoopAdmin’ with a password of your choice.
2. Go to Local Users and Groups -> Users -> HadoopAdmin -> Properties and grant the account administrator rights.
3. Log off and log back in with the HadoopAdmin account.
4. Install Cygwin
1. Download Cygwin installer.
2. Run the downloaded file.
3. Keep pressing the 'Next' button until you see the package selection screen.
4. Click the small View button to switch to the "Full" view.
5. Find the package "openssh" and click on the word "Skip" so that it changes to a version number, marking the package for installation.
6. After you have selected the package, press the 'Next' button to complete the installation.
7. Set Environment Variables
i. Find "My Computer" icon either on the desktop or in the start menu, right-click on it and select Properties item from the menu.
ii. When you see the Properties dialog box, click on the Environment Variables button
iii. When Environment Variables dialog shows up, click on the Path variable located in the System Variables box and then click the Edit button.
iv. When Edit dialog appears append the following text to the end of the Variable value field: “;C:\cygwin\bin;C:\cygwin\usr\bin”
v. Close all three dialog boxes by pressing OK button of each dialog box.
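To confirm the PATH change took effect (this is just a quick sanity check, assuming Cygwin was installed to the default C:\cygwin location), open a new Windows command prompt and run:
ssh -V
If an OpenSSH version string is printed, the Cygwin binaries are on your PATH.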
5. Setup SSH daemon
Both Hadoop scripts and Eclipse plug-in need password-less SSH to operate. This section describes how to set it up in the Cygwin environment.
5.1 Configure ssh daemon
1. Open the Cygwin command prompt.
2. Execute the following command: “ssh-host-config”
3. When asked if privilege separation should be used, answer no.
4. When asked if sshd should be installed as a service, answer yes.
5. When asked about the value of CYGWIN environment variable, enter ntsec.
5.2 Start SSH daemon
1. Find the My Computer icon either on your desktop or in the Start menu, right-click on it and select Manage from the context menu.
2. Open Services and Applications in the left-hand panel, then select the Services item.
3. Find the CYGWIN sshd item in the main section and right-click on it.
4. Select Start from the context menu.
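If you prefer the command line, the service can also be started from a Cygwin prompt (this assumes it was installed under the default service name sshd):
$> net start sshd
or
$> cygrunsrv --start sshd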
5.3 Setup authorization keys
The Eclipse plug-in and Hadoop scripts require SSH authentication to be performed through authorization keys rather than passwords. The following steps describe how the authorization keys are set up.
1. Open Cygwin command prompt
2. Execute the following command to generate keys: “ssh-keygen”
3. When prompted for filenames and pass phrases press ENTER to accept default values.
4. After the command has finished generating keys, enter the following command to change into your .ssh directory: “cd ~/.ssh”
5. Check whether the keys were generated by executing the following command: “ls -l”
6. You should see two files, id_rsa.pub and id_rsa, with recent creation dates. These are your public and private keys.
7. To register the new authorization keys, enter the following command (note the double greater-than signs, which append to the file rather than overwrite it): “cat id_rsa.pub >> authorized_keys”
8. Now check if the keys were set up correctly by executing the following command: “ssh localhost”
Since it is a new ssh installation, you will be warned that authenticity of the host could not be established and will be asked whether you really want to connect. Answer yes and press ENTER. You should see the Cygwin prompt again, which means that you have successfully connected.
9. Now execute the command again: “ssh localhost”. This time you should not be prompted for anything.
6. Install java
1. Download Java 1.6 installer.
2. Run the downloaded file.
3. Change the installation path to: “C:\cygwin\home\HadoopAdmin\java”
4. Click finish when installation is complete.
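To verify the installation from Cygwin (assuming the installation path given above), run:
$> /home/HadoopAdmin/java/bin/java -version
The reported version should be 1.6.x.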
7. Download and Extract Hadoop-0.19.1 and HBase-0.19.0
1. Download hadoop-0.19.1.tar.gz and hbase-0.19.0.tar.gz and place in some folder on your computer.
2. Right click on them and click Extract Files.
3. Give the destination path as: “C:\cygwin\home\HadoopAdmin”
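Alternatively, the archives can be extracted from a Cygwin prompt with tar (assuming the downloaded files are in the current directory):
$> tar xzf hadoop-0.19.1.tar.gz -C /home/HadoopAdmin
$> tar xzf hbase-0.19.0.tar.gz -C /home/HadoopAdmin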
CONFIGURE HADOOP
1. Supported modes
Hadoop runs in one of three modes:
* Standalone: All Hadoop functionality runs in one Java process. This works “out of the box” and is trivial to use on any platform, Windows included.
* Pseudo-Distributed: Hadoop functionality all runs on the local machine but the various components will run as separate processes. This is much more like “real” Hadoop and does require some configuration as well as SSH. It does not, however, permit distributed storage or processing across multiple machines.
* Fully Distributed: Hadoop functionality is distributed across a “cluster” of machines. Each machine participates in somewhat different (and occasionally overlapping) roles. This allows multiple machines to contribute processing power and storage to the cluster.
This post focuses on the Fully Distributed mode of Hadoop. If you want to try the other modes, the Hadoop Quick Start guide can get you started with Standalone and Pseudo-Distributed operation.
2. Configure your hosts file (All machines)
Open your Windows hosts file located at c:\windows\system32\drivers\etc\hosts (the file is named “hosts” with no extension) in a text editor and add the following lines (replacing the NNNs with the IP addresses of both master and slave):
NNN.NNN.NNN.NNN master
NNN.NNN.NNN.NNN slave
Save the file. This step isn’t strictly necessary, but it will make things easier if your computers’ IP addresses change.
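For example, assuming the master is at 192.168.1.10 and the slave at 192.168.1.11 (these addresses are only illustrative), the entries would be:
192.168.1.10 master
192.168.1.11 slave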
3. Generate public/private key pairs (All Machines)
Hadoop uses SSH to allow the master computer(s) in a cluster to start and stop processes on the slave computers. It supports several modes of secure authentication: you can use passwords, or you can use public/private keys to connect without passwords ("passwordless").
1. To generate a key pair, open Cygwin and issue the following commands ($> is the command prompt):
a) $> ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
b) $> cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2. Now, you should be able to SSH into your local machine using the following command:
$> ssh localhost
3. When prompted for your password, enter it. You’ll see something like the following in your Cygwin terminal.
HadoopAdmin@localhost's password:
Last login: Mon Apr 6 11:44:01 2009 from master
4. To quit the SSH session and go back to your regular terminal, use:
$> exit
Note: Make sure to do this on all computers in your cluster.
4. Exchange public keys
Now that you have public and private key pairs on each machine in your cluster, you need to share your public keys around to permit passwordless login from one machine to the other. Once a machine has another machine's public key in its authorized keys, it can authenticate connection requests made with the matching private key.
On the master, issue the following command in Cygwin (where <username> is the username you use to log in to Windows on the slave computer):
$> scp ~/.ssh/id_dsa.pub <username>@slave:~/.ssh/master-key.pub
Example:
$> scp ~/.ssh/id_dsa.pub HadoopAdmin@slave:~/.ssh/master-key.pub
Enter your password when prompted. This will copy your public key file in use on the master to the slave.
On the slave, issue the following command in cygwin:
$> cat ~/.ssh/master-key.pub >> ~/.ssh/authorized_keys
This will append your public key to the set of authorized keys the slave accepts for authentication purposes.
Back on the master, test this out by issuing the following command in cygwin:
$> ssh HadoopAdmin@slave
If all is well, you should be logged into the slave computer with no password required.
Note: Repeat this process in reverse, copying the slave’s public key to the master. Also, make sure to exchange public keys between the master and any other slaves that may be in your cluster.
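For example, the reverse exchange (slave to master) might look like this, assuming the slave also uses the HadoopAdmin account:
On the slave: $> scp ~/.ssh/id_dsa.pub HadoopAdmin@master:~/.ssh/slave-key.pub
On the master: $> cat ~/.ssh/slave-key.pub >> ~/.ssh/authorized_keys
Then test from the slave with: $> ssh HadoopAdmin@master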
Remark: It is better to run the Hadoop NameNode/JobTracker and the HBase Master on the same node. That way fewer keys need to be exchanged; with two separate master machines, the whole exchange process would have to be repeated twice.
5. Configure hadoop-env.sh (All Machines)
The conf/hadoop-env.sh file is a shell script that sets up various environment variables that Hadoop needs to run.
1. Open hadoop-0.19.1 folder
2. Open conf/hadoop-env.sh in a text editor. Look for the line that starts with “#export JAVA_HOME”.
3. Change that line to something like the following:
“export JAVA_HOME=/home/HadoopAdmin/Java/jdk1.6.0_11”
Note: This should be the home directory of your Java installation. Note that you need to remove the leading “#” (comment) symbol.
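A quick way to check that JAVA_HOME is picked up (assuming Hadoop was extracted into your home directory as above) is to run the following from Cygwin; it should print the Hadoop version rather than complain that JAVA_HOME is not set:
$> cd ~/hadoop-0.19.1
$> bin/hadoop version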
6. Configure hadoop-site.xml (All Machines)
The conf/hadoop-site.xml file is basically a properties file that lets you configure all sorts of HDFS and MapReduce parameters on a per-machine basis.
1. Open hadoop-0.19.1 folder
2. Open conf/hadoop-site.xml in a text editor.
3. Insert the following lines between the <configuration> and </configuration> tags:
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/dfs/data</value>
  <final>true</final>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/dfs/name</value>
  <final>true</final>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop</value>
  <final>true</final>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
  <final>true</final>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
7. Important Directories
1. fs.default.name - This is the URI (protocol specifier, hostname, and port) that describes the NameNode for the cluster. Each node in the system on which Hadoop is expected to operate needs to know the address of the NameNode. The DataNode instances will register with this NameNode, and make their data available through it. Individual client programs will connect to this address to retrieve the locations of actual file blocks.
2. dfs.data.dir - This is the path on the local file system in which the DataNode instance should store its data. It is not necessary that all DataNode instances store their data under the same local path prefix, as they will all be on separate machines; it is acceptable that these machines are heterogeneous. However, it will simplify configuration if this directory is standardized throughout the system. By default, Hadoop will place this under /tmp. This is fine for testing purposes, but is an easy way to lose actual data in a production system, and thus must be overridden.
3. dfs.name.dir - This is the path on the local file system of the NameNode instance where the NameNode metadata is stored. It is only used by the NameNode instance to find its information, and does not exist on the DataNodes. The caveat above about /tmp applies to this as well; this setting must be overridden in a production system.
4. dfs.replication - This is the default replication factor for each block of data in the file system. For a production cluster, this should usually be left at its default value of 3.
5. mapred.system.dir - Hadoop's default installation is designed for standalone operation, which does not use HDFS, so it conflates HDFS and local file system paths. When HDFS is enabled, however, MapReduce stores shared information about jobs in mapred.system.dir on the DFS.
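Although Hadoop creates these directories as needed, you may want to create the dfs.name.dir and dfs.data.dir paths ahead of time so you can check their permissions. The paths below simply match the sample hadoop-site.xml above:
$> mkdir -p /home/hadoop/dfs/name /home/hadoop/dfs/data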
8. Configure slaves file (Master only)
The conf/slaves file tells the master where it can find slaves to do work.
1. Open hadoop-0.19.1 folder
2. Open conf/slaves in a text editor. It will probably have one line which says “localhost”.
3. Replace that with the following:
master
slave
9. Format the namenode
The next step is to format the NameNode to create a Hadoop Distributed File System (HDFS).
1. Open a new Cygwin window.
2. Execute the following commands:
a. cd hadoop-0.19.1
b. mkdir logs
c. bin/hadoop namenode -format
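If the format succeeds, the dfs.name.dir configured earlier should now contain the NameNode metadata. A quick check (assuming the sample path from hadoop-site.xml) is:
$> ls /home/hadoop/dfs/name/current
which should list files such as fsimage and VERSION.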
10. Starting your cluster (Master Only)
1. Open a new Cygwin window.
2. To fully start your cluster, execute the following commands:
1. cd hadoop-0.19.1
2. $ bin/start-all.sh
Browse the web interface for the NameNode and the JobTracker; they are available at:
* NameNode - http://master:50070/
* JobTracker - http://master:50030/
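You can also check the cluster from the command line; the following should report the DataNodes that have registered with the NameNode:
$> bin/hadoop dfsadmin -report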
11. Stop your cluster
When you're done, stop the daemons with:
$ bin/stop-all.sh
CONFIGURE HBASE
1. Configure hbase-env.sh (All Machines)
The conf/hbase-env.sh file is a shell script that sets up various environment variables that HBase needs to run.
1. Open hbase-0.19.0 folder
2. Open conf/hbase-env.sh in a text editor. Look for the line that starts with “#export JAVA_HOME”.
3. Change that line to something like the following:
“export JAVA_HOME=/home/HadoopAdmin/Java/jdk1.6.0_11”
Note: This should be the home directory of your Java installation. Note that you need to remove the leading “#” (comment) symbol.
2. Configure hbase-site.xml (All Machines)
The conf/hbase-site.xml file is basically a properties file that lets you configure HBase parameters on a per-machine basis.
1. Open hbase-0.19.0 folder
2. Open conf/hbase-site.xml in a text editor.
3. Insert the following lines between the <configuration> and </configuration> tags:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
  <description>The directory shared by region servers.</description>
</property>
<property>
  <name>hbase.master</name>
  <value>master:60000</value>
  <description>The host and port that the HBase master runs at.</description>
</property>
<property>
  <name>hbase.regionserver</name>
  <value>master:60020</value>
  <description>The host and port a HBase region server runs at.</description>
</property>
3. Configure regionservers file
The conf/regionservers file tells the HBase master which machines should run region servers.
1. Open hbase-0.19.0 folder
2. Open conf/regionservers in a text editor. It will probably have one line which says “localhost”.
3. Replace that with the following:
master
slave
4. Starting HBase
1. Open a new Cygwin window.
2. To fully start your cluster, execute the following commands:
a. cd hbase-0.19.0
b. $ bin/start-hbase.sh
Browse the web interface for the Master and the Regionserver; they are available at:
* HBase Master - http://master:60010/
* HBase Regionserver - http://master:60030/
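Another quick check is to run jps (the JDK's process lister, assuming the JDK's bin directory is on your PATH) on the master; it should show HMaster and HRegionServer alongside the Hadoop daemons:
$> jps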
5. HBase Shell
Once HBase has started, enter $ bin/hbase shell to obtain a shell against HBase from which you can execute commands. Test your installation by creating, viewing, and dropping a table.
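A minimal shell session might look like the following; the table name 'test' and column family 'data' are just examples:
create 'test', 'data'
put 'test', 'row1', 'data:1', 'value1'
scan 'test'
disable 'test'
drop 'test'
exit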
6. Stop HBase
To stop HBase, exit the HBase shell and enter:
$ bin/stop-hbase.sh
Getting Started With Eclipse
1. Downloading and Installing
1. Download eclipse-SDK-3.3.2-win32.zip and place it in some folder on your computer.
2. Right click on it and click Extract Files.
3. Give the destination path as: “C:\”
2. Installing the Hadoop MapReduce Plug-in
1. Open hadoop-0.19.1 folder
2. In the hadoop-0.19.1/contrib/eclipse-plugin directory you will find the Eclipse plug-in JAR, hadoop-0.19.1-eclipse-plugin.jar.
3. Copy this into the “C:\eclipse\plugins” subdirectory of Eclipse.
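From Cygwin this can be done in one command (assuming Eclipse was extracted to C:\ as described above and that the plug-in JAR carries the 0.19.1 version in its name):
$> cp ~/hadoop-0.19.1/contrib/eclipse-plugin/hadoop-0.19.1-eclipse-plugin.jar /cygdrive/c/eclipse/plugins/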
3. Configuring the MapReduce Plug-in
1. Start Eclipse and choose a workspace directory. If you are presented with a "welcome" screen, click the button that says "Go to the Workbench." The Workbench is the main view of Eclipse, where you can write source code, launch programs, and manage your projects.
2. Switch to the MapReduce perspective. In the upper-right corner of the workbench, click the "Open Perspective" button
3. Select "Other," followed by "Map/Reduce" in the window that opens up. At first, nothing may appear to change. In the menu, choose Window * Show View * Other. Under "MapReduce Tools," select "Map/Reduce Locations." This should make a new panel visible at the bottom of the screen, next to Problems and Tasks.
4. Add the Server. In the Map/Reduce Locations panel, click on the elephant logo in the upper-right corner to add a new server to Eclipse.
5. You will now be asked to fill in a number of parameters identifying the server. To connect to Hadoop Server, the values are:
a. Location name: (Any descriptive name you want; e.g., "HDFS")
b. Map/Reduce Master Host: master
c. Map/Reduce Master Port: 9001
d. DFS Master Port: 9000
e. User name: HadoopAdmin
6. Next, click on the "Advanced" tab. There are two settings here which must be changed.
7. When you are done, click "Finish." Your server will now appear in the Map/Reduce Locations panel. If you look in the Project Explorer (upper-left corner of Eclipse), you will see that the MapReduce plug-in has added the ability to browse HDFS. Click the [+] buttons to expand the directory tree to see any files already there. If you inserted files into HDFS yourself, they will be visible in this tree.