
Monday, May 2, 2016

Hadoop single node installation on Linux





Purpose : Install Hadoop on a single machine, then use the put and get commands to copy files into and out of the Hadoop file system (HDFS).

Steps :


1. Download and install Hadoop
2. Configuration
3. Use basic commands like ls, put, and get

Detailed Steps :

A. Hadoop install

Prerequisites

1. Install Java
2. Create an hduser account in the OS
3. Enable SSH (a sketch for steps 2 and 3 follows this list)
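
A minimal sketch of the hduser and SSH prerequisites (run the first part as root; user and group names follow this guide, adjust to taste). Passwordless SSH to localhost is optional, the start-all.sh output later in this post simply prompts for the password instead:

# create a dedicated user and group for Hadoop
groupadd hduser
useradd -m -g hduser hduser
passwd hduser

# as hduser: allow SSH to localhost without a password (start-all.sh connects over SSH)
su - hduser
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost     # should now log in without prompting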

Steps :

1. Download hadoop-2.6.4.tar.gz : http://hadoop.apache.org/releases.html

2. Copy it to /opt/app

3. tar -xzf hadoop-2.6.4.tar.gz

4. mv hadoop-2.6.4 hadoop

5. chown -R hduser:hduser hadoop (steps 1-5 are combined in the sketch below)
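
Steps 1-5 as a single shell session (the download URL is an assumption based on the Apache archive layout; any mirror from the releases page works):

cd /opt/app
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz   # assumed mirror URL
tar -xzf hadoop-2.6.4.tar.gz
mv hadoop-2.6.4 hadoop
chown -R hduser:hduser hadoop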

B. Start and Verify

1. Edit hadoop-env.sh

Add export JAVA_HOME=

or set JAVA_HOME in the bash profile instead (an example entry follows).
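
For example, the hadoop-env.sh entry could look like the following (the JDK path is the one visible in the ps output later in this post; substitute your own installation path):

# /opt/app/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/sterin/Public/jdk1.8.0_45   # example path - point this at your own JDK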

2. Go to /opt/app/hadoop/sbin

3. ./start-all.sh (provide the password when prompted)


[hadoop@sterinlap sbin]$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
hadoop@localhost's password:
localhost: starting namenode, logging to /opt/app/hadoop/logs/hadoop-hadoop-namenode-sterinlap.out
hadoop@localhost's password:
localhost: starting datanode, logging to /opt/app/hadoop/logs/hadoop-hadoop-datanode-sterinlap.out
Starting secondary namenodes [0.0.0.0]
hadoop@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /opt/app/hadoop/logs/hadoop-hadoop-secondarynamenode-sterinlap.out
starting yarn daemons
starting resourcemanager, logging to /opt/app/hadoop/logs/yarn-hadoop-resourcemanager-sterinlap.out
hadoop@localhost's password:
localhost: starting nodemanager, logging to /opt/app/hadoop/logs/yarn-hadoop-nodemanager-sterinlap.out

(The "Incorrect configuration" warning above appears because core-site.xml has not been set up yet; this is addressed in the Configuration section below.)

4. Verify the processes with

ps -ef | grep hadoop

[hduser@sterinlap sbin]$ ps -ef | grep hadoop
hduser 9745 1 20 11:22 pts/5 00:00:06 /home/sterin/Public/jdk1.8.0_45/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.di

hduser 10077 1 23 11:22 ? 00:00:06 /home/sterin/Public/jdk1.8.0_45/bin/java -Dproc_nodemanager

Total : 2 processes
5. Verify the hadoop command with “hadoop fs -ls”

Location of commands : /opt/app/hadoop/bin

/opt/app/hadoop/bin/hadoop fs -ls


Found 11 items
-rwxr-xr-x 1 oracle oracle 159223 2016-02-12 15:27 container-executor

(Even though this only lists the local current path, because HDFS is not configured yet, it is enough to verify that the hadoop command runs.)


6. Stop all Hadoop processes

/opt/app/hadoop/sbin/stop-all.sh

C. Configuration

1. Create a tmp directory for Hadoop under /opt/app/hadoop : /opt/app/hadoop/tmp (a quick sketch follows)
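
A quick sketch of this step (ownership is assumed to go to hduser, as elsewhere in this guide):

mkdir /opt/app/hadoop/tmp
chown hduser:hduser /opt/app/hadoop/tmp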

2. Edit core-site.xml
/opt/app/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>


3. Create mapred-site.xml

Location : /opt/app/hadoop/etc/hadoop/

cp mapred-site.xml.template mapred-site.xml




4. Edit mapred-site.xml


<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
</configuration>




5. Create the namenode and datanode folders

Location : /opt/app/hadoop/

mkdir -p hadoop_store/hdfs/namenode
mkdir -p hadoop_store/hdfs/datanode

(-p also creates the intermediate hadoop_store/hdfs directories.)
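
If these folders are created as root, ownership likely needs to be handed back to hduser (assumed here, matching the earlier chown):

chown -R hduser:hduser /opt/app/hadoop/hadoop_store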


6. Edit hdfs-site.xml

Location : /opt/app/hadoop/etc/hadoop/

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/app/hadoop/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/app/hadoop/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>

7. Add the Hadoop bin directory to the PATH in .bashrc (the bash profile under the home directory)

Edit .bashrc and add:

PATH=$PATH:/opt/app/hadoop/bin
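
A fuller .bashrc sketch that also covers the sbin scripts and JAVA_HOME (the variable names are conventional; the JDK path is only an example taken from the ps output above):

# ~/.bashrc additions for Hadoop
export JAVA_HOME=/home/sterin/Public/jdk1.8.0_45          # example JDK path, use your own
export HADOOP_HOME=/opt/app/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Run "source ~/.bashrc" (or open a new shell) so the change takes effect.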


8. Format the Hadoop file system (note that formatting erases any existing HDFS data, so only do this on a fresh installation)

hadoop namenode -format


16/04/25 12:11:03 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = -10-184-37-177.in..com/10.xx.37.177
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.4





16/04/25 12:11:05 INFO util.ExitUtil: Exiting with status 0
16/04/25 12:11:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at l-10-184-37-177..com/10.xx.37.177





9. Start Hadoop

Location : /opt/app/hadoop/sbin

./start-all.sh (as in step B-3)

10. Run jps to confirm that all five Hadoop daemons are up


12882 NameNode
13189 DataNode
14152 NodeManager
13816 ResourceManager
13529 SecondaryNameNode
14394 Jps



11. Check the file system (list the files)

hadoop fs -ls /

The listing will be empty because nothing has been stored in HDFS yet.
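
To confirm that HDFS itself is up (and not just empty), the DataNode report can be checked; a small sketch:

hdfs dfsadmin -report    # should show one live datanode with non-zero configured capacity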

12. Create a new folder


hadoop fs -mkdir /new




13. Verify it :

hadoop fs -ls /
Found 1 items
drwxr-xr-x - sterin supergroup 0 2016-04-25 12:18 /new


14. Put command (copy a file from the local file system into HDFS)

Syntax: hadoop fs -put <source> <target>

/tmp/sterin : the local source file

/new : the target directory in HDFS

hadoop fs -put /tmp/sterin /new
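
To confirm the upload (file names follow the example above):

hadoop fs -ls /new            # the uploaded file should be listed here
hadoop fs -cat /new/sterin    # print its contents, useful if it is a text file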








15. Get command (copy a file from HDFS back to the local file system)

hadoop fs -get /new/sterin /home/sterin/Downloads/Chrome

/new/sterin : the source file in HDFS

/home/sterin/Downloads/Chrome : the destination on the local file system
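
Assuming /home/sterin/Downloads/Chrome is an existing directory, the retrieved copy lands inside it and can be compared with the original:

diff /tmp/sterin /home/sterin/Downloads/Chrome/sterin && echo "files match"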



URL : http://localhost:50070/ - web UI of the NameNode daemon
