
Wednesday, May 25, 2016

java.lang.ClassNotFoundException and No FileSystem for scheme: hdfs

The java.lang.ClassNotFoundException and "No FileSystem for scheme: hdfs" exceptions appear while connecting to Hadoop from an application server.



Standalone code that connects to Hadoop works fine, but the same code fails when deployed in an application server (WebLogic), even though all the required jars are bundled in the EAR file.


Error 1 :

java.io.IOException: No FileSystem for scheme: hdfs                                                                        
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1600)                                          
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:69)                                                  
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1637)                                        
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1619)                                                
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:255)                                                        
        at connector.HadoopService.hadoopHandler(HadoopService.java:62)                                                    
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                                                    
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)                                  
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingM


Fix:
Set the FileSystem implementation classes explicitly in the Configuration. This bypasses the ServiceLoader lookup, which can break when the META-INF/services entries from hadoop-common and hadoop-hdfs collide inside a packaged EAR:

Configuration conf = new Configuration();

conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

FileSystem hdfs = FileSystem.get(new URI("hdfs://IP:9000"), conf);




Error 2 :


java.lang.ClassNotFoundException: Class  org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2290)
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2303)
org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:163)




Fix: add all the required jars to the <domain>/lib directory of the WebLogic domain.

For Hadoop 2.7.2, the required jars (all located in the common/lib folder) are:

1. commons-io-2.4.jar
2. guava-11.0.2.jar
3. hadoop-common-2.7.2.jar
4. htrace-core-3.1.0-incubating.jar
5. protobuf-java-2.5.0.jar
6. slf4j-api-1.7.10.jar
7. commons-logging-1.1.3.jar
8. hadoop-auth-2.7.2.jar
9. hadoop-hdfs-2.7.2.jar
10. log4j-1.2.17.jar
For the Cloudera distribution:

1. log4j-1.2.17.jar
2. commons-logging-1.0.4.jar
3. guava-r09-jarjar.jar
4. hadoop-core-0.20.2.jar


Adding the jars to the domain lib directory is the most reliable way to fix this issue.
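
Before redeploying, a quick check like the hedged sketch below (hypothetical helper; drop it into the application code, e.g. near HadoopService.hadoopHandler) tells you whether the HDFS implementation class is visible to the classloader that is actually running:

public class HdfsClasspathCheck {
    // Returns true if the HDFS implementation class from the stack trace above is loadable
    public static boolean hdfsAvailable() {
        try {
            Class.forName("org.apache.hadoop.hdfs.DistributedFileSystem");
            return true;
        } catch (ClassNotFoundException e) {
            // Add hadoop-hdfs-2.7.2.jar and its dependencies to <domain>/lib
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("hadoop-hdfs visible: " + hdfsAvailable());
    }
}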

Monday, May 23, 2016

How to connect to HDFS using java ?



Required library files

For the Cloudera distribution:

1. log4j-1.2.17.jar
2. commons-logging-1.0.4.jar
3. guava-r09-jarjar.jar
4. hadoop-core-0.20.2.jar


For Hadoop 2.7.2, the required jars (all located in the common/lib folder) are:

1. commons-io-2.4.jar
2. guava-11.0.2.jar
3. hadoop-common-2.7.2.jar
4. htrace-core-3.1.0-incubating.jar
5. protobuf-java-2.5.0.jar
6. slf4j-api-1.7.10.jar
7. commons-logging-1.1.3.jar
8. hadoop-auth-2.7.2.jar
9. hadoop-hdfs-2.7.2.jar
10. log4j-1.2.17.jar


Location:

On the master node, running hadoop version shows which core jar to use:



[root@oel6 ~]# hadoop version
Hadoop 0.20.2-cdh3u6
Subversion file:///data/1/tmp/topdir/BUILD/hadoop-0.20.2-cdh3u6 -r efb405d2aa54039bdf39e0733cd0bb9423a1eb0a
Compiled by root on Wed Mar 20 13:11:26 PDT 2013
From source with checksum 3277b62b2872d77555cfbc5a202f81c4
This command was run using /usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u6.jar


So use /usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u6.jar

Basic Code :

http://www.folkstalk.com/2013/06/connect-to-hadoop-hdfs-through-java.html

Read FileSystem

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class ReadFileSystem {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // Connect to HDFS using the NameNode URI
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(new URI("hdfs://IP:9000"), conf);

        // List the contents of the /new directory
        FileStatus[] fileStatus = hdfs.listStatus(new Path("hdfs://IP:9000/new"));
        Path[] paths = FileUtil.stat2Paths(fileStatus);

        System.out.println("***** Contents of the Directory *****");
        for (Path path : paths) {
            System.out.println(path);
        }
    }
}

Running hdfs getconf -confKey fs.default.name on the server shows the correct DFS location to use in the URI.

Sample output :

***** Contents of the Directory *****
hdfs://IP:9000/new/123.txt
hdfs://IP:9000/new/newww.txt
hdfs://IP:9000/new/sterin.txt
hdfs://IP:9000/new/ucm
hdfs://IP:9000/new/valut
hdfs://IP:9000/new/weblayout


Write a file to HDFS


import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class CopyFileToHDFS {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // 1. Get an instance of Configuration
        Configuration configuration = new Configuration();
        // 2. Create an InputStream to read the data from the local file
        InputStream inputStream = new BufferedInputStream(new FileInputStream("/tmp/sample.txt"));
        // 3. Get the HDFS instance
        FileSystem hdfs = FileSystem.get(new URI("hdfs://IP:9000"), configuration);
        // 4. Open an OutputStream to write the data; this is obtained from the FileSystem
        OutputStream outputStream = hdfs.create(new Path("hdfs://IP:9000/forsterin/Hadoop_File.txt"),
            new Progressable() {
                @Override
                public void progress() {
                    System.out.println("....");
                }
            });
        try {
            IOUtils.copyBytes(inputStream, outputStream, 4096, false);
        } finally {
            IOUtils.closeStream(inputStream);
            IOUtils.closeStream(outputStream);
        }
    }
}
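
Reading a file back from HDFS works the same way through FileSystem.open(). A minimal sketch, assuming the same hdfs://IP:9000 URI and reusing the path written above:

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFileFromHDFS {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // 1. Get the HDFS instance
        Configuration configuration = new Configuration();
        FileSystem hdfs = FileSystem.get(new URI("hdfs://IP:9000"), configuration);
        // 2. Open the file written by CopyFileToHDFS and copy its contents to stdout
        FSDataInputStream inputStream = hdfs.open(new Path("hdfs://IP:9000/forsterin/Hadoop_File.txt"));
        try {
            IOUtils.copyBytes(inputStream, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(inputStream);
        }
    }
}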



Wednesday, May 18, 2016

How to get expired contents and set new expired date using RIDC ?


Purpose: To list all expired content and set a new expiration date in bulk. No database update is required for this.


Java Class : https://github.com/sterin501/expiredContent



Detailed steps:

1. Set the connection details in config.properties.
2. Set StartDate and EndDate for the search query. To list all expired content, leave both values blank.





In Unix:

3. Source the classpath script:

. ./classpath

4. To get the expired content:

java -classpath $CLASSPATH GetExpired

Content.txt will have the expired dates.

5. To update the expiration date:

java -classpath $CLASSPATH UpdateExpiredDate


In Windows:

3. call classpath.bat

4. java -classpath %CLASSPATH% GetExpired

5. java -classpath %CLASSPATH% UpdateExpiredDate


OR, with the RIDC jar on the classpath directly:

java -classpath .oracle.ucm.ridc.jar;. GetExpired

java -classpath .oracle.ucm.ridc.jar;. UpdateExpiredDate



Sample run:

[sterin@sterinlap expiredContent]$ . ./classpath
[sterin@sterinlap expiredContent]$ java GetExpired 
ContentID is : STJACOBPC1IDCO003530 3131
ContentID is : STJACOBPC1IDCO003522 3123
ContentID is : STJACOBPC1IDCO003527 3128
[sterin@sterinlap expiredContent]$ java UpdateExpiredDate
Updating STJACOBPC1IDCO003530


@Properties LocalData
UserDateFormat=iso8601
IdcService=UPDATE_DOCINFO
dDocName=STJACOBPC1IDCO003530
UserTimeZone=UTC
dOutDate=2017-04-29 08:59:00
dID=3131
@end
Updating STJACOBPC1IDCO003522


@Properties LocalData
UserDateFormat=iso8601
IdcService=UPDATE_DOCINFO
dDocName=STJACOBPC1IDCO003522
UserTimeZone=UTC
dOutDate=2017-04-29 08:59:00
dID=3123
@end
Updating STJACOBPC1IDCO003527


@Properties LocalData
UserDateFormat=iso8601
IdcService=UPDATE_DOCINFO
dDocName=STJACOBPC1IDCO003527
UserTimeZone=UTC
dOutDate=2017-04-29 08:59:00
dID=3128
@end
[sterin@sterinlap expiredContent]$ java GetExpired 
[sterin@sterinlap expiredContent]$ 
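
Each UpdateExpiredDate iteration above boils down to one UPDATE_DOCINFO service call over RIDC. A minimal sketch of that call (the idc:// URL, port, and user are placeholder assumptions; the binder values mirror the LocalData shown above):

import oracle.stellent.ridc.IdcClient;
import oracle.stellent.ridc.IdcClientException;
import oracle.stellent.ridc.IdcClientManager;
import oracle.stellent.ridc.IdcContext;
import oracle.stellent.ridc.model.DataBinder;
import oracle.stellent.ridc.protocol.ServiceResponse;

public class UpdateOutDateSketch {
    public static void main(String[] args) throws IdcClientException {
        // Placeholder connection details; in the script these come from config.properties
        IdcClientManager manager = new IdcClientManager();
        IdcClient idcClient = manager.createClient("idc://HOST:4444");
        IdcContext userContext = new IdcContext("sysadmin");

        // One UPDATE_DOCINFO call per expired item (values mirror the LocalData above)
        DataBinder binder = idcClient.createBinder();
        binder.putLocal("IdcService", "UPDATE_DOCINFO");
        binder.putLocal("dDocName", "STJACOBPC1IDCO003530");
        binder.putLocal("dID", "3131");
        binder.putLocal("dOutDate", "2017-04-29 08:59:00");
        binder.putLocal("UserDateFormat", "iso8601");
        binder.putLocal("UserTimeZone", "UTC");

        ServiceResponse response = idcClient.sendRequest(userContext, binder);
        response.close();
    }
}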

Script to Set new dOutDate for Expired Content Using RIDC (Doc ID 2139331.1)

Friday, May 13, 2016

How to mount hadoop or hdfs file system in linux?





Purpose: To mount an HDFS filesystem in Linux. This is mainly useful for dumping files and for read-only access.

Since hadoop-fuse-dfs is a Cloudera-based solution, it is better to install both the server and the client from the Cloudera distribution itself.


Steps:
1. Install Hadoop on both the server and the client.
2. Run the mount command on the client.


Detailed Steps:

A. Hadoop install (on both client and server, from the Cloudera distribution)



Steps :

1. Add the Cloudera repository and install hadoop-0.20-fuse (this will also install the Hadoop server):


wget http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm
yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm

yum install hadoop-0.20-fuse



2. Set JAVA_HOME (on both client and server)

Edit hadoop-env.sh
Location: /usr/lib/hadoop-0.20/conf/

export JAVA_HOME=


3. Configure Hadoop on the server and start it

a. core-site.xml
Location :/usr/lib/hadoop-0.20/conf/
<property>
<name>hadoop.tmp.dir</name>
<value>/path/to/your/directory/hadoop-${user.name}</value>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://IP:9000</value>
</property>

b. Edit hdfs-site.xml
Location :/usr/lib/hadoop-0.20/conf/
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>


c. Edit mapred-site.xml
Location :/usr/lib/hadoop-0.20/conf/

<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
</property>

d. Format Hadoop



hadoop namenode -format



e. Start Hadoop


Location :/usr/lib/hadoop-0.20/bin
start-dfs.sh
start-mapred.sh

Use jps to verify the processes are running.



f. Use netstat to verify the HDFS port (here 9000):


netstat -an | grep 9000
tcp 0 0 IP:9000 0.0.0.0:* LISTEN
tcp 0 0 IP:9000 10.184.37.158:45227 ESTABLISHED

4. On the client

Set JAVA_HOME and create a new folder for the mount point.

a. Run in debug mode to verify the connection

Format: hadoop-fuse-dfs -d dfs://IP:9000 /home/hduser/mount/

$ hadoop-fuse-dfs -d dfs://IP:9000 /home/hduser/mount/
INFO fuse_options.c:116 Ignoring option -d
INFO fuse_options.c:165 Adding FUSE arg /home/hduser/mount/
FUSE library version: 2.8.3
nullpath_ok: 0
unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
INIT: 7.20





Running hdfs getconf -confKey fs.default.name on the server shows the correct DFS location.

5. Once the mount is working (the listing shows permissions and users without "?"), run the same command without -d.

Incorrect Mounting:

d?????????? ? ? ? ? ? mount

Correct Mounting

drwxr-xr-x. 2 hduser nobody 4096 Dec 31 1969 mount


Note: After mounting, operations like copy or move work fine, but editing or appending files is not possible.
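
Once mounted, the HDFS tree can be browsed like any local directory from plain Java as well. A small sketch (the mount point path is the one used above):

import java.io.File;

public class ListMountedHdfs {
    public static void main(String[] args) {
        // The FUSE mount point used in the steps above
        File mount = new File("/home/hduser/mount");
        File[] entries = mount.listFiles();
        if (entries == null) {
            System.out.println("Mount point is not readable");
            return;
        }
        for (File entry : entries) {
            // Directories vs. files, similar to the ls output shown above
            System.out.println((entry.isDirectory() ? "d " : "- ") + entry.getName());
        }
    }
}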

Monday, May 2, 2016

Hadoop single node installation on linux





Purpose: Install Hadoop on a single machine, then put files into the Hadoop filesystem and get them back.

Steps:

1. Download and install Hadoop
2. Configure it
3. Use basic commands like ls, put, and get

Detailed Steps:

A. Hadoop install

Prerequisites:

1. Install Java
2. Create the hduser user in the OS
3. Enable SSH

Steps:

1. Download hadoop-2.6.4.tar.gz: http://hadoop.apache.org/releases.html

2. Copy it to /opt/app

3. tar -xzf hadoop-2.6.4.tar.gz

4. mv hadoop-2.6.4 hadoop

5. chown -R hduser:hduser hadoop

B. Start and Verify

1. Edit hadoop-env.sh

Add export JAVA_HOME=

OR edit the bash profile with JAVA_HOME

2. Go to /opt/app/hadoop/sbin

3. ./start-all.sh (provide the password when required)


[hadoop@sterinlap sbin]$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
hadoop@localhost's password:
localhost: starting namenode, logging to /opt/app/hadoop/logs/hadoop-hadoop-namenode-sterinlap.out
hadoop@localhost's password:
localhost: starting datanode, logging to /opt/app/hadoop/logs/hadoop-hadoop-datanode-sterinlap.out
Starting secondary namenodes [0.0.0.0]
hadoop@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /opt/app/hadoop/logs/hadoop-hadoop-secondarynamenode-sterinlap.out
starting yarn daemons
starting resourcemanager, logging to /opt/app/hadoop/logs/yarn-hadoop-resourcemanager-sterinlap.out
hadoop@localhost's password:
localhost: starting nodemanager, logging to /opt/app/hadoop/logs/yarn-hadoop-nodemanager-sterinlap.out

4. Verify the process by

ps -ef | grep hadoop

[hduser@sterinlap sbin]$ ps -ef | grep hadoop
hduser 9745 1 20 11:22 pts/5 00:00:06 /home/sterin/Public/jdk1.8.0_45/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.di

hduser 10077 1 23 11:22 ? 00:00:06 /home/sterin/Public/jdk1.8.0_45/bin/java -Dproc_nodemanager

Total: 2 processes
5. Verify hadoop command by “hadoop fs -ls”

Location of commands : /opt/app/hadoop/bin

/opt/app/hadoop/bin/hadoop fs -ls


Found 11 items
-rwxr-xr-x 1 oracle oracle 159223 2016-02-12 15:27 container-executor

(Even though this lists the current local path, it verifies that the installation works.)


6. Stop all Hadoop processes

/opt/app/hadoop/sbin/stop-all.sh

Configuration

1. Create a tmp directory for Hadoop under /opt/app/hadoop: /opt/app/hadoop/tmp

2. Edit core-site.xml
/opt/app/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>


3. Create mapred-site.xml

Location :/opt/app/hadoop/etc/hadoop/

cp mapred-site.xml.template mapred-site.xml




4. Edit mapred-site.xml


<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>




5. Create the namenode and datanode folders

Location : /opt/app/hadoop/

mkdir hadoop_store/hdfs/namenode
mkdir hadoop_store/hdfs/datanode


6. Edit hdfs-site.xml

Location : /opt/app/hadoop/etc/hadoop/

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/app/hadoop/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/app/hadoop/hadoop_store/hdfs/datanode</value>
</property>
</configuration>

7. Add the path to .bashrc (the bash profile under the home directory)

edit .bashrc

PATH=$PATH:/opt/app/hadoop/bin


8. Format the Hadoop filesystem

hadoop namenode -format


16/04/25 12:11:03 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = -10-184-37-177.in..com/10.xx.37.177
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.4





16/04/25 12:11:05 INFO util.ExitUtil: Exiting with status 0
16/04/25 12:11:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at l-10-184-37-177..com/10.xx.37.177





9. Start Hadoop

Location :/opt/app/hadoop/sbin

./start-all.sh ( like step B-2)

10. jps


12882 NameNode
13189 DataNode
14152 NodeManager
13816 ResourceManager
13529 SecondaryNameNode
14394 Jps



11. Check the filesystem (list the files)


hadoop fs -ls /

It will be blank

12. Create a new folder


hadoop fs -mkdir /new




13. Verify it :

hadoop fs -ls /
Found 1 items
drwxr-xr-x - sterin supergroup 0 2016-04-25 12:18 /new


14. Put command

hadoop fs -put <source> <target>

/tmp/sterin : my local file
/new : the target directory in Hadoop

hadoop fs -put /tmp/sterin /new








15. Get command

hadoop fs -get /new/sterin /home/sterin/Downloads/Chrome

/new/sterin : the source in Hadoop
/home/sterin/Downloads/Chrome : the local filesystem destination



URLs: http://localhost:50070/ (web UI of the NameNode daemon)
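
To verify the single-node setup from Java, the ReadFileSystem example from the May 23 post can be pointed at this instance. A minimal sketch, assuming the fs.default.name value hdfs://localhost:54310 from core-site.xml above and the /new folder created in step 12:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VerifySingleNode {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI comes from fs.default.name in core-site.xml
        FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:54310"), conf);
        // List the /new directory created in step 12
        for (FileStatus status : hdfs.listStatus(new Path("/new"))) {
            System.out.println(status.getPath());
        }
    }
}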