
Wednesday, May 25, 2016

java.lang.ClassNotFoundException and No FileSystem for scheme: hdfs

The java.lang.ClassNotFoundException and "No FileSystem for scheme: hdfs" exceptions appear while connecting to Hadoop from an application server.



Standalone code that connects to Hadoop works fine, but the same code fails when deployed in an application server (WebLogic), even though all the required jars are bundled in the EAR file.


Error 1 :

java.io.IOException: No FileSystem for scheme: hdfs                                                                        
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1600)                                          
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:69)                                                  
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1637)                                        
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1619)                                                
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:255)                                                        
        at connector.HadoopService.hadoopHandler(HadoopService.java:62)                                                    
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                                                    
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)                                  
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingM


Fix:
Set the FileSystem implementation classes explicitly in the Configuration. This bypasses the ServiceLoader lookup, which can break when the META-INF/services entries from hadoop-common and hadoop-hdfs collide inside a packaged EAR:

Configuration conf = new Configuration();

conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

FileSystem hdfs = FileSystem.get(new URI("hdfs://IP:9000"), conf);




Error 2 :


java.lang.ClassNotFoundException: Class  org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2290)
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2303)
org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:163)




Fix: add all the required jars to the <domain>/lib directory of the WebLogic domain.

For Hadoop 2.7.2, the required jars (all located in the common/lib folder) are:

1. commons-io-2.4.jar
2. guava-11.0.2.jar
3. hadoop-common-2.7.2.jar
4. htrace-core-3.1.0-incubating.jar
5. protobuf-java-2.5.0.jar
6. slf4j-api-1.7.10.jar
7. commons-logging-1.1.3.jar
8. hadoop-auth-2.7.2.jar
9. hadoop-hdfs-2.7.2.jar
10. log4j-1.2.17.jar
For the Cloudera distribution:

1. log4j-1.2.17.jar
2. commons-logging-1.0.4.jar
3. guava-r09-jarjar.jar
4. hadoop-core-0.20.2.jar


Adding the jars to the domain lib directory is the most reliable way to fix this issue.
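
Before redeploying, a quick check like the hedged sketch below (hypothetical helper; drop it into the application code, e.g. near HadoopService.hadoopHandler) tells you whether the HDFS implementation class is visible to the classloader that is actually running:

public class HdfsClasspathCheck {
    // Returns true if the HDFS implementation class from the stack trace above is loadable
    public static boolean hdfsAvailable() {
        try {
            Class.forName("org.apache.hadoop.hdfs.DistributedFileSystem");
            return true;
        } catch (ClassNotFoundException e) {
            // Add hadoop-hdfs-2.7.2.jar and its dependencies to <domain>/lib
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("hadoop-hdfs visible: " + hdfsAvailable());
    }
}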

Monday, May 23, 2016

How to connect to HDFS using java ?



Required library files

For the Cloudera distribution:

1. log4j-1.2.17.jar
2. commons-logging-1.0.4.jar
3. guava-r09-jarjar.jar
4. hadoop-core-0.20.2.jar


For Hadoop 2.7.2, the required jars (all located in the common/lib folder) are:

1. commons-io-2.4.jar
2. guava-11.0.2.jar
3. hadoop-common-2.7.2.jar
4. htrace-core-3.1.0-incubating.jar
5. protobuf-java-2.5.0.jar
6. slf4j-api-1.7.10.jar
7. commons-logging-1.1.3.jar
8. hadoop-auth-2.7.2.jar
9. hadoop-hdfs-2.7.2.jar
10. log4j-1.2.17.jar


Location:

On the master node, running hadoop version shows which core jar to use:



[root@oel6 ~]# hadoop version
Hadoop 0.20.2-cdh3u6
Subversion file:///data/1/tmp/topdir/BUILD/hadoop-0.20.2-cdh3u6 -r efb405d2aa54039bdf39e0733cd0bb9423a1eb0a
Compiled by root on Wed Mar 20 13:11:26 PDT 2013
From source with checksum 3277b62b2872d77555cfbc5a202f81c4
This command was run using /usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u6.jar


So use /usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u6.jar

Basic Code :

http://www.folkstalk.com/2013/06/connect-to-hadoop-hdfs-through-java.html

Read FileSystem

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class ReadFileSystem {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // Connect to HDFS using the NameNode URI
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(new URI("hdfs://IP:9000"), conf);

        // List the contents of the /new directory
        FileStatus[] fileStatus = hdfs.listStatus(new Path("hdfs://IP:9000/new"));
        Path[] paths = FileUtil.stat2Paths(fileStatus);

        System.out.println("***** Contents of the Directory *****");
        for (Path path : paths) {
            System.out.println(path);
        }
    }
}

Running hdfs getconf -confKey fs.default.name on the server shows the correct DFS location to use in the URI.

Sample output :

***** Contents of the Directory *****
hdfs://IP:9000/new/123.txt
hdfs://IP:9000/new/newww.txt
hdfs://IP:9000/new/sterin.txt
hdfs://IP:9000/new/ucm
hdfs://IP:9000/new/valut
hdfs://IP:9000/new/weblayout


Write a file to HDFS


import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class CopyFileToHDFS {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // 1. Get an instance of Configuration
        Configuration configuration = new Configuration();
        // 2. Create an InputStream to read the data from the local file
        InputStream inputStream = new BufferedInputStream(new FileInputStream("/tmp/sample.txt"));
        // 3. Get the HDFS instance
        FileSystem hdfs = FileSystem.get(new URI("hdfs://IP:9000"), configuration);
        // 4. Open an OutputStream to write the data; this is obtained from the FileSystem
        OutputStream outputStream = hdfs.create(new Path("hdfs://IP:9000/forsterin/Hadoop_File.txt"),
            new Progressable() {
                @Override
                public void progress() {
                    System.out.println("....");
                }
            });
        try {
            IOUtils.copyBytes(inputStream, outputStream, 4096, false);
        } finally {
            IOUtils.closeStream(inputStream);
            IOUtils.closeStream(outputStream);
        }
    }
}
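
Reading a file back from HDFS works the same way through FileSystem.open(). A minimal sketch, assuming the same hdfs://IP:9000 URI and reusing the path written above:

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFileFromHDFS {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // 1. Get the HDFS instance
        Configuration configuration = new Configuration();
        FileSystem hdfs = FileSystem.get(new URI("hdfs://IP:9000"), configuration);
        // 2. Open the file written by CopyFileToHDFS and copy its contents to stdout
        FSDataInputStream inputStream = hdfs.open(new Path("hdfs://IP:9000/forsterin/Hadoop_File.txt"));
        try {
            IOUtils.copyBytes(inputStream, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(inputStream);
        }
    }
}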



Wednesday, May 18, 2016

How to get expired contents and set new expired date using RIDC ?


Purpose: To list all expired content and set a new expiration date in bulk. No database update is required for this.


Java Class : https://github.com/sterin501/expiredContent



Detailed steps:

1. Set the connection details in config.properties.
2. Set StartDate and EndDate for the search query. To list all expired content, leave both values blank.





In Unix:

3. Source the classpath script:

. ./classpath

4. To get the expired content:

java -classpath $CLASSPATH GetExpired

Content.txt will have the expired dates.

5. To update the expiration date:

java -classpath $CLASSPATH UpdateExpiredDate


In Windows:

3. call classpath.bat

4. java -classpath %CLASSPATH% GetExpired

5. java -classpath %CLASSPATH% UpdateExpiredDate


OR, with the RIDC jar on the classpath directly:

java -classpath .oracle.ucm.ridc.jar;. GetExpired

java -classpath .oracle.ucm.ridc.jar;. UpdateExpiredDate



Sample run:

[sterin@sterinlap expiredContent]$ . ./classpath
[sterin@sterinlap expiredContent]$ java GetExpired 
ContentID is : STJACOBPC1IDCO003530 3131
ContentID is : STJACOBPC1IDCO003522 3123
ContentID is : STJACOBPC1IDCO003527 3128
[sterin@sterinlap expiredContent]$ java UpdateExpiredDate
Updating STJACOBPC1IDCO003530


@Properties LocalData
UserDateFormat=iso8601
IdcService=UPDATE_DOCINFO
dDocName=STJACOBPC1IDCO003530
UserTimeZone=UTC
dOutDate=2017-04-29 08:59:00
dID=3131
@end
Updating STJACOBPC1IDCO003522


@Properties LocalData
UserDateFormat=iso8601
IdcService=UPDATE_DOCINFO
dDocName=STJACOBPC1IDCO003522
UserTimeZone=UTC
dOutDate=2017-04-29 08:59:00
dID=3123
@end
Updating STJACOBPC1IDCO003527


@Properties LocalData
UserDateFormat=iso8601
IdcService=UPDATE_DOCINFO
dDocName=STJACOBPC1IDCO003527
UserTimeZone=UTC
dOutDate=2017-04-29 08:59:00
dID=3128
@end
[sterin@sterinlap expiredContent]$ java GetExpired 
[sterin@sterinlap expiredContent]$ 
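
Each UpdateExpiredDate iteration above boils down to one UPDATE_DOCINFO service call over RIDC. A minimal sketch of that call (the idc:// URL, port, and user are placeholder assumptions; the binder values mirror the LocalData shown above):

import oracle.stellent.ridc.IdcClient;
import oracle.stellent.ridc.IdcClientException;
import oracle.stellent.ridc.IdcClientManager;
import oracle.stellent.ridc.IdcContext;
import oracle.stellent.ridc.model.DataBinder;
import oracle.stellent.ridc.protocol.ServiceResponse;

public class UpdateOutDateSketch {
    public static void main(String[] args) throws IdcClientException {
        // Placeholder connection details; in the script these come from config.properties
        IdcClientManager manager = new IdcClientManager();
        IdcClient idcClient = manager.createClient("idc://HOST:4444");
        IdcContext userContext = new IdcContext("sysadmin");

        // One UPDATE_DOCINFO call per expired item (values mirror the LocalData above)
        DataBinder binder = idcClient.createBinder();
        binder.putLocal("IdcService", "UPDATE_DOCINFO");
        binder.putLocal("dDocName", "STJACOBPC1IDCO003530");
        binder.putLocal("dID", "3131");
        binder.putLocal("dOutDate", "2017-04-29 08:59:00");
        binder.putLocal("UserDateFormat", "iso8601");
        binder.putLocal("UserTimeZone", "UTC");

        ServiceResponse response = idcClient.sendRequest(userContext, binder);
        response.close();
    }
}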

Script to Set new dOutDate for Expired Content Using RIDC (Doc ID 2139331.1)

Friday, May 13, 2016

How to mount hadoop or hdfs file system in linux?





Purpose: To mount an HDFS filesystem in Linux. This is mainly useful for dumping files and for read-only access.

Since hadoop-fuse-dfs is a Cloudera-based solution, it is better to install both the server and the client from the Cloudera distribution itself.


Steps:
1. Install Hadoop on both the server and the client.
2. Run the mount command on the client.


Detailed Steps:

A. Hadoop install (on both client and server, from the Cloudera distribution)



Steps :

1. Add the Cloudera repository and install hadoop-0.20-fuse (this will also install the Hadoop server):


wget http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm
yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm

yum install hadoop-0.20-fuse



2. Set JAVA_HOME (on both client and server)

Edit hadoop-env.sh
Location: /usr/lib/hadoop-0.20/conf/

export JAVA_HOME=


3. Configure Hadoop on the server and start it

a. core-site.xml
Location :/usr/lib/hadoop-0.20/conf/
<property>
<name>hadoop.tmp.dir</name>
<value>/path/to/your/directory/hadoop-${user.name}</value>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://IP:9000</value>
</property>

b. Edit hdfs-site.xml
Location :/usr/lib/hadoop-0.20/conf/
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>


c. Edit mapred-site.xml
Location :/usr/lib/hadoop-0.20/conf/

<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
</property>

d. Format Hadoop



hadoop namenode -format



e. Start Hadoop


Location :/usr/lib/hadoop-0.20/bin
start-dfs.sh
start-mapred.sh

Use jps to verify the processes are running.



f. Use netstat to verify the HDFS port (here 9000):


netstat -an | grep 9000
tcp 0 0 IP:9000 0.0.0.0:* LISTEN
tcp 0 0 IP:9000 10.184.37.158:45227 ESTABLISHED

4. On the client

Set JAVA_HOME and create a new folder for the mount point.

a. Run in debug mode to verify the connection

Format: hadoop-fuse-dfs -d dfs://IP:9000 /home/hduser/mount/

$ hadoop-fuse-dfs -d dfs://IP:9000 /home/hduser/mount/
INFO fuse_options.c:116 Ignoring option -d
INFO fuse_options.c:165 Adding FUSE arg /home/hduser/mount/
FUSE library version: 2.8.3
nullpath_ok: 0
unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
INIT: 7.20





Running hdfs getconf -confKey fs.default.name on the server shows the correct DFS location.

5. Once the mount is working (the listing shows permissions and users without "?"), run the same command without -d.

Incorrect Mounting:

d?????????? ? ? ? ? ? mount

Correct Mounting

drwxr-xr-x. 2 hduser nobody 4096 Dec 31 1969 mount


Note: After mounting, operations like copy or move work fine, but editing or appending files is not possible.
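
Once mounted, the HDFS tree can be browsed like any local directory from plain Java as well. A small sketch (the mount point path is the one used above):

import java.io.File;

public class ListMountedHdfs {
    public static void main(String[] args) {
        // The FUSE mount point used in the steps above
        File mount = new File("/home/hduser/mount");
        File[] entries = mount.listFiles();
        if (entries == null) {
            System.out.println("Mount point is not readable");
            return;
        }
        for (File entry : entries) {
            // Directories vs. files, similar to the ls output shown above
            System.out.println((entry.isDirectory() ? "d " : "- ") + entry.getName());
        }
    }
}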

Monday, May 2, 2016

Hadoop single node installation on linux





Purpose: Install Hadoop on a single machine, then put files into the Hadoop filesystem and get them back.

Steps:

1. Download and install Hadoop
2. Configure it
3. Use basic commands like ls, put, and get

Detailed Steps:

A. Hadoop install

Prerequisites:

1. Install Java
2. Create the hduser user in the OS
3. Enable SSH

Steps:

1. Download hadoop-2.6.4.tar.gz: http://hadoop.apache.org/releases.html

2. Copy it to /opt/app

3. tar -xzf hadoop-2.6.4.tar.gz

4. mv hadoop-2.6.4 hadoop

5. chown -R hduser:hduser hadoop

B. Start and Verify

1. Edit hadoop-env.sh

Add export JAVA_HOME=

OR edit the bash profile with JAVA_HOME

2. Go to /opt/app/hadoop/sbin

3. ./start-all.sh (provide the password when required)


[hadoop@sterinlap sbin]$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
hadoop@localhost's password:
localhost: starting namenode, logging to /opt/app/hadoop/logs/hadoop-hadoop-namenode-sterinlap.out
hadoop@localhost's password:
localhost: starting datanode, logging to /opt/app/hadoop/logs/hadoop-hadoop-datanode-sterinlap.out
Starting secondary namenodes [0.0.0.0]
hadoop@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /opt/app/hadoop/logs/hadoop-hadoop-secondarynamenode-sterinlap.out
starting yarn daemons
starting resourcemanager, logging to /opt/app/hadoop/logs/yarn-hadoop-resourcemanager-sterinlap.out
hadoop@localhost's password:
localhost: starting nodemanager, logging to /opt/app/hadoop/logs/yarn-hadoop-nodemanager-sterinlap.out

4. Verify the process by

ps -ef | grep hadoop

[hduser@sterinlap sbin]$ ps -ef | grep hadoop
hduser 9745 1 20 11:22 pts/5 00:00:06 /home/sterin/Public/jdk1.8.0_45/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.di

hduser 10077 1 23 11:22 ? 00:00:06 /home/sterin/Public/jdk1.8.0_45/bin/java -Dproc_nodemanager

Total: 2 processes
5. Verify hadoop command by “hadoop fs -ls”

Location of commands : /opt/app/hadoop/bin

/opt/app/hadoop/bin/hadoop fs -ls


Found 11 items
-rwxr-xr-x 1 oracle oracle 159223 2016-02-12 15:27 container-executor

(Even though this lists the current local path, it verifies that the installation works.)


6. Stop all Hadoop processes

/opt/app/hadoop/sbin/stop-all.sh

Configuration

1. Create a tmp directory for Hadoop under /opt/app/hadoop: /opt/app/hadoop/tmp

2. Edit core-site.xml
/opt/app/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>


3. Create mapred-site.xml

Location :/opt/app/hadoop/etc/hadoop/

cp mapred-site.xml.template mapred-site.xml




4. Edit mapred-site.xml


<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>




5. Create the namenode and datanode folders

Location : /opt/app/hadoop/

mkdir hadoop_store/hdfs/namenode
mkdir hadoop_store/hdfs/datanode


6. Edit hdfs-site.xml

Location : /opt/app/hadoop/etc/hadoop/

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/app/hadoop/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/app/hadoop/hadoop_store/hdfs/datanode</value>
</property>
</configuration>

7. Add the path to .bashrc (the bash profile under the home directory)

edit .bashrc

PATH=$PATH:/opt/app/hadoop/bin


8. Format the Hadoop filesystem

hadoop namenode -format


16/04/25 12:11:03 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = -10-184-37-177.in..com/10.xx.37.177
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.4





16/04/25 12:11:05 INFO util.ExitUtil: Exiting with status 0
16/04/25 12:11:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at l-10-184-37-177..com/10.xx.37.177





9. Start Hadoop

Location :/opt/app/hadoop/sbin

./start-all.sh ( like step B-2)

10. jps


12882 NameNode
13189 DataNode
14152 NodeManager
13816 ResourceManager
13529 SecondaryNameNode
14394 Jps



11. Check the filesystem (list the files)


hadoop fs -ls /

It will be blank

12. Create a new folder


hadoop fs -mkdir /new




13. Verify it :

hadoop fs -ls /
Found 1 items
drwxr-xr-x - sterin supergroup 0 2016-04-25 12:18 /new


14. Put command

hadoop fs -put <source> <target>

/tmp/sterin : my local file
/new : the target directory in Hadoop

hadoop fs -put /tmp/sterin /new








15. Get command

hadoop fs -get /new/sterin /home/sterin/Downloads/Chrome

/new/sterin : the source in Hadoop
/home/sterin/Downloads/Chrome : the local filesystem destination



URLs: http://localhost:50070/ (web UI of the NameNode daemon)
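
To verify the single-node setup from Java, the ReadFileSystem example from the May 23 post can be pointed at this instance. A minimal sketch, assuming the fs.default.name value hdfs://localhost:54310 from core-site.xml above and the /new folder created in step 12:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VerifySingleNode {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI comes from fs.default.name in core-site.xml
        FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:54310"), conf);
        // List the /new directory created in step 12
        for (FileStatus status : hdfs.listStatus(new Path("/new"))) {
            System.out.println(status.getPath());
        }
    }
}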