
Monday, May 23, 2016

How to connect to HDFS using Java?





Required library files

For the Cloudera distribution (CDH3):

1. log4j-1.2.17.jar
2. commons-logging-1.0.4.jar
3. guava-r09-jarjar.jar
4. hadoop-core-0.20.2.jar

For Hadoop 2.7.2, the required jars (all located in the common/lib folder):


1. commons-io-2.4.jar
2. guava-11.0.2.jar
3. hadoop-common-2.7.2.jar
4. htrace-core-3.1.0-incubating.jar
5. protobuf-java-2.5.0.jar
6. slf4j-api-1.7.10.jar
7. commons-logging-1.1.3.jar
8. hadoop-auth-2.7.2.jar
9. hadoop-hdfs-2.7.2.jar
10. log4j-1.2.17.jar


Location:

On the master node, running hadoop version shows which core jar to use:



[root@oel6 ~]# hadoop version
Hadoop 0.20.2-cdh3u6
Subversion file:///data/1/tmp/topdir/BUILD/hadoop-0.20.2-cdh3u6 -r efb405d2aa54039bdf39e0733cd0bb9423a1eb0a
Compiled by root on Wed Mar 20 13:11:26 PDT 2013
From source with checksum 3277b62b2872d77555cfbc5a202f81c4
This command was run using /usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u6.jar


So use /usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u6.jar
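
The same details are also available programmatically. A minimal sketch (not in the original post; it assumes hadoop-common or hadoop-core is on the classpath) using org.apache.hadoop.util.VersionInfo, the class that backs the hadoop version command:

import org.apache.hadoop.util.VersionInfo;

public class PrintHadoopVersion {
    public static void main(String[] args) {
        // Prints the version string, e.g. "0.20.2-cdh3u6" or "2.7.2"
        System.out.println("Hadoop " + VersionInfo.getVersion());
        // Revision and build details, matching the 'hadoop version' output
        System.out.println("Subversion " + VersionInfo.getUrl()
                + " -r " + VersionInfo.getRevision());
        System.out.println("Compiled by " + VersionInfo.getUser()
                + " on " + VersionInfo.getDate());
        System.out.println("From source with checksum "
                + VersionInfo.getSrcChecksum());
    }
}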

Basic Code:

http://www.folkstalk.com/2013/06/connect-to-hadoop-hdfs-through-java.html

Read FileSystem

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class ReadFileSystem {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // 1. Create a Configuration (picks up core-site.xml if it is on the classpath)
        Configuration conf = new Configuration();
        // 2. Get a FileSystem handle for the NameNode URI
        FileSystem hdfs = FileSystem.get(new URI("hdfs://IP:9000"), conf);

        // 3. List every entry under /new and convert the statuses to Paths
        FileStatus[] fileStatus = hdfs.listStatus(new Path("hdfs://IP:9000/new"));
        Path[] paths = FileUtil.stat2Paths(fileStatus);

        System.out.println("***** Contents of the Directory *****");
        for (Path path : paths) {
            System.out.println(path);
        }
    }
}

Running hdfs getconf -confKey fs.default.name on the server shows the correct dfs location to use as the connection URI. (A sketch that picks this value up from the configuration instead of hardcoding it follows the sample output below.)

Sample output :

***** Contents of the Directory *****
hdfs://IP:9000/new/123.txt
hdfs://IP:9000/new/newww.txt
hdfs://IP:9000/new/sterin.txt
hdfs://IP:9000/new/ucm
hdfs://IP:9000/new/valut
hdfs://IP:9000/new/weblayout
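
Instead of hardcoding the NameNode URI, the connection can reuse the fs.default.name value from the configuration, and a file's contents can be streamed back. A minimal sketch; reading /new/123.txt from the listing above is an assumption for illustration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFileFromHDFS {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Same value that "hdfs getconf -confKey fs.default.name" prints;
        // with core-site.xml on the classpath this line is unnecessary
        conf.set("fs.default.name", "hdfs://IP:9000");
        // No URI argument needed: FileSystem.get(conf) uses fs.default.name
        FileSystem hdfs = FileSystem.get(conf);

        // Open one of the files from the listing above and copy it to stdout
        FSDataInputStream in = hdfs.open(new Path("/new/123.txt"));
        try {
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}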


Write a file to HDFS


import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class CopyFileToHDFS {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // 1. Get an instance of Configuration
        Configuration configuration = new Configuration();
        // 2. Create an InputStream to read the data from the local file
        InputStream inputStream = new BufferedInputStream(new FileInputStream("/tmp/sample.txt"));
        // 3. Get the HDFS instance
        FileSystem hdfs = FileSystem.get(new URI("hdfs://IP:9000"), configuration);
        // 4. Open an OutputStream to write the data; this is obtained from the
        //    FileSystem. The Progressable callback fires as data is written.
        OutputStream outputStream = hdfs.create(new Path("hdfs://IP:9000/forsterin/Hadoop_File.txt"),
                new Progressable() {
                    @Override
                    public void progress() {
                        System.out.println("....");
                    }
                });
        // 5. Copy the bytes across, then close both streams
        try {
            IOUtils.copyBytes(inputStream, outputStream, 4096, false);
        } finally {
            IOUtils.closeStream(inputStream);
            IOUtils.closeStream(outputStream);
        }
    }
}
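
After the copy it is worth confirming the file landed. A minimal sketch, reusing the same NameNode URI and target path as the write example above:

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VerifyFileInHDFS {
    public static void main(String[] args) throws IOException, URISyntaxException {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(new URI("hdfs://IP:9000"), conf);

        Path target = new Path("/forsterin/Hadoop_File.txt");
        // exists() returns true once the create() above has been closed
        if (hdfs.exists(target)) {
            FileStatus status = hdfs.getFileStatus(target);
            System.out.println(target + " exists, " + status.getLen() + " bytes");
        } else {
            System.out.println(target + " was not found");
        }
    }
}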


