
Wednesday, June 29, 2016

MapReduce Two Values for One key example







In this example, we create MapReduce code for the activity from the Hortonworks tutorial:

http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/

It needs to map one key to two values:

Year as the key, and PlayerID & Runs as the values.
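As a rough sketch, the mapper can pack the two values into a single Text, and the reducer can unpack them. This assumes the Lahman Batting.csv layout (playerID in column 0, yearID in column 1, runs in column 8) and a max-runs-per-year job; the actual files listed below may differ:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Emits year -> "playerID,runs": one key carrying two values in one Text.
class TwoValueMapperSketch extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] f = line.toString().split(",");
        // Assumed columns: 0 = playerID, 1 = yearID, 8 = runs (R);
        // the digit check also skips the header row
        if (f.length > 8 && f[8].matches("\\d+")) {
            context.write(new Text(f[1]), new Text(f[0] + "," + f[8]));
        }
    }
}

// For each year, keeps the player with the highest run count.
class TwoValueReducerSketch extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text year, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String bestPlayer = "";
        int maxRuns = -1;
        for (Text v : values) {
            String[] parts = v.toString().split(",");   // unpack the two values
            int runs = Integer.parseInt(parts[1]);
            if (runs > maxRuns) {
                maxRuns = runs;
                bestPlayer = parts[0];
            }
        }
        context.write(year, new Text(bestPlayer + "\t" + maxRuns));
    }
}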






This example also includes a CSV creator that can generate a file of any size, and standalone Java code that performs the same activity.



Details of the files :


BattingExample.java : MapReduce Driver Class (see the sketch after this list)

BattingMapper.java : MapReduce Mapper Class

BattingReducer.java : MapReduce Reducer Class


Batting.jar : MapReduce Jar


hadoop jar ./Batting.jar BattingExample <InputCSVfile> <OutputFolder>
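
A minimal sketch of what the driver might look like, assuming a standard Hadoop 2.x Job setup (the actual BattingExample.java may differ):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BattingExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "batting");
        job.setJarByClass(BattingExample.class);
        job.setMapperClass(BattingMapper.class);
        job.setReducerClass(BattingReducer.class);
        job.setOutputKeyClass(Text.class);    // year
        job.setOutputValueClass(Text.class);  // "playerID,runs"
        FileInputFormat.addInputPath(job, new Path(args[0]));    // <InputCSVfile>
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // <OutputFolder>
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}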




StandAlone.java : runs the same mapper/reducer logic in standalone mode


java StandAlone <inputCSV> <outputFIle>
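
A minimal standalone sketch under the same assumptions as the mapper/reducer sketch above (column positions and max-runs logic; the actual StandAlone.java may differ):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Map;

public class StandAlone {
    public static void main(String[] args) throws Exception {
        Map<String, String> bestPlayer = new HashMap<>();  // year -> playerID
        Map<String, Integer> maxRuns = new HashMap<>();    // year -> runs
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(",");
                if (f.length > 8 && f[8].matches("\\d+")) {   // skips header row
                    int runs = Integer.parseInt(f[8]);
                    if (runs > maxRuns.getOrDefault(f[1], -1)) {
                        maxRuns.put(f[1], runs);
                        bestPlayer.put(f[1], f[0]);
                    }
                }
            }
        }
        try (PrintWriter out = new PrintWriter(args[1])) {
            for (String year : maxRuns.keySet()) {
                out.println(year + "\t" + bestPlayer.get(year) + "\t" + maxRuns.get(year));
            }
        }
    }
}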


Batting.csv : contains the player data

CsvCreator.java : creates a CSV file of any size in the same format



java CsvCreator <NumberOfPlayers> <outputCSVfile>

2000 players will create a file of about 1 MB.
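
A minimal sketch of such a generator; the column layout and the three season rows per player are assumptions, so the bytes-per-player ratio may not match the 1 MB figure exactly:

import java.io.PrintWriter;
import java.util.Random;

public class CsvCreator {
    public static void main(String[] args) throws Exception {
        int players = Integer.parseInt(args[0]);
        Random rnd = new Random();
        try (PrintWriter out = new PrintWriter(args[1])) {
            out.println("playerID,yearID,stint,teamID,lgID,G,G_batting,AB,R");
            for (int p = 0; p < players; p++) {
                // A few season rows per player, with random years and run counts
                for (int s = 0; s < 3; s++) {
                    int year = 1950 + rnd.nextInt(66);
                    out.printf("player%06d,%d,1,TEAM,ML,100,100,400,%d%n",
                               p, year, rnd.nextInt(150));
                }
            }
        }
    }
}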


Sunday, June 19, 2016

How to protect the webUI port 50070 of the namenode?

By default, the namenode webUI on port 50070 is not protected: details of HDFS and a read-only view of the file system are open to everyone who accesses http://<namenodeServer>:50070

All Hadoop daemons use an embedded Jetty web container to host the JSPs for the webUI.



Version used in this example: Apache Hadoop 2.7.2

1. Go to <hadoop_home>/share/hadoop/hdfs/webapps/hdfs/WEB-INF

2. Edit web.xml

From
<web-app version="2.4" xmlns="http://java.sun.com/xml/ns/j2ee">
</web-app>

To

<web-app version="2.4" xmlns="http://java.sun.com/xml/ns/j2ee">
<security-constraint>
<web-resource-collection>
<web-resource-name>Protected</web-resource-name>
<url-pattern>/*</url-pattern>
</web-resource-collection>
<auth-constraint>
<role-name>admin</role-name>
</auth-constraint>
</security-constraint>
<login-config>
<auth-method>BASIC</auth-method>
<realm-name>explorerRelam</realm-name>
</login-config>
</web-app>


3. Create a new file jetty-web.xml in the same WEB-INF folder and paste the following:

<Configure class="org.mortbay.jetty.webapp.WebAppContext">
<Get name="securityHandler">
<Set name="userRealm">
<New class="org.mortbay.jetty.security.HashUserRealm">
<Set name="name">explorerRelam</Set>
<Set name="config">
<SystemProperty name="hadoop.home.dir"/>/jetty/etc/realm.properties
</Set>
</New>
</Set>
</Get>
</Configure>



4. Create a new file <hadoop_home>/jetty/etc/realm.properties
(the jetty/etc folder has to be created first)

Format:

username: password,role

Example (the role must match the <role-name> in web.xml):

tushar: welcome1,admin

5. Access http://<IP>:50070 ; the browser should now prompt for a username and password.
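
To check the protection programmatically, a small client like the sketch below can send the Basic credentials; the localhost URL and the tushar/welcome1 user are just this example's values, and java.util.Base64 requires Java 8:

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class WebUiAuthCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:50070/");
        // Without the Authorization header this request should now return 401
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        String creds = Base64.getEncoder()
                .encodeToString("tushar:welcome1".getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + creds);
        System.out.println("HTTP status: " + conn.getResponseCode()); // 200 when the realm accepts the user
    }
}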

6. If only the explorer page needs to be protected, use the following url-pattern in step 2:


<url-pattern>/explorer.html/*</url-pattern>




Thursday, June 16, 2016

How to overwrite or update a file in Hadoop HDFS?

Using put or copyFromLocal won't update an existing file in HDFS; both fail with the errors shown below:


[hduser@localhost SampleData]$ hadoop fs -put books.csv  /yesB
put: Target /yesB/books.csv already exists


[hduser@localhost SampleData]$ hadoop fs -copyFromLocal   books.csv  /yesB
copyFromLocal: Target /yesB/books.csv already exists



To overcome this issue, distcp can be used:


 hadoop distcp -update  file://<source>  hdfs://<IP:PORT>/<targetlocation>

Example :

 hadoop distcp -update   file:///home/hduser/pigSample/labfiles/SampleData/books.csv hdfs://11.181.37.158:9000/yesB


-overwrite can also be used, but -update is better because it copies a file only when the source and target differ.