Thursday, June 16, 2016

How to overwrite or update a file in Hadoop HDFS?

Using put or copyFromLocal won't update an existing file in HDFS; both fail with the error shown below:


[hduser@localhost SampleData]$ hadoop fs -put books.csv /yesB
put: Target /yesB/books.csv already exists

[hduser@localhost SampleData]$ hadoop fs -copyFromLocal books.csv /yesB
copyFromLocal: Target /yesB/books.csv already exists



To overcome this, distcp can be used:


 hadoop distcp -update file://<source> hdfs://<IP:PORT>/<targetlocation>

Example:

 hadoop distcp -update file:///home/hduser/pigSample/labfiles/SampleData/books.csv hdfs://11.181.37.158:9000/yesB
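
As a quick check after the copy (reusing the same target directory as above), you can list the target and confirm that the modification time on books.csv has changed:

 hadoop fs -ls /yesB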


-overwrite can also be used, but -update is better: it copies a file only when the source and target differ, so the MapReduce copy work is skipped for files that are already up to date, whereas -overwrite rewrites everything.
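
For completeness, here is a sketch of the -overwrite form, reusing the paths from the example above; it unconditionally replaces the target copy:

 hadoop distcp -overwrite file:///home/hduser/pigSample/labfiles/SampleData/books.csv hdfs://11.181.37.158:9000/yesB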





