Using put or copyFromLocal, you cannot update (overwrite) a file that already exists in HDFS; both commands fail with the error below:
[hduser@localhost SampleData]$ hadoop fs -put books.csv /yesB
put: Target /yesB/books.csv already exists
[hduser@localhost SampleData]$ hadoop fs -copyFromLocal books.csv /yesB
copyFromLocal: Target /yesB/books.csv already exists
To overcome this, distcp can be used:
hadoop distcp -update file://<source> hdfs://<IP:PORT>/<targetlocation>
Example:
hadoop distcp -update file:///home/hduser/pigSample/labfiles/SampleData/books.csv hdfs://11.181.37.158:9000/yesB
-overwrite can also be used, but -update is better: it runs the MapReduce copy job only for files that differ between source and target, skipping files whose size and checksum already match.