Saturday, February 21, 2015

Working with Compression on HDFS

Copy and uncompress a file to HDFS without unzipping it on the local filesystem

If your file is several gigabytes in size, this command helps avoid out-of-space errors, because the file never has to be unzipped on the local filesystem.

The put command in Hadoop can read its input from stdin. To read from stdin, use '-' as the source file.

Compressed filename: compressed.tar.gz
  
gunzip -c compressed.tar.gz | hadoop fs -put - /user/files/uncompressed_data

Only disadvantage: the data lands in HDFS as a single file, even when the local compressed archive contains more than one file. For a .tar.gz, gunzip -c removes only the gzip layer, so that single file is the uncompressed tar stream rather than the individual files.
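If you need one HDFS file per archive member, one possible workaround is to ask tar to write each member to stdout and pipe that into put. The following is only a rough sketch, not something from the original post: put_each_member is a made-up helper name, it assumes GNU tar or bsdtar (both support -O), and it assumes the member base names do not collide.

```shell
#!/usr/bin/env bash
# Sketch: stream each file inside a .tar.gz into its own HDFS file,
# without extracting anything to the local disk.
# put_each_member is a hypothetical helper name, not a Hadoop command.
put_each_member() {
    local archive="$1"    # e.g. compressed.tar.gz
    local dest_dir="$2"   # e.g. /user/files/uncompressed_data
    local member

    # List member names; skip directory entries, which end with "/".
    tar -tzf "$archive" | grep -v '/$' | while IFS= read -r member; do
        # -O writes the member's contents to stdout instead of to disk.
        # </dev/null stops tar from consuming the member list on stdin.
        tar -xzOf "$archive" "$member" </dev/null |
            hadoop fs -put - "$dest_dir/$(basename "$member")"
    done
}

# Usage (assumes the hadoop client is on PATH and the target directory exists):
#   put_each_member compressed.tar.gz /user/files/uncompressed_data
```

Note that this re-reads the archive once per member, so it trades extra CPU and read time for not needing any local disk space. For archives with many members it may be slow.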

I used this in practice today while working on a real-time problem.

Thanks to the blogger below:
http://bigdatanoob.blogspot.com/2011/07/copy-and-uncompress-file-to-hdfs.html
