Saturday, February 21, 2015

Working with Compression on HDFS

Copy and uncompress a file to HDFS without unzipping it on the local filesystem

If your file is several GB in size, this command helps avoid out-of-space errors, because there is no need to unzip the file on the local filesystem.

The hadoop put command supports reading its input from stdin; to read from stdin, use '-' as the source file.

Compressed filename: compressed.tar.gz
  
gunzip -c  compressed.tar.gz | hadoop fs -put - /user/files/uncompressed_data

Only disadvantage: with this approach the data lands in HDFS as a single file, even if the local compressed archive contains more than one file.
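If you need each file inside the archive as a separate HDFS file rather than one merged stream, a loop like the one below can work around this. It is a minimal, untested sketch assuming GNU tar, an archive without sub-directories or spaces in member names, and that /user/files/uncompressed_data is created as a directory first.

hadoop fs -mkdir -p /user/files/uncompressed_data
for member in $(tar -tzf compressed.tar.gz | grep -v '/$'); do
  # Stream one archive member at a time into its own HDFS file.
  tar -xzOf compressed.tar.gz "$member" \
    | hadoop fs -put - /user/files/uncompressed_data/"$(basename "$member")"
done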

I used this today while working on a real-time problem.

Thanks to the blogger below:
http://bigdatanoob.blogspot.com/2011/07/copy-and-uncompress-file-to-hdfs.html

Sunday, February 15, 2015

Big Data solution for My Retail client

We wanted to deliver the best marketing campaigns, coupons, and offers down to the individual customer. Direct customer relationships are a privilege, but they also require processing massive amounts of data, and providing the best winning price to the customer at the point of sale is a complex task. Our client uses Hadoop to process this large volume of data.

Merchants generate offers in the portals for a particular week or month, based on regular price, promotional price, and clearance price, for groups of products and groups of markets across the globe. Based on these promotional-offer transactions, we perform an explosion of the data using Hadoop, generating the promotional offers that help retailers make informed decisions about pricing, promotions, and assortment management. These offers then flow into the Hadoop system for further processing.

We explode these offers using PIG and HIVE for all products and stores across the globe. In product explosion we generate promotion data for every product that belongs to a product group; in market explosion we expand that data to the respective stores. The product-group and market-category details used during product and market explosion come from the internal data warehouse. During these transformations we check store authorization for the product's category at each store, along with many other complex business rules.

Based on this data, Hadoop calculates the optimal promotional retail price for each product. The billions of promotional records generated are then processed using Hadoop PIG and HIVE transformations. We apply chaining and collision rules within the generated promotional data, and out of billions of records we categorize the winning data and the losing data.

Hadoop provides a near-complete ecosystem in which we run batch and ETL-type processing and analytics, store the data, and process billions of records quickly. We store the data in HDFS and process it with HIVE and PIG, which enable analytics on this promotional dataset. We run multiple transformation jobs and deliver information to multiple systems. Every day we send the winning data (after price optimization) to our point of sale (the retail stores), where, when a customer purchases a product, our associate applies the promotional price in effect at that time. We also send the winning and losing data to our data warehouse for reporting, so that merchants can generate BI reports from it.
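To make the batch flow above concrete, here is a minimal orchestration sketch of how such a pipeline could be chained from the shell. The script names (explode_offers.pig, pick_winning_price.hql), parameters, and HDFS paths are illustrative assumptions, not the client's actual jobs.

#!/bin/bash
# Hypothetical weekly run; all names and paths below are assumptions.
set -e
WEEK=week_08

# 1. Explode the merchants' offers to product and store level with Pig.
pig -param offers=/data/offers/$WEEK \
    -param exploded=/data/exploded/$WEEK \
    explode_offers.pig

# 2. Apply chaining/collision rules and pick the winning price per product and store with Hive.
hive --hivevar exploded=/data/exploded/$WEEK \
     --hivevar winners=/data/winners/$WEEK \
     -f pick_winning_price.hql

# 3. Pull the winning prices out of HDFS for the point-of-sale feed and the warehouse load.
hadoop fs -get /data/winners/$WEEK /feeds/pos/$WEEK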
