How to Use Intermediate and Final Output Compression (MR1 & YARN)

Configuring MapReduce Compression

Each setting below is listed as its MR1 property, its YARN equivalent, and a description.

To enable MapReduce intermediate compression:

 
MR1: mapred.compress.map.output=true
YARN: mapreduce.map.output.compress=true
Description: Whether the outputs of the maps are compressed before being sent across the network. Uses SequenceFile compression.

MR1: mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
YARN: mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
Description: The codec used to compress the map outputs when compression is enabled (e.g. Snappy).
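
As a rough sketch, the YARN properties above could be written as a configuration fragment like the one below. It assumes the settings are applied cluster-wide in mapred-site.xml rather than per job, and that the Snappy native libraries are installed on the cluster nodes. On MR1, the mapred.* names listed above would take the place of the mapreduce.* names.

```xml
<!-- Sketch: intermediate (map output) compression, YARN property names.
     Assumption: placed inside the <configuration> element of mapred-site.xml. -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```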

To compress the final output of a MapReduce job:

MR1: mapred.output.compress=true
YARN: mapreduce.output.fileoutputformat.compress=true
Description: Whether the job outputs are compressed.

MR1: mapred.output.compression.type=BLOCK
YARN: mapreduce.output.fileoutputformat.compress.type=BLOCK
Description: If the job outputs are to be compressed as SequenceFiles, how they should be compressed. Must be one of NONE, RECORD, or BLOCK.

MR1: mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
YARN: mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
Description: The codec used to compress the job outputs when compression is enabled (e.g. Gzip).
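
For illustration, the final-output settings could be expressed with the YARN property names as in the fragment below. This is a sketch that assumes cluster-wide configuration in mapred-site.xml; the same names and values could instead be set on an individual job. The compress.type value only matters when the job writes SequenceFiles.

```xml
<!-- Sketch: final job output compression, YARN property names.
     Assumption: placed inside the <configuration> element of mapred-site.xml. -->
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <!-- Only consulted for SequenceFile output: NONE, RECORD, or BLOCK -->
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>BLOCK</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```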

io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec
Description: A comma-separated list of the compression codec classes that can be used for compression and decompression.
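
The codec list might be registered with a fragment like the following. This sketch assumes the property is set in core-site.xml, and that every listed codec class is available on the cluster's classpath. The value is a single comma-separated string.

```xml
<!-- Sketch: registering the available codec classes.
     Assumption: placed inside the <configuration> element of core-site.xml. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```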
     

RECOMMENDED: