yarn.app.mapreduce.am.staging-dir property and its usage

Description

According to the Apache Hadoop documentation, MapReduce jobs write their history files to the .../history/done_intermediate/ directory in HDFS. This location is configured in mapred-site.xml through the mapreduce.jobhistory.intermediate-done-dir property.

After a MapReduce job completes, its history files are written to HDFS under this directory. The history server periodically scans the intermediate directory and moves any newly available files to the directory specified by the mapreduce.jobhistory.done-dir property in mapred-site.xml. From that location, the history server reads the files and displays them in the history server UI.
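
As a rough illustration of the two directory settings described above, a mapred-site.xml fragment might look like the following. The paths /mr-history/tmp and /mr-history/done are hypothetical example values chosen for this sketch, not the Hadoop defaults; substitute the locations used in your cluster.

```xml
<!-- mapred-site.xml (fragment): sketch only, paths are hypothetical examples -->
<configuration>
  <!-- Where history files of completed jobs are first written -->
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
  </property>
  <!-- Where the history server moves the files and serves them from -->
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
  </property>
</configuration>
```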

The retention policy for MapReduce job history is controlled by the following properties in mapred-site.xml:

  • mapreduce.jobhistory.cleaner.enable - Whether the job history cleaner is enabled (true or false). Defaults to true.
  • mapreduce.jobhistory.cleaner.interval-ms - How often the job history cleaner checks for files to delete, in milliseconds. Defaults to 86400000 (one day). Files are only deleted if they are older than mapreduce.jobhistory.max-age-ms.
  • mapreduce.jobhistory.max-age-ms - Job history files older than this many milliseconds are deleted when the history cleaner runs. Defaults to 604800000 (one week).
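
The retention properties listed above could be set explicitly as in the sketch below. The values shown are simply the defaults stated in the list, so this fragment would not change behaviour on its own; it is meant only as a starting point for adjusting the interval or maximum age.

```xml
<!-- mapred-site.xml (fragment): sketch that spells out the default retention settings -->
<configuration>
  <!-- Turn the job history cleaner on or off -->
  <property>
    <name>mapreduce.jobhistory.cleaner.enable</name>
    <value>true</value>
  </property>
  <!-- How often the cleaner runs: 24 * 60 * 60 * 1000 ms = 86400000 ms (one day) -->
  <property>
    <name>mapreduce.jobhistory.cleaner.interval-ms</name>
    <value>86400000</value>
  </property>
  <!-- Maximum file age: 7 * 86400000 ms = 604800000 ms (one week) -->
  <property>
    <name>mapreduce.jobhistory.max-age-ms</name>
    <value>604800000</value>
  </property>
</configuration>
```

With these values, a history file is removed only on a cleaner run that happens after the file has passed the one-week age limit, so a file can remain for up to roughly one cleaner interval beyond that limit.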