How to Interpret the Contents of a Job Trace

Goal

Break down the contents of a captured job trace from the perspective of the job-trace-creation.log, which is itself part of the trace.

Learn

While a job trace is being gathered, the job-trace-creation.log is written to record the capture activity. This log gives an overall view of which artifacts were successfully captured.
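
Job traces are commonly delivered as an archive. As a minimal sketch, assuming the trace was downloaded as a zip file (the exact packaging may vary by version), its contents can be listed before unpacking:

import zipfile

# Hypothetical path; substitute the actual trace archive that was downloaded.
TRACE_ARCHIVE = "job-trace.zip"

with zipfile.ZipFile(TRACE_ARCHIVE) as trace:
    # Prints every captured artifact, e.g. job.log, job-plan-original.dot,
    # job-conf.xml, and the job-trace-creation.log itself.
    for name in trace.namelist():
        print(name)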

JOB_LOG

job.log - The log written while the job runs. This file is usually of the greatest interest to the support team.
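
Support investigations usually begin by finding the first failure in this file. Below is a minimal sketch that assumes conventional ERROR/Exception markers in a plain-text log; the exact log format is not guaranteed:

# Minimal sketch: report the first lines in job.log that look like failures.
# The markers below are common logging conventions, not a documented Datameer format.
MARKERS = ("ERROR", "Exception", "Caused by")

def scan_log(path, limit=20):
    hits = []
    with open(path, errors="replace") as log:
        for lineno, line in enumerate(log, start=1):
            if any(marker in line for marker in MARKERS):
                hits.append(f"{lineno}: {line.rstrip()}")
                if len(hits) >= limit:
                    break
    return hits

for hit in scan_log("job.log"):
    print(hit)

The same helper works on the other plain-text logs in the trace, such as tasklog-spark-submit.log.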

JOB_PLAN_ORIGINAL

job-plan-original.dot - The original definition of the job as it is sent to the Hadoop cluster.
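
.dot is the Graphviz graph format, so the plan can be rendered with any Graphviz tool. As a rough stdlib-only sketch, the plan's size can be gauged by counting edge lines; this assumes the common one-edge-per-line DOT layout, which a given file may not follow:

# Rough sketch: estimate the number of edges in a DOT plan by line inspection.
# A real DOT parser (or Graphviz itself) is more reliable.
def summarize_plan(path):
    with open(path, errors="replace") as plan:
        edges = sum(1 for line in plan if "->" in line)
    print(f"{path}: ~{edges} edges")

summarize_plan("job-plan-original.dot")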

JOB_PLAN_COMPILED

job-plan-compiled.dot - The modified definition of the job that is sent to the Hadoop cluster. It reflects the reordering of the job's processing sequence once the dependencies between steps have been analyzed.
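
Comparing the two plan files shows exactly what was reordered during compilation. A minimal sketch using the standard library's difflib:

import difflib

# Minimal sketch: show the differences between the original and compiled plans.
with open("job-plan-original.dot", errors="replace") as f:
    original = f.readlines()
with open("job-plan-compiled.dot", errors="replace") as f:
    compiled = f.readlines()

diff = difflib.unified_diff(
    original, compiled,
    fromfile="job-plan-original.dot",
    tofile="job-plan-compiled.dot",
)
print("".join(diff))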

JOB_DEFINITION

job-definition.json - Job definition in the REST format used by Datameer.
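
Since the file is plain JSON, its structure can be inspected with standard tooling. A minimal sketch that lists the top-level fields (the actual schema is Datameer's and is not assumed here):

import json

# Minimal sketch: list the top-level fields of the job definition.
# Assumes the top-level JSON value is an object.
with open("job-definition.json") as f:
    definition = json.load(f)

for key in definition:
    print(key)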

JOB_INPUT_DEFINITION

job-definition-<xxx>.json - <xxx> denotes the original file name of the job, which is incorporated into this file name. The file defines the specifics of the job's input in REST format.
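
Because the <xxx> portion varies per job, these files are easiest to locate by pattern. A minimal sketch:

import glob

# Minimal sketch: find every input-definition file in the unpacked trace.
# The <xxx> portion of the name is unknown, so match by prefix and suffix.
for path in sorted(glob.glob("job-definition-*.json")):
    print(path)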

JOB_CONF

job-conf.xml - Job configuration used when running jobs locally. 
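
This file typically follows the standard Hadoop configuration layout: a <configuration> root wrapping <property> elements with <name> and <value> children. Assuming that layout, the settings can be dumped like this:

import xml.etree.ElementTree as ET

# Minimal sketch: print name/value pairs from a Hadoop-style configuration file.
# Assumes the standard <configuration><property><name/><value/></property> layout.
def read_conf(path):
    conf = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name] = prop.findtext("value")
    return conf

for name, value in sorted(read_conf("job-conf.xml").items()):
    print(f"{name} = {value}")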

JOB_CONF_CLUSTER

job-conf-cluster.xml - Created when the execution framework is Tez or SparkClient. It contains the Datameer configuration merged with the Hadoop configuration.
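
Since this file is the merge of the Datameer and Hadoop configurations, comparing it against job-conf.xml shows what the merge contributed. A minimal sketch, under the same Hadoop-layout assumption as above:

import xml.etree.ElementTree as ET

# Minimal sketch: list properties present only in the merged cluster configuration.
# Assumes both files use the Hadoop <configuration><property> layout.
def property_names(path):
    return {prop.findtext("name")
            for prop in ET.parse(path).getroot().iter("property")}

only_in_cluster = property_names("job-conf-cluster.xml") - property_names("job-conf.xml")
for name in sorted(only_in_cluster):
    print(name)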

TASK_LOGS

tasklog-spark-submit.log - Created when the execution framework is SparkCluster. It records the activity of all tasks executed for this particular job.

ERROR_LOGS

When exceptions occur, additional error log files are created, each with a distinct name.

Example:

error-map-local-<numbering>.log.gz
error-map-attempt-<timestamp>.log.gz
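
These logs are gzip-compressed, so they can be scanned without manual unpacking. A minimal sketch that previews every error log in the current directory; the <numbering> and <timestamp> parts vary, so the files are matched by pattern:

import glob
import gzip

# Minimal sketch: print the first few lines of each compressed error log.
for path in sorted(glob.glob("error-*.log.gz")):
    print(f"== {path} ==")
    with gzip.open(path, "rt", errors="replace") as log:
        for _, line in zip(range(10), log):
            print(line.rstrip())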