General Job Failure and Slow Job Troubleshooting Tips

Problem

Datameer and Hadoop jobs are failing or seem to run slower than expected.
 

Solutions

The following tips can help you find out why jobs are failing or running slower than expected.

Check the job logs for clues:

  • Check logs for Datameer object run/execution errors.
  • Examine your job error handling strategy. (Is Datameer set to skip the record, ignore the error, or fail the job when it encounters an invalid record?)
  • Using the job ID from the job log, extract all activity for that job from the job tracker logs.
  • From the job tracker logs, examine which tasks were started, which task tracker each task was sent to, and whether each task completed successfully.
  • Using the task ID, check how many times a specific task was retried and whether it was retried on the same task tracker.
  • Identify the task failures and the server each failed task was assigned to. Look at that server's task tracker logs for additional clues.

Job tracker logs tell a story. With the job ID, you can trace the task activity and dig deeper from there.
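
The log-tracing steps above can be sketched in a few lines of Python. This is only an illustration, not a Datameer or Hadoop tool. It relies on the standard Hadoop ID layout (job_<timestamp>_<sequence>, with task and attempt IDs built from the same stem) and assumes that the log line assigning an attempt names the host as tracker_<host>. The function names are hypothetical, and the host pattern may need adjusting for your log format.

```python
import re
from collections import defaultdict


def job_key(job_id):
    """Strip the "job_" prefix: "job_201208011234_0001" -> "201208011234_0001"."""
    return job_id[len("job_"):] if job_id.startswith("job_") else job_id


def lines_for_job(lines, job_id):
    """Return every log line that mentions the job, its tasks, or its attempts."""
    pattern = re.compile(
        r"(?:job|task|attempt)_" + re.escape(job_key(job_id)) + r"(?!\d)"
    )
    return [line for line in lines if pattern.search(line)]


def attempts_by_task(lines, job_id):
    """Map each task ID to {attempt number: host or None}."""
    key = re.escape(job_key(job_id))
    attempt_re = re.compile(r"attempt_(" + key + r"_[mr]_\d+)_(\d+)")
    # Assumption: the assigning line contains "tracker_<host>:..."
    tracker_re = re.compile(r"tracker_([^:'\s]+)")
    seen = defaultdict(dict)
    for line in lines:
        match = attempt_re.search(line)
        if not match:
            continue
        task_id = "task_" + match.group(1)
        attempt_no = int(match.group(2))
        host = tracker_re.search(line)
        if host:
            seen[task_id][attempt_no] = host.group(1)
        else:
            # Keep a host recorded earlier; otherwise note the attempt only.
            seen[task_id].setdefault(attempt_no, None)
    return {task: dict(sorted(a.items())) for task, a in seen.items()}


def retried_tasks(attempts):
    """Keep only the tasks that were attempted more than once."""
    return {task: a for task, a in attempts.items() if len(a) > 1}
```

Passing the lines of a job tracker log and a job ID to attempts_by_task, then the result to retried_tasks, lists the tasks worth investigating. If every attempt of a retried task shows the same host, that server's task tracker log is a good next stop.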

 

Check system resources:

  • Jobs may seem to be executing slower than usual. This could be due to resource allocation. If no obvious task failures appear in the logs, tasks could be waiting for available resources before executing.
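
One way to check whether tasks are waiting on resources is to look at cluster-wide metrics. The sketch below assumes the cluster runs YARN and that the ResourceManager REST endpoint /ws/v1/cluster/metrics is reachable. The URL passed to fetch_cluster_metrics, the function names, and the 10% free-capacity threshold are illustrative assumptions, not Datameer settings.

```python
import json
import urllib.request


def fetch_cluster_metrics(rm_url):
    """Fetch the clusterMetrics object from a YARN ResourceManager.

    rm_url is a placeholder such as "http://<resourcemanager-host>:8088".
    """
    with urllib.request.urlopen(rm_url + "/ws/v1/cluster/metrics") as resp:
        return json.load(resp)["clusterMetrics"]


def resource_pressure(metrics, min_free_fraction=0.1):
    """Return warnings suggesting that jobs may be queued behind busy resources.

    min_free_fraction is an arbitrary illustrative threshold.
    """
    warnings = []
    checks = (
        ("memory", "availableMB", "totalMB"),
        ("vcores", "availableVirtualCores", "totalVirtualCores"),
    )
    for label, free_key, total_key in checks:
        total = metrics.get(total_key, 0)
        free = metrics.get(free_key, 0)
        # Skip the check when the total is missing or zero.
        if total and free / total < min_free_fraction:
            warnings.append(f"low free {label}: {free} of {total}")
    pending = metrics.get("appsPending", 0)
    if pending:
        warnings.append(f"{pending} application(s) pending")
    return warnings
```

An empty list from resource_pressure means none of these simple checks fired. Any warning is a hint that slow jobs may be waiting for capacity rather than failing.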

Note:

To access a job log in Datameer, you must have read permission for the job.

To access the node manager logs on your Hadoop cluster, you must have Hadoop administrator permissions.