Jobs Fail with Error - RuntimeException: Job status not available

Symptoms

Some Datameer jobs are labelled as failed in Datameer even though the associated Hadoop job completed successfully. In the Job Log, the following log entries are observed:

ERROR [2015-01-01 00:00:00.000] [ConcurrentJobExecutor-0] (ClusterSession.java:193) - Failed to run cluster job 'Sample job (1234): MyDatalink#datalink-sample(Limit record processor)' [3 mins, 36 sec]
java.lang.RuntimeException: Job status not available
        at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:49)
        at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:31)
        at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:197)
        at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:129)
        at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:187)
        at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:48)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Job status not available
        at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
        at org.apache.hadoop.mapreduce.Job.getJobState(Job.java:347)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.getJobState(JobClient.java:295)
        at datameer.dap.common.job.mr.DefaultMrJobClient.waitUntilJobCompletion(DefaultMrJobClient.java:129)
        at datameer.dap.common.job.mr.DefaultMrJobClient.runJobImpl(DefaultMrJobClient.java:77)
        at datameer.dap.common.job.mr.MrJobClient.runJob(MrJobClient.java:34)
        at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:185)
        ... 8 more
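
To check whether a given job log shows this symptom, the log can be scanned for the error text. The following is a minimal sketch, not a Datameer tool: the function name `find_status_errors` is hypothetical, and the sample log is an abbreviated excerpt of the entries above.

```python
# Text that appears in both the RuntimeException and the IOException lines.
ERROR_SIGNATURE = "Job status not available"


def find_status_errors(log_text):
    """Return (line_number, line) pairs for log lines containing the signature."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ERROR_SIGNATURE in line
    ]


# Abbreviated excerpt of the job log shown above (illustration only).
SAMPLE_LOG = """ERROR [2015-01-01 00:00:00.000] [ConcurrentJobExecutor-0] (ClusterSession.java:193) - Failed to run cluster job
java.lang.RuntimeException: Job status not available
        at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:197)
Caused by: java.io.IOException: Job status not available
"""

if __name__ == "__main__":
    for number, line in find_status_errors(SAMPLE_LOG):
        print(f"line {number}: {line.strip()}")
```

To scan a real log, read the file contents into a string and pass it to the function in place of the sample.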

Cause

This is a Datameer configuration issue.

By default, Datameer sets yarn.app.mapreduce.am.staging-dir to /tmp, which may not match the staging directory configured on the connected Hadoop cluster.

Resolution

To resolve this issue, follow these steps:

1. On the Hadoop cluster, identify the value of yarn.app.mapreduce.am.staging-dir, which can be found in the yarn-site.xml file. If it is not set there, check mapred-site.xml, where MapReduce properties are commonly defined.

2. Navigate to the Administration tab in Datameer.

3. Navigate to the Hadoop Cluster section. Click Edit.

4. In the Custom Hadoop Properties section, add the yarn.app.mapreduce.am.staging-dir property with the value from step 1.

For example:

yarn.app.mapreduce.am.staging-dir=/user

5. Click Save to activate the updated settings.
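
Steps 1 and 4 can be sketched in Python as follows. This is a minimal sketch, assuming the cluster's site file uses the standard Hadoop layout of `<property>` elements with `<name>` and `<value>` children inside `<configuration>`. The helper names `read_hadoop_property` and `as_custom_property` and the sample XML are hypothetical and are not part of Datameer or Hadoop.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "yarn.app.mapreduce.am.staging-dir"


def read_hadoop_property(xml_text, name):
    """Return the value of `name` from Hadoop site XML text, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def as_custom_property(name, value):
    """Format a name/value pair as a line for the Custom Hadoop Properties section."""
    return f"{name}={value}"


# Hypothetical sample standing in for the cluster's site file.
SAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    value = read_hadoop_property(SAMPLE_SITE_XML, PROPERTY_NAME)
    if value is None:
        print(f"{PROPERTY_NAME} is not set in this file")
    else:
        print(as_custom_property(PROPERTY_NAME, value))
```

To use it against a real cluster, read the contents of the site file from a cluster node and pass them in place of the sample. The printed line is the entry to add in step 4.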

Comments

  • Kewal Chopra

    I am facing a similar error.

    What does the value in yarn.app.mapreduce.am.staging-dir=/user mean?

    What is the significance of this property, and what value should be configured? Is this a user name?

  • Joel Stewart

    Kewal - The recommended setting is the literal path "/user". This is not a placeholder for a user account.

    I'd recommend double checking the value of the "yarn.app.mapreduce.am.staging-dir" property in the yarn-site.xml file of your Hadoop cluster members.