Failed to renew token: Kind: MR_DELEGATION_TOKEN

Problem

MapReduce jobs are not executed and fail with the following error message:

ERROR [2016-05-24 12:04:29.285] [ConcurrentJobExecutor-0] (ClusterSession.java:198) - Failed to run cluster job 'Workbook job (110): TEST_JOB#Sheet1(Group by operation)' [10 sec]
java.lang.RuntimeException: Failed to run job : Failed to renew token: Kind: MR_DELEGATION_TOKEN, Service: 10.1.1.12:10020, Ident: (owner=datameer@DM.COM, renewer=yarn, realUser=, issueDate=1464105869043, maxDate=1464710669043, sequenceNumber=2, masterKeyId=2)
at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:49)
at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:31)
at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:228)
at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:128)
at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:184)
at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:48)
at datameer.dap.common.security.DatameerSecurityService$1.call(DatameerSecurityService.java:151)
at datameer.dap.common.security.DatameerSecurityService$1.call(DatameerSecurityService.java:145)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to run job : Failed to renew token: Kind: MR_DELEGATION_TOKEN, Service: 10.1.1.12:10020, Ident: (owner=datameer@DM.COM, renewer=yarn, realUser=, issueDate=1464105869043, maxDate=1464710669043, sequenceNumber=2, masterKeyId=2)

Cause

This failure was observed with Datameer running against a multi-homed cluster. By default the service field of a delegation token is populated based on the server IP address. 

Setting hadoop.security.token.service.use_ip=false changes this behavior to use the host name instead of the IP address.

However, this configuration property is not read from job.xml (see MAPREDUCE-6565 for background information). According to that ticket, a Hadoop class creates a Configuration object with new Configuration() inside a static block, and therefore expects the *-site.xml files to be on the classpath of the client (here, Datameer).

It looks something like this:

org.apache.hadoop.security.SecurityUtil.java
    static {
        Configuration conf = new Configuration();
        boolean useIp = conf.getBoolean(
            CommonConfigurationKeys.HADOOP_SECURITY_TOKEN_SERVICE_USE_IP,
            CommonConfigurationKeys.HADOOP_SECURITY_TOKEN_SERVICE_USE_IP_DEFAULT);
        setTokenServiceUseIp(useIp);
    }

Because of this static block, a value of hadoop.security.token.service.use_ip that the client sets programmatically is not respected. Unless a *-site.xml file on the client's classpath overrides it, the default value (true) is used.
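The effect of a static initializer can be illustrated without Hadoop. The following is a minimal sketch in plain Java, not Hadoop code: StaticInitDemo, TokenServiceSettings, CLASSPATH_SITE_FILES and effectiveUseIp are hypothetical names, and the empty map stands in for a client that has no *-site.xml files on its classpath. It shows that a flag fixed at class-load time ignores whatever the job configuration says later.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticInitDemo {
    static final String KEY = "hadoop.security.token.service.use_ip";

    // Hypothetical stand-in for the *-site.xml files found on the classpath.
    // It is empty here, mimicking a client with no site files available.
    static final Map<String, String> CLASSPATH_SITE_FILES = new HashMap<>();

    // Hypothetical stand-in for SecurityUtil: the flag is fixed once, when the class loads.
    static class TokenServiceSettings {
        static final boolean USE_IP;

        static {
            String value = CLASSPATH_SITE_FILES.get(KEY);
            // Fall back to the default (true) when no site file supplies the property.
            USE_IP = (value == null) ? true : Boolean.parseBoolean(value);
        }
    }

    // Returns the flag the token code would use; the job configuration is never consulted.
    static boolean effectiveUseIp(Map<String, String> jobConf) {
        return TokenServiceSettings.USE_IP;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(KEY, "false"); // set programmatically by the client

        System.out.println("job config value: " + jobConf.get(KEY));
        System.out.println("effective value:  " + effectiveUseIp(jobConf));
    }
}
```

In this sketch the job configuration holds false while the effective value remains true, which mirrors the mismatch described above.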

Solution

  1. Add the cluster's *-site.xml files to Datameer's classpath (possibly under etc/custom-jars).
  2. Set hadoop.security.token.service.use_ip=true everywhere, including the cluster. In other words, let the cluster use IP addresses instead of host names (not a recommended solution).
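For the first option, it can help to confirm that the site files are actually visible on the classpath. The following is a small sketch using only the Java standard library. SiteFileCheck, locate and describe are hypothetical names, and the list of file names is an assumption about which site files a given cluster provides. The check is only meaningful when run with the same classpath as the Datameer process.

```java
import java.net.URL;

public class SiteFileCheck {
    // Looks up a resource via the thread context class loader,
    // falling back to this class's loader if none is set.
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = SiteFileCheck.class.getClassLoader();
        }
        return loader.getResource(resourceName);
    }

    // Produces a one-line report for a single resource name.
    static String describe(String resourceName) {
        URL url = locate(resourceName);
        return resourceName + " -> " + (url == null ? "NOT FOUND on classpath" : url.toString());
    }

    public static void main(String[] args) {
        // Assumed set of site files; adjust to what the cluster actually provides.
        String[] names = {"core-site.xml", "hdfs-site.xml", "yarn-site.xml", "mapred-site.xml"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

A file reported as not found suggests that the static block shown earlier would fall back to the default value on that client.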