Datalinks fail to resolve logical name in HA setup - UnknownHostException: nameservice1


DM: 4.4.1, OS: -, DIST: -, COM: HDFS


After upgrading Datameer to 4.4.1, existing datalinks fail to resolve the configured HA logical name (nameservice1). Datalink jobs start and run normally, but eventually fail with an error like the following:

INFO [2014-10-19 23:41:25.797] [JobScheduler worker1-thread-991] ( - Completed postprocessing: [0 sec], progress at 100
 INFO [2014-10-19 23:41:25.797] [JobScheduler worker1-thread-991] ( - -------------------------------------------
 INFO [2014-10-19 23:41:25.798] [JobScheduler worker1-thread-991] ( - Completed execution plan with SUCCESS and 1 completed MR jobs. (hdfs://nameservice1/user/datameer/importlinks/7199/34922)
 INFO [2014-10-19 23:41:25.814] [JobScheduler worker1-thread-991] ( - Configuring job result artifacts from [hdfs://nameservice1/user/datameer/importlinks/7199]
 INFO [2014-10-19 23:46:24.327] [JobScheduler worker1-thread-991] ( - Configuring job result artifacts from [hdfs://nameservice1/user/datameer/joblogs/34922]
ERROR [2014-10-19 23:46:24.410] [JobScheduler worker1-thread-991] ( - Job failed! Execution plan: digraph G {
  1 [label = "MrInputNode{datalink-sample-input} - 0 Bytes"];
  2 [label = "MrMapNode{datameer.dap.common.job.sample.WritePartitionedPreviewMapper@216634b4}"];
  3 [label = "MrOutputNode{datalink-sample} - 0 Bytes"];
  2 -> 3 [label = "PRODUCED_BY_MAPPER"];
  1 -> 2 [label = "REQUIRED_AS_MAPPER_INPUT"];
datameer.dap.sdk.util.ExceptionUtil$WrappedThreadException: java.lang.IllegalArgumentException: nameservice1
	at datameer.dap.sdk.util.ExceptionUtil.wrapInThreadException(
	at datameer.dap.sdk.util.HadoopUtil.executeTimeRestrictedCall(
	at datameer.dap.sdk.util.HadoopUtil.getFileSystem(
	at datameer.dap.sdk.util.HadoopUtil.getFileSystem(
	at datameer.dap.sdk.cluster.filesystem.ClusterFileSystemProvider$
	at datameer.dap.sdk.datastore.FileDataStoreModel.openFileSystem(
Caused by: java.lang.IllegalArgumentException: nameservice1
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(
	at org.apache.hadoop.hdfs.DFSClient.<init>(
	at org.apache.hadoop.hdfs.DFSClient.<init>(

All the correct HA configuration details are present in the Custom Properties field of the Administration -> Hadoop Cluster page, but the jobs are still failing to resolve the nameservice1 logical name.
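The `createNonHAProxy` frame in the stack trace suggests that the HDFS client used for the datalink never saw the HA properties. It therefore treated `nameservice1` as an ordinary host name, which cannot be resolved. The sketch below is a deliberately simplified model of that decision. It assumes, as in Hadoop 2.x `NameNodeProxies.createProxy`, that failover is used only when a `dfs.client.failover.proxy.provider.<nameservice>` property exists for the URI's host part. `choose_proxy` is a hypothetical helper for illustration, not Hadoop or Datameer code.

```python
from urllib.parse import urlparse


def choose_proxy(uri: str, props: dict) -> str:
    """Simplified model (an assumption, not Hadoop's code) of how an HDFS
    client picks a NameNode proxy for an hdfs:// URI."""
    # Host part of the URI; a port, if present, is ignored.
    host = urlparse(uri).hostname
    # Failover (HA) is only used when a proxy provider is configured
    # for this exact logical name.
    if f"dfs.client.failover.proxy.provider.{host}" in props:
        return "HA"
    # Otherwise the host is resolved like a normal machine name; for a
    # logical name such as "nameservice1" this fails with
    # IllegalArgumentException / UnknownHostException.
    return "non-HA"


# Properties set only on the Hadoop Cluster page are not visible here,
# so the connection's own property set is empty in this model.
connection_props = {}
print(choose_proxy("hdfs://nameservice1/user/datameer", connection_props))
```

In this model the outcome depends only on the property set the client actually receives. That matches the observation that correct values on the Hadoop Cluster page alone do not help.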


Copying the same HA Hadoop configuration details from the Administration -> Hadoop Cluster page into the Custom Properties field of the HDFS Connection (the connection that the datalinks use to reach the cluster) allows the datalinks to run successfully.
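The exact values to copy are the ones already on the Hadoop Cluster page. The sketch below assumes the cluster uses the standard HDFS HA client properties and that the Custom Properties field takes one `key=value` pair per line. Under those assumptions, it shows the minimal client-side set for a nameservice. `ha_client_properties` is a hypothetical helper, and the host names, ports and NameNode IDs are placeholders, not values from this article.

```python
def ha_client_properties(nameservice: str, namenodes: dict) -> str:
    """Render standard HDFS HA client properties as key=value lines.

    namenodes maps a NameNode ID (e.g. "nn1") to its RPC host:port.
    """
    lines = [
        f"dfs.nameservices={nameservice}",
        # Comma-separated list of the NameNode IDs in this nameservice.
        f"dfs.ha.namenodes.{nameservice}={','.join(namenodes)}",
    ]
    # One RPC address per NameNode ID.
    for nn_id, rpc_address in namenodes.items():
        lines.append(f"dfs.namenode.rpc-address.{nameservice}.{nn_id}={rpc_address}")
    # Without this key the client falls back to the non-HA code path.
    lines.append(
        f"dfs.client.failover.proxy.provider.{nameservice}="
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder hosts and ports for illustration only.
    print(ha_client_properties(
        "nameservice1",
        {"nn1": "namenode1.example.com:8020", "nn2": "namenode2.example.com:8020"},
    ))
```

Comparing this output line by line with the HDFS Connection's Custom Properties is a quick way to spot a missing key, especially the failover proxy provider.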


Adding the same configuration details to the Hadoop Custom Properties field of the individual datalinks does not help; the configuration must be present in the HDFS Connection itself.

To solve the issue globally, set the HDFS Name Node to the logical nameservice URI (for example hdfs://nameservice1) instead of a single host such as hdfs://hostname:8080.
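To audit existing connections, one option is to check that each configured Name Node URI points at a nameservice listed in `dfs.nameservices` rather than at a single host and port. The sketch below assumes you have the URI and the `dfs.nameservices` value as strings. `uses_logical_name` is a hypothetical helper and does not query Datameer.

```python
from urllib.parse import urlparse


def uses_logical_name(namenode_uri: str, nameservices: str) -> bool:
    """True if the URI targets one of the comma-separated nameservices
    (e.g. the value of dfs.nameservices) without an explicit port."""
    parsed = urlparse(namenode_uri)
    configured = {ns.strip() for ns in nameservices.split(",") if ns.strip()}
    # A logical HA name never carries a port; host:port means a single NameNode.
    return (
        parsed.scheme == "hdfs"
        and parsed.port is None
        and parsed.hostname in configured
    )


print(uses_logical_name("hdfs://hostname:8080", "nameservice1"))
print(uses_logical_name("hdfs://nameservice1", "nameservice1"))
```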

Further Information

Information on how to "Configure High Availability on a Hadoop Cluster" and on "High Availability and Yarn" can be requested from the Datameer service team.