Spark Job Fails When It's the First Execution After Datameer Restart


After a Datameer restart, the first job executed fails with the following error if it runs on the Spark engine:

Error: UnknownHostException: mycluster

The issue doesn't occur for Tez jobs, or for any subsequent execution, regardless of its engine.



The root cause is currently under investigation by the Datameer engineering team. The internal ticket number is DAP-32386.



The issue is confirmed to occur only when both of the following are true: HDFS High Availability is enabled in the environment, and the first job executed after the Datameer restart uses the Spark engine. If you encounter this issue, restart the affected job.
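
If jobs are triggered from a script rather than from the UI, the restart can be automated. The following is a minimal sketch only, not a Datameer API: `run_job` is a hypothetical callable standing in for however your environment starts a job, and it is assumed to raise a `RuntimeError` containing the error text on failure. The sketch retries once, and only when the failure message contains `UnknownHostException`.

```python
import time
from typing import Callable


def run_with_one_retry(run_job: Callable[[], str], delay_seconds: float = 0.0) -> str:
    """Run a job and rerun it once if it fails with UnknownHostException.

    `run_job` is a hypothetical placeholder for your own job trigger.
    Any other failure is re-raised unchanged.
    """
    try:
        return run_job()
    except RuntimeError as err:
        if "UnknownHostException" not in str(err):
            raise
        time.sleep(delay_seconds)  # optional pause before the second attempt
        return run_job()


class FirstRunFails:
    """Simulated job: fails on the first call, succeeds afterwards."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> str:
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("UnknownHostException: mycluster")
        return "COMPLETED"


job = FirstRunFails()
status = run_with_one_retry(job)
print(status, job.calls)
```

The retry is limited to a single attempt because the issue is only reported for the first execution after a restart; a second failure likely points to a different problem.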

Please get in touch with Datameer support if the mentioned workaround doesn't help.