RuntimeException: File does not exist: hdfs://nameservice1:8020/<distribution-specific-path>/hadoop/mapreduce.tar.gz

Problem

Troubleshooting and optimizing workbook execution.

Learn

Comparing different execution frameworks by running the workbook under MapReduce rather than Tez, which is the default.

After forcing the execution under another framework by adding

das.execution-framework=MapReduce

to the workbook's properties, the workbook's execution immediately fails with an error.

Error Message

ERROR [<timestamp>] [ConcurrentJobExecutor-0] (ClusterSession.java:198) - Failed to run cluster job 'Workbook job (<jobID>): <Workbook>#<Worksheet>(Filter by =!ISNULL(#<column>) && CONTAINS(#...' [2 sec]
java.lang.RuntimeException: File does not exist: hdfs://nameservice1:8020/<distribution-specific-path>/hadoop/mapreduce.tar.gz
	at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:49)
...

Background

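An out-of-the-box installation of Datameer comes with some pre-defined properties, e.g., for

mapreduce.application.framework.path

which points to the MapReduce framework archive in HDFS. The pre-defined value is a reasonable default, but it is not guaranteed to match every cluster.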

Solution

The default setting

mapreduce.application.framework.path=/<distribution-specific-path>/mapreduce/mapreduce.tar.gz#yarn

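may not work in every case, since the location of the archive depends on how the cluster was installed and configured. If the archive is not present at the configured location, the job fails with the "File does not exist" error shown above.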

To enable MapReduce jobs, set the property to the path where the archive is actually available within the cluster:

mapreduce.application.framework.path=/<cluster-specific-path>/mapreduce/mapreduce.tar.gz#yarn

Once the property points to the correct location of the MapReduce archive, MapReduce jobs can run.
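
Before changing the property, it can help to confirm that the archive really exists at the intended location. The following is a minimal sketch, not part of Datameer: the helper names `split_framework_path` and `archive_exists` are hypothetical, and it assumes the `hdfs` command-line client is installed and configured on the machine where the script runs. It separates the HDFS path from the `#yarn` alias and then checks the path with `hdfs dfs -test -e`.

```python
import subprocess


def split_framework_path(value):
    """Split a 'path#alias' property value into (path, alias).

    The alias is an empty string when the value has no '#' fragment.
    """
    path, _, alias = value.partition("#")
    return path, alias


def archive_exists(path):
    """Return True if `hdfs dfs -test -e <path>` exits with status 0.

    Assumption: the `hdfs` client is on the PATH and points at the cluster.
    """
    result = subprocess.run(["hdfs", "dfs", "-test", "-e", path])
    return result.returncode == 0
```

For example, calling `split_framework_path` on the value intended for `mapreduce.application.framework.path` yields the path to pass to `archive_exists`. If the check returns False, the path needs to be corrected before the workbook is run under MapReduce.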