Tez not running on HDP 2.4 with DAS 6.1.2


I cannot run any Tez jobs with Datameer 6.1.2 on HDP 2.4. MapReduce and Spark jobs run properly. Tez jobs always fail with the same error:

INFO [2016-09-30 09:39:36.239] [JobScheduler thread-1] (JobScheduler.java:396) - Starting job 11 (DAS Version: 6.1.2, Revision: 57da07af2dbc26f22425250c608b8b59a0f29579, Hadoop-Distribution: 2.7.1.2.4.0.0-169 (hdp-2.4.0), JVM: 1.8)
INFO [2016-09-30 09:39:36.243] [JobScheduler thread-1] (NormalJobDriver.java:124) - Checking if JobExecutionValueObject{_id=11} can be started
INFO [2016-09-30 09:39:36.284] [JobScheduler thread-1] (JobScheduler.java:430) - [Job 11] Preparing job in job scheduler thread for DataSourceConfigurationImpl{id=6}...
INFO [2016-09-30 09:39:36.284] [JobScheduler thread-1] (JobScheduler.java:433) - [Job 11] Preparing job in job scheduler thread for DataSourceConfigurationImpl{id=6}... done (0 sec)
INFO [2016-09-30 09:39:36.286] [JobScheduler worker1-thread-2] (JobSchedulerJob.java:89) - [Job 11] Preparing job for DataSourceConfigurationImpl{id=6}...
INFO [2016-09-30 09:39:36.302] [JobScheduler worker1-thread-2] (ImportJobCreator.java:88) - Previous max split value for '/Users/admin/Applications/UFO Sightings/Resources/UFO Sightings Data.imp' is null
INFO [2016-09-30 09:39:36.318] [JobScheduler worker1-thread-2] (JobSchedulerJob.java:94) - [Job 11] Preparing job for DataSourceConfigurationImpl{id=6}... done (0 sec)
INFO [2016-09-30 09:39:36.410] [JobScheduler worker1-thread-2] (JobSchedulerJob.java:115) - Starting job ...
INFO [2016-09-30 09:39:36.471] [JobScheduler worker1-thread-2] (MrPlanRunnerV2.java:81) - Allow running Datameer job with up to 1 concurrent cluster jobs.
INFO [2016-09-30 09:39:36.494] [MrPlanRunnerV2] (JobExecutionTraceService.java:82) - Creating local job execution trace log at /home/datameer/Datameer-6.1.2-hdp-2.4.0/temp/cache/dfscache/local-job-execution-traces/11
INFO [2016-09-30 09:39:36.497] [MrPlanRunnerV2] (TezClusterSession.java:43) - Creating a TEZ job for session Import job (11): UFO Sightings Data with a job count 1
INFO [2016-09-30 09:39:36.497] [MrPlanRunnerV2] (ClusterJobFlow.java:149) - Created configuration for StageGraphClusterJobFlow{stages=[Stage{input=ExternalInputConnector{}, streams=[RecordStream{sheetName=import, description=Identity}]}]}: ClusterJobConfiguration{enabledConsumers=[34c1f502-3588-4591-94cd-26452ee72787, 7436da1b-fcc9-4760-98c6-5ecfc3954708, c0702e33-91d2-4158-a486-e15bbecb473e]}
INFO [2016-09-30 09:39:36.498] [MrPlanRunnerV2] (ClusterSession.java:196) - -------------------------------------------
INFO [2016-09-30 09:39:36.498] [MrPlanRunnerV2] (ClusterSession.java:197) - Running cluster job (TEZ) for 'Import job (11): UFO Sightings Data#import(Identity)'
INFO [2016-09-30 09:39:36.499] [MrPlanRunnerV2] (ClusterSession.java:199) - Output (final): import (8c3171fe-c2de-4b35-a173-08fd048e7fd0)
INFO [2016-09-30 09:39:36.499] [MrPlanRunnerV2] (ClusterSession.java:201) - -------------------------------------------
INFO [2016-09-30 09:39:36.579] [MrPlanRunnerV2] (TezJob.java:174) - Submitting DAG to Tez cluster with name:Import job (11): UFO Sightings Data#import(Identity) (c466d96c-929e-4b78-a930-932371dc37d9)
INFO [2016-09-30 09:39:36.588] [MrPlanRunnerV2] (TezClientFacade.java:72) - Cleaning up tmp/tez-plugin-jars
INFO [2016-09-30 09:39:36.592] [MrPlanRunnerV2] (LightweightDasJobContext.java:68) - Synchronize global task local resources with remote hdfs://dn180.pf4h.local:8020/user/datameer/jobjars
INFO [2016-09-30 09:39:36.688] [MrPlanRunnerV2] (LightweightDasJobContext.java:84) - Synchronize job-specific task local resources with remote hdfs://dn180.pf4h.local:8020/user/datameer/jobjars
INFO [2016-09-30 09:39:36.740] [MrPlanRunnerV2] (LightweightDasJobContext.java:110) - Synchronize additional task local resource 'tmp/tez-plugin-jars/plugin-tez-1475220924000.jar' with remote filesystem hdfs://dn180.pf4h.local:8020/user/datameer/jobjars
INFO [2016-09-30 09:39:36.805] [MrPlanRunnerV2] (LightweightDasJobContext.java:110) - Synchronize additional task local resource '/home/datameer/Datameer-6.1.2-hdp-2.4.0/webapps/conductor/WEB-INF/lib/hadoop-mapreduce-client-core-2.7.1.2.4.0.0-169.jar' with remote filesystem hdfs://dn180.pf4h.local:8020/user/datameer/jobjars
INFO [2016-09-30 09:39:36.909] [MrPlanRunnerV2] (TezSessionImpl.java:45) - Creating new TezClient...
INFO [2016-09-30 09:39:36.956] [MrPlanRunnerV2] (LightweightDasJobContext.java:110) - Synchronize additional task local resource 'tmp/tez-plugin-jars/tez-libs-1475220924000/tez-api-0.7.1.jar' with remote filesystem hdfs://dn180.pf4h.local:8020/user/datameer/jobjars
INFO [2016-09-30 09:39:36.961] [MrPlanRunnerV2] (LightweightDasJobContext.java:110) - Synchronize additional task local resource 'tmp/tez-plugin-jars/tez-libs-1475220924000/tez-common-0.7.1.jar' with remote filesystem hdfs://dn180.pf4h.local:8020/user/datameer/jobjars
INFO [2016-09-30 09:39:36.964] [MrPlanRunnerV2] (LightweightDasJobContext.java:110) - Synchronize additional task local resource 'tmp/tez-plugin-jars/tez-libs-1475220924000/tez-runtime-library-0.7.1.jar' with remote filesystem hdfs://dn180.pf4h.local:8020/user/datameer/jobjars
INFO [2016-09-30 09:39:36.969] [MrPlanRunnerV2] (LightweightDasJobContext.java:110) - Synchronize additional task local resource 'tmp/tez-plugin-jars/tez-libs-1475220924000/tez-dag-0.7.1.jar' with remote filesystem hdfs://dn180.pf4h.local:8020/user/datameer/jobjars
INFO [2016-09-30 09:39:36.976] [MrPlanRunnerV2] (LightweightDasJobContext.java:110) - Synchronize additional task local resource 'tmp/tez-plugin-jars/tez-libs-1475220924000/commons-collections4-4.1.jar' with remote filesystem hdfs://dn180.pf4h.local:8020/user/datameer/jobjars
INFO [2016-09-30 09:39:36.981] [MrPlanRunnerV2] (LightweightDasJobContext.java:110) - Synchronize additional task local resource 'tmp/tez-plugin-jars/tez-libs-1475220924000/RoaringBitmap-0.4.9.jar' with remote filesystem hdfs://dn180.pf4h.local:8020/user/datameer/jobjars
INFO [2016-09-30 09:39:36.983] [MrPlanRunnerV2] (LightweightDasJobContext.java:110) - Synchronize additional task local resource 'tmp/tez-plugin-jars/tez-libs-1475220924000/tez-runtime-internals-0.7.1.jar' with remote filesystem hdfs://dn180.pf4h.local:8020/user/datameer/jobjars
INFO [2016-09-30 09:39:37.033] [MrPlanRunnerV2] (TezClient.java:173) - Tez Client Version: [ component=tez-api, version=0.7.1, revision=6868944d862113485ec98b9480e9f2445bf2b34d, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2016-05-04T17:18:07Z ]
INFO [2016-09-30 09:39:37.035] [MrPlanRunnerV2] (TezClientFacade.java:318) - Starting Tez session ...
INFO [2016-09-30 09:39:37.056] [MrPlanRunnerV2] (RMProxy.java:98) - Connecting to ResourceManager at dn187.pf4h.local/192.168.239.187:8050
INFO [2016-09-30 09:39:37.058] [MrPlanRunnerV2] (TezClient.java:375) - Session mode. Starting session.
INFO [2016-09-30 09:39:37.058] [MrPlanRunnerV2] (TezClientUtils.java:173) - Using tez.lib.uris value from configuration: hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-api-0.7.1.jar_339d4b322bfdad1c17109d1495819685.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-common-0.7.1.jar_cc6468b278f0d0cb0b89317b1895c668.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-runtime-library-0.7.1.jar_b053b36007fa21bed7c4f5ad65bff6d7.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-dag-0.7.1.jar_77f5078dc465aaffe0ed3e47f105f5c4.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/commons-collections4-4.1.jar_45af6a8e5b51d5945de6c7411e290bd1.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/RoaringBitmap-0.4.9.jar_f4b4ae423753b1a7b34fa17810dc21fd.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-runtime-internals-0.7.1.jar_cbf9627edf57ed975472f4209e907281.jar
INFO [2016-09-30 09:39:37.131] [MrPlanRunnerV2] (TezCommonUtils.java:122) - Tez system stage directory hdfs://dn180.pf4h.local:8020/user/datameer/temp/job-11/.staging-68d48413-2ed6-4006-88ca-7f1e261ff11e/.tez/application_1475069740945_0004 doesn't exist and is created
INFO [2016-09-30 09:39:37.549] [MrPlanRunnerV2] (YarnClientImpl.java:274) - Submitted application application_1475069740945_0004
INFO [2016-09-30 09:39:37.556] [MrPlanRunnerV2] (TezClient.java:409) - The url to track the Tez Session: http://dn187.pf4h.local:8088/proxy/application_1475069740945_0004/
INFO [2016-09-30 09:39:37.556] [MrPlanRunnerV2] (TezClientFacade.java:320) - Starting Tez session done
INFO [2016-09-30 09:39:37.557] [MrPlanRunnerV2] (TezClientFacade.java:322) - Wait until Tez session ready (remaining attempts 2) ...
INFO [2016-09-30 09:39:39.569] [MrPlanRunnerV2] (TezClient.java:673) - App did not succeed. Diagnostics: Application application_1475069740945_0004 failed 2 times due to AM Container for appattempt_1475069740945_0004_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://dn187.pf4h.local:8088/cluster/app/application_1475069740945_0004Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e04_1475069740945_0004_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:600)
at org.apache.hadoop.util.Shell.run(Shell.java:511)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:783)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
INFO [2016-09-30 09:39:39.570] [MrPlanRunnerV2] (TezClientFacade.java:329) - Failed to create Tez session. Retrying Tez session creation up to 2 more times.
INFO [2016-09-30 09:39:40.071] [MrPlanRunnerV2] (TezClient.java:173) - Tez Client Version: [ component=tez-api, version=0.7.1, revision=6868944d862113485ec98b9480e9f2445bf2b34d, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2016-05-04T17:18:07Z ]
INFO [2016-09-30 09:39:40.072] [MrPlanRunnerV2] (TezClientFacade.java:318) - Starting Tez session ...
INFO [2016-09-30 09:39:40.085] [MrPlanRunnerV2] (RMProxy.java:98) - Connecting to ResourceManager at dn187.pf4h.local/192.168.239.187:8050
INFO [2016-09-30 09:39:40.086] [MrPlanRunnerV2] (TezClient.java:375) - Session mode. Starting session.
INFO [2016-09-30 09:39:40.086] [MrPlanRunnerV2] (TezClientUtils.java:173) - Using tez.lib.uris value from configuration: hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-api-0.7.1.jar_339d4b322bfdad1c17109d1495819685.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-common-0.7.1.jar_cc6468b278f0d0cb0b89317b1895c668.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-runtime-library-0.7.1.jar_b053b36007fa21bed7c4f5ad65bff6d7.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-dag-0.7.1.jar_77f5078dc465aaffe0ed3e47f105f5c4.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/commons-collections4-4.1.jar_45af6a8e5b51d5945de6c7411e290bd1.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/RoaringBitmap-0.4.9.jar_f4b4ae423753b1a7b34fa17810dc21fd.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-runtime-internals-0.7.1.jar_cbf9627edf57ed975472f4209e907281.jar
INFO [2016-09-30 09:39:40.126] [MrPlanRunnerV2] (TezCommonUtils.java:122) - Tez system stage directory hdfs://dn180.pf4h.local:8020/user/datameer/temp/job-11/.staging-68d48413-2ed6-4006-88ca-7f1e261ff11e/.tez/application_1475069740945_0007 doesn't exist and is created
INFO [2016-09-30 09:39:40.409] [MrPlanRunnerV2] (YarnClientImpl.java:274) - Submitted application application_1475069740945_0007
INFO [2016-09-30 09:39:40.411] [MrPlanRunnerV2] (TezClient.java:409) - The url to track the Tez Session: http://dn187.pf4h.local:8088/proxy/application_1475069740945_0007/
INFO [2016-09-30 09:39:40.411] [MrPlanRunnerV2] (TezClientFacade.java:320) - Starting Tez session done
INFO [2016-09-30 09:39:40.412] [MrPlanRunnerV2] (TezClientFacade.java:322) - Wait until Tez session ready (remaining attempts 1) ...
INFO [2016-09-30 09:39:42.420] [MrPlanRunnerV2] (TezClient.java:673) - App did not succeed. Diagnostics: Application application_1475069740945_0007 failed 2 times due to AM Container for appattempt_1475069740945_0007_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://dn187.pf4h.local:8088/cluster/app/application_1475069740945_0007Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e04_1475069740945_0007_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:600)
at org.apache.hadoop.util.Shell.run(Shell.java:511)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:783)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
INFO [2016-09-30 09:39:42.421] [MrPlanRunnerV2] (TezClientFacade.java:329) - Failed to create Tez session. Retrying Tez session creation up to 1 more times.
INFO [2016-09-30 09:39:42.922] [MrPlanRunnerV2] (TezClient.java:173) - Tez Client Version: [ component=tez-api, version=0.7.1, revision=6868944d862113485ec98b9480e9f2445bf2b34d, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2016-05-04T17:18:07Z ]
INFO [2016-09-30 09:39:42.923] [MrPlanRunnerV2] (TezClientFacade.java:318) - Starting Tez session ...
INFO [2016-09-30 09:39:42.940] [MrPlanRunnerV2] (RMProxy.java:98) - Connecting to ResourceManager at dn187.pf4h.local/192.168.239.187:8050
INFO [2016-09-30 09:39:42.940] [MrPlanRunnerV2] (TezClient.java:375) - Session mode. Starting session.
INFO [2016-09-30 09:39:42.940] [MrPlanRunnerV2] (TezClientUtils.java:173) - Using tez.lib.uris value from configuration: hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-api-0.7.1.jar_339d4b322bfdad1c17109d1495819685.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-common-0.7.1.jar_cc6468b278f0d0cb0b89317b1895c668.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-runtime-library-0.7.1.jar_b053b36007fa21bed7c4f5ad65bff6d7.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-dag-0.7.1.jar_77f5078dc465aaffe0ed3e47f105f5c4.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/commons-collections4-4.1.jar_45af6a8e5b51d5945de6c7411e290bd1.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/RoaringBitmap-0.4.9.jar_f4b4ae423753b1a7b34fa17810dc21fd.jar,hdfs://dn180.pf4h.local:8020/user/datameer/jobjars/6.1.2/tez-jars/tez-runtime-internals-0.7.1.jar_cbf9627edf57ed975472f4209e907281.jar
INFO [2016-09-30 09:39:42.985] [MrPlanRunnerV2] (TezCommonUtils.java:122) - Tez system stage directory hdfs://dn180.pf4h.local:8020/user/datameer/temp/job-11/.staging-68d48413-2ed6-4006-88ca-7f1e261ff11e/.tez/application_1475069740945_0010 doesn't exist and is created
INFO [2016-09-30 09:39:43.253] [MrPlanRunnerV2] (YarnClientImpl.java:274) - Submitted application application_1475069740945_0010
INFO [2016-09-30 09:39:43.256] [MrPlanRunnerV2] (TezClient.java:409) - The url to track the Tez Session: http://dn187.pf4h.local:8088/proxy/application_1475069740945_0010/
INFO [2016-09-30 09:39:43.257] [MrPlanRunnerV2] (TezClientFacade.java:320) - Starting Tez session done
INFO [2016-09-30 09:39:43.257] [MrPlanRunnerV2] (TezClientFacade.java:322) - Wait until Tez session ready (remaining attempts 0) ...
INFO [2016-09-30 09:39:45.766] [MrPlanRunnerV2] (TezClient.java:673) - App did not succeed. Diagnostics: Application application_1475069740945_0010 failed 2 times due to AM Container for appattempt_1475069740945_0010_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://dn187.pf4h.local:8088/cluster/app/application_1475069740945_0010Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e04_1475069740945_0010_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:600)
at org.apache.hadoop.util.Shell.run(Shell.java:511)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:783)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
INFO [2016-09-30 09:39:45.777] [MrPlanRunnerV2] (TezJob.java:163) - Completed Tez job 'Import job (11): UFO Sightings Data#import(Identity)' with output path: hdfs://dn180.pf4h.local:8020/user/datameer/temp/job-11/...
INFO [2016-09-30 09:39:45.777] [MrPlanRunnerV2] (ClusterJob.java:131) - Tez Execution Framework completed cluster job 'Import job (11): UFO Sightings Data#import(Identity)' [9 sec]
ERROR [2016-09-30 09:39:45.777] [MrPlanRunnerV2] (ClusterSession.java:219) - Failed to run cluster job 'Import job (11): UFO Sightings Data#import(Identity)' [9 sec]
java.lang.IllegalStateException: Failed to create Tez session.
at datameer.plugin.tez.TezClientFacade$CreateTezClientAction.run(TezClientFacade.java:343)
at datameer.plugin.tez.TezClientFacade.createClient(TezClientFacade.java:159)
at datameer.plugin.tez.TezClientFacade.createSession(TezClientFacade.java:181)
at datameer.plugin.tez.session.TezSessionImpl.<init>(TezSessionImpl.java:46)
at datameer.plugin.tez.session.TezSessionFactory$AlwaysNewSessionFactory.get(TezSessionFactory.java:50)
at datameer.plugin.tez.session.TezSessionFactory$ReuseSessionFactory.createNewSession(TezSessionFactory.java:104)
at datameer.plugin.tez.session.PoolingTezSessionFactory.get(PoolingTezSessionFactory.java:108)
at datameer.plugin.tez.session.TrackRunningSessionFactory.get(TrackRunningSessionFactory.java:50)
at datameer.plugin.tez.DagRunner.submit(DagRunner.java:79)
at datameer.plugin.tez.TezJob.runTezDag(TezJob.java:175)
at datameer.plugin.tez.TezJob.runImpl(TezJob.java:153)
at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:125)
at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:205)
at datameer.dap.common.graphv2.ClusterSession.runAllClusterJobs(ClusterSession.java:342)
at datameer.dap.common.graphv2.MrPlanRunnerV2.run(MrPlanRunnerV2.java:129)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at datameer.dap.common.security.DatameerSecurityService.runAsUser(DatameerSecurityService.java:109)
at datameer.dap.common.security.DatameerSecurityService.runAsUser(DatameerSecurityService.java:186)
at datameer.dap.common.security.RunAsThread$1.run(RunAsThread.java:34)
at datameer.dap.common.security.RunAsThread$1.run(RunAsThread.java:30)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at datameer.dap.common.filesystem.Impersonator.doAs(Impersonator.java:31)
at datameer.dap.common.security.RunAsThread.run(RunAsThread.java:30)
INFO [2016-09-30 09:39:45.778] [MrPlanRunnerV2] (ClusterSession.java:222) - -------------------------------------------
INFO [2016-09-30 09:39:45.778] [MrPlanRunnerV2] (ClusterSession.java:80) - Committing failed job and moving job output from 'hdfs://dn180.pf4h.local:8020/user/datameer/temp/job-11' to 'hdfs://dn180.pf4h.local:8020/user/datameer/importjobs/6/11'.
WARN [2016-09-30 09:39:45.782] [MrPlanRunnerV2] (DatameerFsClient.java:66) - No files found for hdfs://dn180.pf4h.local:8020/user/datameer/temp/job-11/history
INFO [2016-09-30 09:39:45.789] [MrPlanRunnerV2] (ClusterSession.java:128) - Completed job flow with FAILURE and 0 completed cluster jobs. (hdfs://dn180.pf4h.local:8020/user/datameer/importjobs/6/11)
INFO [2016-09-30 09:39:45.789] [MrPlanRunnerV2] (PoolingTezSessionFactory.java:151) - Closing ReuseSessionFactory{source=AlwaysNewSessionFactory{}}.
INFO [2016-09-30 09:39:46.090] [MrPlanRunnerV2] (HarBuilder.java:77) - Created har file at hdfs://dn180.pf4h.local:8020/user/datameer/jobhistory/6/11/job-metadata.har.tmp out of [hdfs://dn180.pf4h.local:8020/user/datameer/jobhistory/6/11/cluster-jobs.json, hdfs://dn180.pf4h.local:8020/user/datameer/jobhistory/6/11/job-conf.xml, hdfs://dn180.pf4h.local:8020/user/datameer/jobhistory/6/11/job-definition.json, hdfs://dn180.pf4h.local:8020/user/datameer/jobhistory/6/11/job-plan-compiled.dot, hdfs://dn180.pf4h.local:8020/user/datameer/jobhistory/6/11/job-plan-original.dot]. Moving it to hdfs://dn180.pf4h.local:8020/user/datameer/jobhistory/6/11/job-metadata.har
INFO [2016-09-30 09:39:46.135] [MrPlanRunnerV2] (MrPlanRunnerV2.java:174) - Deleting temporary job directory hdfs://dn180.pf4h.local:8020/user/datameer/temp/job-11
INFO [2016-09-30 09:39:46.147] [MrPlanRunnerV2] (DatameerJobStorage.java:157) - Copying job execution trace log from /home/datameer/Datameer-6.1.2-hdp-2.4.0/temp/cache/dfscache/local-job-execution-traces/11 to hdfs://dn180.pf4h.local:8020/user/datameer/jobhistory/6/11/job-execution-trace.log
INFO [2016-09-30 09:39:46.174] [JobScheduler worker1-thread-2] (DapJobCounter.java:172) - Job FAILURE with '0' mr-jobs and following counters:
INFO [2016-09-30 09:39:46.175] [JobScheduler worker1-thread-2] (DapJobCounter.java:175) - IMPORT_RECORDS: 0
INFO [2016-09-30 09:39:46.176] [JobScheduler worker1-thread-2] (DapJobCounter.java:175) - IMPORT_DROPPED_RECORDS: 0
INFO [2016-09-30 09:39:46.176] [JobScheduler worker1-thread-2] (DapJobCounter.java:175) - IMPORT_PREVIEW_RECORDS: 0
INFO [2016-09-30 09:39:46.176] [JobScheduler worker1-thread-2] (DapJobCounter.java:175) - IMPORT_DROPPED_SPLITS: 0
ERROR [2016-09-30 09:39:46.588] [JobScheduler thread-1] (JobScheduler.java:829) - Job 11 failed with exception.
java.lang.RuntimeException: Failed to run cluster job for 'Import job (11): UFO Sightings Data#import(Identity)'
at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:220)
at datameer.dap.common.graphv2.ClusterSession.runAllClusterJobs(ClusterSession.java:342)
at datameer.dap.common.graphv2.MrPlanRunnerV2.run(MrPlanRunnerV2.java:129)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at datameer.dap.common.security.DatameerSecurityService.runAsUser(DatameerSecurityService.java:109)
at datameer.dap.common.security.DatameerSecurityService.runAsUser(DatameerSecurityService.java:186)
at datameer.dap.common.security.RunAsThread$1.run(RunAsThread.java:34)
at datameer.dap.common.security.RunAsThread$1.run(RunAsThread.java:30)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at datameer.dap.common.filesystem.Impersonator.doAs(Impersonator.java:31)
at datameer.dap.common.security.RunAsThread.run(RunAsThread.java:30)
Caused by: java.lang.IllegalStateException: Failed to create Tez session.
at datameer.plugin.tez.TezClientFacade$CreateTezClientAction.run(TezClientFacade.java:343)
at datameer.plugin.tez.TezClientFacade.createClient(TezClientFacade.java:159)
at datameer.plugin.tez.TezClientFacade.createSession(TezClientFacade.java:181)
at datameer.plugin.tez.session.TezSessionImpl.<init>(TezSessionImpl.java:46)
at datameer.plugin.tez.session.TezSessionFactory$AlwaysNewSessionFactory.get(TezSessionFactory.java:50)
at datameer.plugin.tez.session.TezSessionFactory$ReuseSessionFactory.createNewSession(TezSessionFactory.java:104)
at datameer.plugin.tez.session.PoolingTezSessionFactory.get(PoolingTezSessionFactory.java:108)
at datameer.plugin.tez.session.TrackRunningSessionFactory.get(TrackRunningSessionFactory.java:50)
at datameer.plugin.tez.DagRunner.submit(DagRunner.java:79)
at datameer.plugin.tez.TezJob.runTezDag(TezJob.java:175)
at datameer.plugin.tez.TezJob.runImpl(TezJob.java:153)
at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:125)
at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:205)
... 12 more
INFO [2016-09-30 09:39:46.620] [JobScheduler thread-1] (JobScheduler.java:904) - Computing after job completion operations for execution 11 (type=NORMAL)
INFO [2016-09-30 09:39:46.621] [JobScheduler thread-1] (JobScheduler.java:908) - Finished computing after job completion operations for execution 11 (type=NORMAL) [0 sec]
WARN [2016-09-30 09:39:46.631] [JobScheduler thread-1] (JobScheduler.java:759) - Job DapJobExecution{id=11, type=NORMAL, status=ERROR} completed with status ERROR.

Michael Ahn

2 comments

  • Konsta Danyliuk

    Hello Michael,

    To resolve Tez errors caused by "Container exited with a non-zero exit code 1", I would suggest trying the following steps:

    1. Ensure that the correct YARN classpath is set in the Datameer cluster settings.

    • Run the command yarn classpath on the CLI of your NameNode and make sure the corresponding setting in your Datameer cluster configuration matches its output.

    Related documentation: Job Failure - ExitCodeException exitCode=1
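
The comparison in step 1 can be scripted. The following is a minimal Python sketch, not part of Datameer: the helper names and the sample paths are hypothetical, and it assumes you paste the value of the Datameer classpath setting in by hand. It accepts both ':' and ',' as separators, since yarn classpath prints a colon-separated list while a configured value may be comma-separated.

```python
import re
import subprocess


def split_classpath(value):
    """Split a classpath string on ':' or ',' and drop empty entries."""
    return [entry.strip() for entry in re.split(r"[:,]", value) if entry.strip()]


def diff_classpaths(cluster_value, configured_value):
    """Return (missing, extra) entries of the configured value relative to the cluster."""
    cluster = split_classpath(cluster_value)
    configured = split_classpath(configured_value)
    missing = [e for e in cluster if e not in configured]
    extra = [e for e in configured if e not in cluster]
    return missing, extra


def read_cluster_classpath():
    """Run `yarn classpath` on a cluster node and return its output (needs the yarn CLI)."""
    result = subprocess.run(["yarn", "classpath"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Hypothetical sample values for illustration only.
sample_cluster = "/etc/hadoop/conf:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-yarn-client/*"
sample_configured = "/etc/hadoop/conf,/usr/hdp/current/hadoop-client/*"
missing, extra = diff_classpaths(sample_cluster, sample_configured)
print("missing from Datameer setting:", missing)
print("only in Datameer setting:", extra)
```

On a real node, pass the result of read_cluster_classpath() as the first argument instead of the sample string. Any entry reported as missing is a candidate cause for the application master failing to start.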

    2. Check whether the native Hadoop libraries have been set up properly.

    • Find where the native Hadoop libraries live: sudo find / -name libhadoop.so
    • Set the custom properties below in your cluster settings:
    yarn.app.mapreduce.am.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<absolute-path-to-native-libs>

    yarn.app.mapreduce.am.admin.user.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<absolute-path-to-native-libs>

    mapreduce.admin.user.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<absolute-path-to-native-libs>

    mapred.child.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<absolute-path-to-native-libs>

    tez.am.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<absolute-path-to-native-libs>

    tez.task.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<absolute-path-to-native-libs>
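
Because the six properties above differ only in their key, they can be generated from a single path. This is a minimal Python sketch with a hypothetical helper name and a hypothetical example directory; replace the path with whatever the find command reports on your nodes.

```python
import os

# Property keys taken from the list above; each one receives the same value.
NATIVE_LIB_PROPERTIES = [
    "yarn.app.mapreduce.am.env",
    "yarn.app.mapreduce.am.admin.user.env",
    "mapreduce.admin.user.env",
    "mapred.child.env",
    "tez.am.launch.env",
    "tez.task.launch.env",
]


def native_lib_properties(libhadoop_path):
    """Build the custom-property lines from the location of libhadoop.so."""
    # Accept either the library file itself or its containing directory.
    if libhadoop_path.endswith(".so"):
        lib_dir = os.path.dirname(libhadoop_path)
    else:
        lib_dir = libhadoop_path.rstrip("/")
    value = "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:" + lib_dir
    return [key + "=" + value for key in NATIVE_LIB_PROPERTIES]


# Hypothetical path for illustration; use the output of the find command instead.
for line in native_lib_properties("/usr/hdp/current/hadoop-client/lib/native/libhadoop.so"):
    print(line)
```

The printed lines can then be pasted into the custom properties field of the cluster settings.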

    Let me know if this helps you resolve the issue.

     

  • Michael Ahn

    Fixing the YARN classpath helped. I had not looked in that direction because MR2 and Spark worked. Thanks.
