How to increase concurrent Datameer jobs in cluster

Hi,

I am currently doing a Hadoop environment review and I found that job logs contain the following line:

 INFO [2017-02-01 03:00:06.126] [JobScheduler worker1-thread-8833] (MrPlanRunnerV2.java:61) - Allow running Datameer job with up to 1 concurrent cluster jobs.

Scenario 1:

Does it indicate that the Tez execution framework runs just one Datameer job at a time, resulting in queuing of Datameer jobs?

or 
Scenario 2:

Does it mean that one Tez job is allocated per Datameer job?

If it is Scenario 1, is there a way to increase concurrent Datameer job execution on the Hadoop framework so that many Datameer jobs can run in parallel and help clear the processing window?

Suhel Khan

1 comment

    Konsta Danyliuk

    Hello Suhel,

    [JobScheduler worker1-thread-511] (MrPlanRunnerV2.java:81) - Allow running Datameer job with up to 1 concurrent cluster jobs.

    This message shows how many map/reduce jobs will be started concurrently on the cluster to calculate a given execution submitted by Datameer (e.g. a Workbook or an ImportJob).

    This value is controlled by the property das.job.concurrent-mr-jobs.new-graph-api. You can check this property in the job-cont-cluster.xml file for a particular job.

    Default values

    • For execution framework Local das.job.concurrent-mr-jobs.new-graph-api=1
    • If Datameer is connected to a cluster das.job.concurrent-mr-jobs.new-graph-api=5
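
    If you need to change this default, das.* properties are typically overridden as key=value lines in a custom-properties setting. The exact location of that setting may vary by Datameer version, so treat this as a hedged sketch and confirm with the documentation for your release:

```
# Hypothetical override: raise the per-job concurrency limit from the
# cluster default of 5 to 10. One property per line, key=value format.
das.job.concurrent-mr-jobs.new-graph-api=10
```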

    The number of concurrent map/reduce jobs depends on the file size, the execution engine, the calculated splits, any manual split settings, and how many mappers and reducers the cluster allows to run in parallel. Most of all it depends on the calculated splits; roughly speaking, more splits theoretically lead to more concurrent map/reduce jobs. This is calculated by an internal algorithm when the job is compiled.
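
    The relationship described above can be sketched as follows. This is an illustrative simplification only: the real calculation is an internal Datameer algorithm, and the function below is hypothetical.

```python
# Illustrative sketch: the concurrency a job can actually use grows with
# the number of calculated splits, but is capped by the configured
# das.job.concurrent-mr-jobs.new-graph-api limit.

def estimate_concurrent_mr_jobs(calculated_splits: int, configured_limit: int) -> int:
    """Hypothetical heuristic: more splits allow more concurrent
    map/reduce jobs, capped by the configured property value."""
    return max(1, min(calculated_splits, configured_limit))

print(estimate_concurrent_mr_jobs(12, 5))  # many splits: capped at the limit, 5
print(estimate_concurrent_mr_jobs(2, 5))   # few splits: limited to 2
```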

    Concerning Scenario 1

    The number of concurrent jobs submitted by Datameer to a cluster is controlled by the option Max Concurrent Jobs. It can be set in the Hadoop Cluster section on the Admin tab. The default value is 25, which means that if you start 30 jobs at the same time, only the first 25 will be started and sent to the cluster; the remaining 5 will wait in a queued state until one of the first 25 completes.
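
    The queueing behaviour above can be sketched like this. The job names and helper functions are hypothetical; only the 25-job limit and the start/queue behaviour come from the description above.

```python
# Sketch of the scheduling described above: with Max Concurrent Jobs = 25,
# submitting 30 jobs starts the first 25 and queues the remaining 5; a
# queued job is started only when a running job completes.
from collections import deque

MAX_CONCURRENT_JOBS = 25  # Admin tab -> Hadoop Cluster -> Max Concurrent Jobs

def submit_jobs(jobs, limit=MAX_CONCURRENT_JOBS):
    """Start up to `limit` jobs immediately; queue the remainder."""
    return list(jobs[:limit]), deque(jobs[limit:])

def on_job_complete(running, queued, finished):
    """When a running job finishes, the next queued job takes its slot."""
    running.remove(finished)
    if queued:
        running.append(queued.popleft())

running, queued = submit_jobs([f"job-{i}" for i in range(30)])
print(len(running), len(queued))  # 25 running, 5 queued
on_job_complete(running, queued, "job-0")
print(len(running), len(queued))  # still 25 running, now 4 queued
```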

     
