Setting the Max Concurrent Job Limit

Goal

Datameer lets you limit the number of jobs that are compiled and submitted to a Hadoop cluster at the same time.

 

Learn

Datameer is a job compiler: it compiles jobs and sends them to the cluster for execution. The max concurrent jobs parameter limits the number of jobs it submits concurrently.

 

For example, if the property is set to 25 (the default value) and you trigger 30 jobs in Datameer at the same time, only the first 25 jobs are compiled and sent to the cluster. The remaining five jobs stay in the queued state and are compiled and submitted one at a time, each as soon as a job on the cluster completes.
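The queueing behavior in this example can be illustrated with a minimal Python sketch. It is only a toy model of the description above, not Datameer's actual scheduler; the function names `trigger_jobs` and `on_job_completed` are hypothetical.

```python
from collections import deque


def trigger_jobs(triggered_jobs, max_concurrent_jobs=25):
    """Model the state right after all jobs are triggered at once.

    Returns (running, queued): jobs submitted to the cluster and jobs
    still waiting in the queue. Jobs are numbered from 1.
    """
    queue = deque(range(1, triggered_jobs + 1))
    running = []
    # Submit jobs in order until the concurrency limit is reached.
    while queue and len(running) < max_concurrent_jobs:
        running.append(queue.popleft())
    return running, list(queue)


def on_job_completed(running, queued, finished_job):
    """Free one slot and submit the next queued job, if there is one."""
    running.remove(finished_job)
    if queued:
        running.append(queued.pop(0))
    return running, queued


running, queued = trigger_jobs(30)
print(len(running), len(queued))   # 25 5

running, queued = on_job_completed(running, queued, finished_job=1)
print(len(running), queued)        # 25 [27, 28, 29, 30]
```

Each completed job frees exactly one slot, so the queue drains one job at a time while the number of submitted jobs stays at the limit.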

 

Datameer compiles jobs one by one, so the max concurrent jobs setting shouldn't affect service performance in terms of JVM memory consumption.

 

Given the above, the number of concurrent jobs Datameer can submit to the cluster is limited by the available cluster resources.

A few things to keep in mind:

  • Because Datameer tracks telemetry (e.g., job status) for submitted jobs, it must open a connection to the Resource Manager to get this data. Expect at least one open connection per job, so make sure enough open file handles are configured on the Datameer host.

  • If you use the Hadoop Capacity Scheduler with two queues (e.g., high priority and low priority), a low concurrent job limit will likely leave high-priority jobs stuck in the Datameer queue, because low-priority jobs can occupy all available submission slots.

  • If you have a shared cluster, an excessive number of concurrent jobs might leave Datameer jobs short of resources when other services load the cluster heavily at the same time.
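The file handle point above can be checked with a short Python sketch. It assumes a Unix-like Datameer host and must run as the same user as the Datameer service to be meaningful. The function name and the `headroom` value are hypothetical; the right allowance for handles used by the service itself depends on your installation.

```python
import resource


def has_enough_file_handles(max_concurrent_jobs, headroom=1024):
    """Compare the soft open-file limit with a rough handle estimate.

    Assumes one connection (one handle) per concurrent job, plus a
    hypothetical headroom for everything else the process keeps open.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= max_concurrent_jobs + headroom


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "hard limit:", hard)
    print("enough for 25 jobs:", has_enough_file_handles(25))
```

This only reads the limit of the current process. It does not count the handles Datameer already has open.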