YARN Scheduler - Best Practices

Problem

Datameer splits workloads into smaller pieces. You may notice that its jobs run on execution frameworks other than plain MapReduce.

Solution

Use scheduling to ensure fair sharing of resources in your cluster. 

Fair versus capacity - Which scheduler should be used?

Datameer does not check which scheduling strategy is in use. A job compiled by Datameer and sent to the cluster does not know which scheduler is being used. The job runs with any scheduler setting and makes efficient use of the resources the scheduler grants. Users of Cloudera's distribution may use the Fair Scheduler.

How about resource allocation?

Depending on workloads and SLAs, a fixed capacity for a pool is often wanted. The Fair Scheduler can provide that as well. A Datameer job cannot take over all the resources on the cluster, because it uses only the resources granted by the scheduler.
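
To make the interplay of a guaranteed minimum and a weight more concrete, the following Python sketch computes shares under a simplified model of weighted fair sharing. It is not YARN's implementation: it ignores demand, maximum limits, and preemption, and the queue names and numbers are hypothetical. Each queue receives the larger of its minimum share and `ratio * weight`, with `ratio` chosen so that the shares add up to the cluster total.

```python
def fair_shares(total, queues, iterations=60):
    """Simplified weighted fair-share model (illustration only, not YARN code).

    queues maps a queue name to a (weight, min_share) tuple. Resources are
    expressed in one arbitrary unit, for example vcores.
    """
    if not any(weight > 0 for weight, _ in queues.values()):
        raise ValueError("at least one queue needs a positive weight")

    def allocated(ratio):
        # Total handed out if every queue gets max(min_share, ratio * weight).
        return sum(max(min_share, ratio * weight)
                   for weight, min_share in queues.values())

    # Grow the upper bound until it covers the whole cluster.
    low, high = 0.0, 1.0
    while allocated(high) < total:
        high *= 2.0

    # Binary search for the ratio at which the shares sum to the total.
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if allocated(mid) < total:
            low = mid
        else:
            high = mid

    return {name: max(min_share, high * weight)
            for name, (weight, min_share) in queues.items()}


# Hypothetical pools: "datameer" has a guaranteed minimum, the others do not.
shares = fair_shares(100, {
    "datameer": (1.0, 40),
    "etl": (1.0, 0),
    "adhoc": (2.0, 0),
})
```

In this model the minimum share acts as a floor: the hypothetical `datameer` pool keeps its guaranteed amount even though its weight alone would entitle it to less, and the remaining pools divide the rest in proportion to their weights.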

How to configure scheduling?

To throttle jobs using the scheduler, consult the documentation specific to your Hadoop distribution and see Configure Datameer Jobs for Specific Queue.
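
As a rough illustration of routing a job to a queue, the sketch below renders queue settings as `key=value` lines. `mapreduce.job.queuename` and `tez.queue.name` are the standard Hadoop and Tez queue properties. Everything else is an assumption for illustration: that your Datameer version accepts such lines as custom properties, that both properties are relevant to your execution framework, and the queue name `root.datameer`. Check the linked article for the properties that apply to your setup.

```python
def queue_properties(queue):
    """Render key=value lines that point MapReduce and Tez jobs at one queue.

    The helper and the queue name passed to it are hypothetical; only the
    two property names come from Hadoop and Tez.
    """
    if not queue or any(ch.isspace() for ch in queue):
        raise ValueError("queue name must be non-empty and contain no whitespace")
    return "\n".join([
        f"mapreduce.job.queuename={queue}",  # queue for MapReduce jobs
        f"tez.queue.name={queue}",           # queue for Tez jobs
    ])


print(queue_properties("root.datameer"))
```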

Distribution Specific Information

Documentation

Resource Management with Cloudera
Hortonworks YARN Resource Management
Job Scheduling with MapR

Configuration

In Cloudera's Fair Scheduler, combining Virtual Cores Min with Weight seems to be sufficient for some environments. Before configuring Weight together with Min Share Preemption Timeout, review the Fair Scheduler Preemption page.
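
For clusters where the Fair Scheduler is configured through an allocation file rather than through Cloudera Manager, the settings above might be expressed roughly as in the following Python sketch, which generates a minimal allocation file. The element names (`allocations`, `queue`, `minResources`, `weight`, `minSharePreemptionTimeout`) follow the Apache Hadoop Fair Scheduler allocation file format. The mapping from Cloudera's Virtual Cores Min to the vcores part of `minResources` is an assumption, and the queue name and values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_allocations(queue_name, min_vcores, min_memory_mb, weight,
                      min_share_preemption_timeout_s=None):
    """Return a minimal Fair Scheduler allocation file as an XML string."""
    root = ET.Element("allocations")
    queue = ET.SubElement(root, "queue", name=queue_name)

    # Guaranteed minimum, in the "X mb, Y vcores" form of the allocation file.
    ET.SubElement(queue, "minResources").text = (
        f"{min_memory_mb} mb, {min_vcores} vcores"
    )
    # Relative share of the cluster compared with sibling queues.
    ET.SubElement(queue, "weight").text = str(weight)

    # Optional: seconds a queue may stay below its minimum share before
    # the scheduler may preempt containers from other queues.
    if min_share_preemption_timeout_s is not None:
        ET.SubElement(queue, "minSharePreemptionTimeout").text = str(
            min_share_preemption_timeout_s
        )

    return ET.tostring(root, encoding="unicode")


# Hypothetical values for a dedicated Datameer queue.
allocation_xml = build_allocations("datameer", min_vcores=4,
                                   min_memory_mb=8192, weight=2.0)
print(allocation_xml)
```

Leaving out the preemption timeout keeps the sketch in line with the Virtual Cores Min plus Weight combination described above. Pass `min_share_preemption_timeout_s` only after reviewing how preemption behaves on your cluster.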

Additional Information

Find additional information in How to Test Job Execution and Queue root.default already has 10000 applications.