YARN Scheduler - Best Practices
Datameer splits workloads into smaller pieces. You may notice jobs running on execution frameworks other than plain MapReduce.
Use scheduling to ensure fair sharing of resources in your cluster.
Fair versus capacity - Which scheduler should be used?
Datameer does not check which scheduling strategy is in use. A job compiled by Datameer and sent to the cluster does not know which scheduler is being used. The job runs with any scheduler setting and makes efficient use of the resources the scheduler grants. Users of Cloudera's distribution may use the Fair Scheduler.
How about resource allocation?
Depending on workloads and SLAs, a fixed capacity for a pool is often wanted. The Fair Scheduler can provide that as well. A Datameer job cannot take over all the resources on the cluster, because it uses only the resources granted by the scheduler.
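As a rough illustration of a fixed-capacity pool, the following Python sketch generates a minimal Fair Scheduler allocation file in which a queue's minimum and maximum resources are set to the same value. The queue name datameer and the resource figures are hypothetical, and the helper function is not part of Datameer or Hadoop. On managed distributions, pools are usually configured through the vendor's administration tool rather than by writing this file by hand.

```python
import xml.etree.ElementTree as ET


def fixed_capacity_allocation(queue_name, memory_mb, vcores):
    """Build a Fair Scheduler allocation file for one fixed-capacity queue.

    Hypothetical helper for illustration only. Setting minResources and
    maxResources to the same value is one way to approximate a fixed
    capacity for the pool.
    """
    resources = f"{memory_mb} mb, {vcores} vcores"
    root = ET.Element("allocations")
    queue = ET.SubElement(root, "queue", name=queue_name)
    # Guaranteed share of the pool.
    ET.SubElement(queue, "minResources").text = resources
    # Upper limit of the pool; equal to the minimum here.
    ET.SubElement(queue, "maxResources").text = resources
    return ET.tostring(root, encoding="unicode")


# Assumed example values: a "datameer" queue with 64 GB and 16 vcores.
xml_text = fixed_capacity_allocation("datameer", 65536, 16)
print(xml_text)
```

Jobs submitted to such a queue stay within the stated limit regardless of how many Datameer jobs are running.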
How to configure scheduling?
To throttle jobs using the scheduler, consult the documentation specific to your Hadoop distribution and see Configure Datameer Jobs for Specific Queue.
Distribution Specific Information
In Cloudera's Fair Scheduler, the combination of Virtual Cores Min and Weight seems to be sufficient for some environments. Before configuring Weight along with Min Share Preemption Timeout, review the Fair Scheduler Preemption page.
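To get a feel for how a minimum share and a weight interact, the sketch below computes a simplified weighted fair share in Python. It is only an approximation under stated assumptions: it considers a single resource (virtual cores), ignores demand, maximum limits, and preemption, and requires positive weights. The queue names and numbers are hypothetical, and the function is not the scheduler's actual implementation.

```python
def fair_shares(total, queues):
    """Simplified weighted fair share with per-queue minimums.

    queues maps a queue name to (weight, min_share). The function searches
    for a ratio R such that the sum of max(min_share, R * weight) over all
    queues equals the total. Demand and maximum limits are ignored.
    """
    if any(weight <= 0 for weight, _ in queues.values()):
        raise ValueError("weights must be positive in this sketch")

    def allocated(ratio):
        return sum(max(min_share, ratio * weight)
                   for weight, min_share in queues.values())

    # Find an upper bound for the ratio, then narrow it by bisection.
    low, high = 0.0, 1.0
    while allocated(high) < total:
        high *= 2
    for _ in range(100):
        mid = (low + high) / 2
        if allocated(mid) < total:
            low = mid
        else:
            high = mid

    return {name: max(min_share, high * weight)
            for name, (weight, min_share) in queues.items()}


# Assumed example: 100 vcores, "etl" has a minimum of 40 and weight 1,
# "adhoc" has no minimum and weight 3.
shares = fair_shares(100, {"etl": (1.0, 40), "adhoc": (3.0, 0)})
print(shares)
```

In this example the etl queue keeps its minimum of 40 virtual cores even though its weight alone would entitle it to only 25, and the adhoc queue receives the remaining 60.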