Precisely Controlling Hadoop Splits

In some situations, a job's performance may improve when the number of splits it uses is increased or decreased.

Step-by-step guide

The following parameters are available to control the splitting of a job. Each of these can be set in the Custom Hadoop Properties for the job:

  • das.splitting.max-split-count - Explicitly set the maximum number of splits for a particular job.
  • das.splitting.min-split-count - Explicitly set the minimum number of splits for a particular job.
  • das.splitting.min-split-size-hard - Explicitly set the minimum split size for a particular job.
  • mapred.max.split.size - Set the maximum size of a split, in bytes.
  • mapred.min.split.size - Set the minimum size of a split, in bytes.
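
As a rough illustration, entries in the Custom Hadoop Properties for a job might look like the fragment below. It assumes the usual one name=value pair per line; the values are hypothetical placeholders rather than recommendations, and the comment line is for the reader only and may need to be left out when entering the properties.

```properties
# Hypothetical values for illustration only; tune per job.
das.splitting.min-split-count=20
das.splitting.max-split-count=200
```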

Note that the "mapred" options above are native Hadoop properties and are not enforced by Datameer.
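
For orientation, the sketch below shows how stock Hadoop's FileInputFormat combines the minimum and maximum split size with the HDFS block size when it picks a split size. It only mirrors native Hadoop behavior; since Datameer does not enforce these properties, its own splitting may not follow this rule. The block size and property values are assumptions chosen for illustration.

```python
MIB = 1024 * 1024


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Mirror of Hadoop's FileInputFormat.computeSplitSize:
    max(minSize, min(maxSize, blockSize)). All sizes are in bytes."""
    return max(min_size, min(max_size, block_size))


# Assumed HDFS block size of 128 MiB (hypothetical for this example).
block_size = 128 * MIB

# A maximum below the block size yields smaller splits, so more of them.
smaller = compute_split_size(block_size, min_size=1, max_size=64 * MIB)

# A minimum above the block size yields larger splits, so fewer of them.
larger = compute_split_size(block_size, min_size=256 * MIB, max_size=2**63 - 1)

print(smaller // MIB, larger // MIB)
```

In this model, lowering the maximum below the block size increases the number of splits, and raising the minimum above the block size reduces it.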