Fix the Replication Factor Datameer Is Using

Problem

Running the HDFS filesystem checking utility (fsck) against the Datameer directory reports under-replicated files:

[root@<host> <user>]# hadoop fsck /user/datameer

Connecting to namenode via http://<host>:<port>

FSCK started by hdfs (auth:KERBEROS_SSL) from /<ip> for path /user/datameer at <datetime>
.

/user/datameer/.staging/job_<id>/job.split: Under replicated BP-<id>. Target Replicas is 10 but found 3 replica(s).

...
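On a directory with many files the fsck report can be long. The following is a minimal sketch of a shell filter that keeps only the under-replicated entries and prints the path with its target and found replica counts. The function name `list_under_replicated` is hypothetical, and the pattern assumes the message format shown in the output above.

```shell
# Hypothetical helper: reads fsck output on stdin and prints
# "<path> target=<n> found=<n>" for each under-replicated file.
# Assumes the "Target Replicas is N but found M replica(s)" wording shown above.
list_under_replicated() {
  sed -n -E 's/^(.*): +Under replicated .*Target Replicas is ([0-9]+) but found ([0-9]+) replica.*$/\1 target=\2 found=\3/p'
}

# Usage (requires a configured Hadoop client):
#   hadoop fsck /user/datameer | list_under_replicated
```

Reading from stdin keeps the filter independent of the cluster, so it can also be run against a saved fsck report.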

Cause

By default, Hadoop requests a replication factor of 10 for the temporary job submission files (such as job.split):

mapreduce.client.submit.file.replication=10 

fsck compares each file's actual replica count with its target. These files are written with a target of 10, but only 3 replicas exist (for example, on a cluster with fewer than 10 DataNodes), so fsck flags them as under-replicated.

Solution

Add

mapreduce.client.submit.file.replication=3

to your Datameer Hadoop Custom Properties. Jobs submitted after the change request 3 replicas for these files, so fsck should no longer report them as under-replicated.
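To confirm the change, the fsck check from the Problem section can be re-run after a new job has been submitted. The following is a minimal sketch, not a Datameer-provided tool: the function name `count_under_replicated` is hypothetical, and it assumes the `hadoop` client is on the PATH and that fsck uses the "Under replicated" wording shown earlier.

```shell
# Hypothetical helper: prints how many under-replicated entries fsck
# reports for a path (defaults to the directory used in this article).
count_under_replicated() {
  # grep -c prints 0 but exits non-zero when nothing matches;
  # "|| true" keeps the function's exit status clean in that case.
  hadoop fsck "${1:-/user/datameer}" | grep -c 'Under replicated' || true
}

# Usage (requires a configured Hadoop client):
#   count_under_replicated /user/datameer
```

A result of 0 suggests the new setting is in effect. Files written by jobs submitted before the change may still be listed until those jobs finish and their staging files are removed.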