How to Monitor the Datameer Core Directory Size in HDFS

Goal

Learn how to restrict or monitor the size of the Datameer core directories in HDFS, so that users who load workbooks with millions of records cannot fill up the disk space in the cluster.

Learn

Because Datameer, as a job compiler, is a Hadoop client and has little knowledge of the underlying file system, a common way to monitor resource usage and enforce restrictions is to set quotas in your Hadoop cluster.

HDFS quotas are hard limits: a write that would exceed the quota fails, which protects the cluster from running out of disk space.
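As a minimal sketch of how such quotas could be set: HDFS provides the dfsadmin subcommands -setQuota (maximum number of file and directory names) and -setSpaceQuota (maximum bytes, counting every replica). The wrapper function apply_quotas, the HDFS override variable, and the limits in the usage note are assumptions chosen for illustration, not Datameer recommendations. Setting quotas requires HDFS superuser rights.

```shell
# Sketch only: apply_quotas and the HDFS variable are hypothetical helpers.
# Usage: apply_quotas DIRECTORY MAX_NAMES MAX_SPACE
apply_quotas() {
    dir=$1
    max_names=$2   # name quota: files plus directories
    max_space=$3   # space quota: bytes or a suffix such as 10t, replicas included

    # HDFS can be overridden (for example with "echo") to preview the commands.
    "${HDFS:-hdfs}" dfsadmin -setQuota "$max_names" "$dir" &&
        "${HDFS:-hdfs}" dfsadmin -setSpaceQuota "$max_space" "$dir"
}
```

For example, running HDFS=echo apply_quotas /user/datameer 1000000 10t prints the two dfsadmin commands without executing them. The quotas can be removed again with hdfs dfsadmin -clrQuota and hdfs dfsadmin -clrSpaceQuota.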

The command

$ hadoop fs -count -q -h /user/datameer

prints the columns QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE and FILE_NAME for the given directory. The -h option shows sizes in human-readable units.

With this output, you can implement scripts or cron jobs that collect a report at regular intervals and send alerts when necessary.
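A script of this kind could look like the following sketch. It parses a single line of hadoop fs -count -q output (without -h, so that the sizes are plain byte counts) and warns when the space quota is nearly used up. The function names, the threshold argument, and the message wording are assumptions for illustration, and the sketch assumes the path contains no spaces.

```shell
# Sketch only: quota_usage_pct and report_quota are hypothetical helpers.
# Input columns: QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA
#                DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME

# Reads one count line on stdin; prints the used share of the space quota
# as a whole percentage, or "none" if no space quota is set.
quota_usage_pct() {
    awk '{
        if ($3 == "none" || $3 + 0 <= 0) { print "none"; exit }
        printf "%d\n", (($3 - $4) * 100) / $3
    }'
}

# Reads one count line on stdin; prints a status message and returns 1
# if usage is at or above THRESHOLD percent.
# Usage: report_quota THRESHOLD
report_quota() {
    threshold=$1
    line=$(cat)
    pct=$(printf '%s\n' "$line" | quota_usage_pct)
    path=$(printf '%s\n' "$line" | awk '{ print $NF }')

    if [ "$pct" = "none" ]; then
        echo "OK: no space quota set on $path"
        return 0
    fi
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: $path uses ${pct}% of its space quota"
        return 1
    fi
    echo "OK: $path uses ${pct}% of its space quota"
}
```

A cron job could then run hadoop fs -count -q /user/datameer | report_quota 80 and send a notification whenever the exit status is non-zero.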

For overall cluster usage, you can use a report such as

$ hdfs dfsadmin -report
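To feed the cluster-wide figure into the same kind of alerting, the report can be filtered for its summary line. The sketch below assumes that the report contains lines of the form "DFS Used%: 42.17%" and that the first such line is the cluster summary; the exact layout can differ between Hadoop versions, so check it against your own output. The function name cluster_used_pct is hypothetical.

```shell
# Sketch only: cluster_used_pct is a hypothetical helper.
# Reads `hdfs dfsadmin -report` output on stdin and prints the first
# "DFS Used%" value without the trailing percent sign.
cluster_used_pct() {
    awk -F': *' '/^DFS Used%/ { sub(/%$/, "", $2); print $2; exit }'
}
```

It would be used as hdfs dfsadmin -report | cluster_used_pct.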

Further Information

You can also find distribution-specific Hadoop information, for example from Cloudera or Hortonworks.