How to Evaluate Cluster I/O (read/write) Performance

Goal

Datameer acts as a job compiler: it compiles jobs and sends them to the cluster for execution. Low I/O throughput on the cluster can significantly degrade Datameer's performance, especially for relatively heavy jobs that write large volumes of intermediate and final results to HDFS.

The TestDFSIO benchmark measures I/O (read/write) performance by running a MapReduce job that reads and writes files in parallel, so it requires a working MapReduce setup on the cluster.

Learn

To measure I/O (read/write) performance with TestDFSIO, perform the following steps:

 

  • Log in to a DataNode client and locate the hadoop-mapreduce-client-jobclient-*-tests.jar file (2>/dev/null hides permission errors from unreadable directories).
    find / -name "hadoop-mapreduce-client-jobclient-*-tests.jar" 2>/dev/null
  • Run three to five TestDFSIO jobs with different parameters (varying the number and size of the files written).
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-<version>-tests.jar TestDFSIO -write -nrFiles 10 -size 25MB

    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-<version>-tests.jar TestDFSIO -write -nrFiles 10 -size 50MB

    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-<version>-tests.jar TestDFSIO -write -nrFiles 10 -size 100MB
  • Review the TestDFSIO output. The example below is consistent with the 10 x 100MB write run (1000 MB in total).
    17/11/13 17:17:24 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
    17/11/13 17:17:24 INFO fs.TestDFSIO:            Date & time: Mon Nov 13 17:17:24 UTC 2017
    17/11/13 17:17:24 INFO fs.TestDFSIO:        Number of files: 10
    17/11/13 17:17:24 INFO fs.TestDFSIO: Total MBytes processed: 1000.0
    17/11/13 17:17:24 INFO fs.TestDFSIO:      Throughput mb/sec: 14.895138226882745
    17/11/13 17:17:24 INFO fs.TestDFSIO: Average IO rate mb/sec: 17.42253303527832
    17/11/13 17:17:24 INFO fs.TestDFSIO:  IO rate std deviation: 6.712244005380064
    17/11/13 17:17:24 INFO fs.TestDFSIO:     Test exec time sec: 65.527
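When several runs are compared, it can help to reduce each report to a single line. The following is a minimal sketch, not part of TestDFSIO itself: parse_dfsio is a hypothetical helper name, and the script assumes the log lines have the same labels as the sample output above (other Hadoop versions may format the values slightly differently).

```shell
#!/usr/bin/env bash
# Sketch only: parse_dfsio is a hypothetical helper, not part of Hadoop.
# It reads TestDFSIO log lines on stdin and prints one summary line:
#   <mode> <files> <total_mb> <throughput_mb_s> <avg_io_rate_mb_s> <exec_time_s>

parse_dfsio() {
  awk '
    /----- TestDFSIO -----/    { mode  = $NF }   # last field is "write" or "read"
    /Number of files:/         { files = $NF }
    /Total MBytes processed:/  { total = $NF }
    /Throughput mb\/sec:/      { tput  = $NF }
    /Average IO rate mb\/sec:/ { rate  = $NF }
    /Test exec time sec:/      { secs  = $NF }
    END { printf "%s %s %s %.2f %.2f %s\n", mode, files, total, tput, rate, secs }
  '
}

# Demo: feed the sample output from this article to the helper.
parse_dfsio <<'EOF'
17/11/13 17:17:24 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/11/13 17:17:24 INFO fs.TestDFSIO:            Date & time: Mon Nov 13 17:17:24 UTC 2017
17/11/13 17:17:24 INFO fs.TestDFSIO:        Number of files: 10
17/11/13 17:17:24 INFO fs.TestDFSIO: Total MBytes processed: 1000.0
17/11/13 17:17:24 INFO fs.TestDFSIO:      Throughput mb/sec: 14.895138226882745
17/11/13 17:17:24 INFO fs.TestDFSIO: Average IO rate mb/sec: 17.42253303527832
17/11/13 17:17:24 INFO fs.TestDFSIO:  IO rate std deviation: 6.712244005380064
17/11/13 17:17:24 INFO fs.TestDFSIO:     Test exec time sec: 65.527
EOF
# prints: write 10 1000.0 14.90 17.42 65.527
```

On a live cluster the helper would typically be used as a filter, for example by appending 2>&1 | parse_dfsio to one of the hadoop jar commands above, since the report is written through the job's log output.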

 

Possible issues 

TestDFSIO sometimes fails with the following error message:

java.io.FileNotFoundException: File does not exist: /benchmarks/TestDFSIO/io_write/part-00000

This might be caused by an incorrect compression configuration. Passing the additional property -D mapred.output.compress=false to TestDFSIO should fix it:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-<version>-tests.jar TestDFSIO -D mapred.output.compress=false -write -nrFiles 10 -size 25MB
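The write runs, a matching read pass, and the compression workaround can be combined into one small sweep. The sketch below is a dry run that only prints the commands. The names dfsio_cmd, JAR, and EXTRA_OPTS are hypothetical, and the jar path is the placeholder used in this article, to be replaced with the file located by the find command. It assumes that each -read run follows a -write run with the same -nrFiles and -size values, because the read test reads back the files the write test created.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the TestDFSIO commands for a small benchmark sweep.
# dfsio_cmd, JAR and EXTRA_OPTS are hypothetical names used for illustration.

# Placeholder path from this article; set JAR to the jar found on your system.
JAR="${JAR:-/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-<version>-tests.jar}"

# Optional generic options; here, the compression workaround described above.
EXTRA_OPTS="-D mapred.output.compress=false"

dfsio_cmd() {
  # $1 = mode flag (-write or -read), $2 = number of files, $3 = size per file
  printf '%s\n' "hadoop jar $JAR TestDFSIO $EXTRA_OPTS $1 -nrFiles $2 -size $3"
}

for size in 25MB 50MB 100MB; do
  dfsio_cmd -write 10 "$size"   # write test first: creates the benchmark files
  dfsio_cmd -read  10 "$size"   # read test: reads the same files back
done

# Remove the benchmark data (under /benchmarks/TestDFSIO, as in the error above).
printf '%s\n' "hadoop jar $JAR TestDFSIO -clean"
```

Printing instead of executing keeps the sketch safe to try on any machine. Once the printed commands look right, they can be run one at a time on the cluster.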