ClassNotFoundException: org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream

Problem

After creating a data link to Hive, it is not possible to import data. The log files show the following error:

Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/common/io/NonSyncByteArrayOutputStream
...
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream
...

Background

When loading classes, Datameer gives priority to the <datameer-install-path>/etc/custom-jars directory.

If custom Hive SerDes are used, Datameer expects their classes to reside in the Hive plugin. However, if those classes are first picked up from custom-jars, they are skipped when the Hive plugin is loaded, because they are already available.
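As a simplified, hypothetical model (not Datameer's actual class loader), the priority rule can be pictured as "first source wins": each class name is resolved from the first location, in priority order, that provides it. A SerDe class that also sits in custom-jars therefore shadows the copy in the Hive plugin. The class name com.example.MySerDe and the functions resolve and shadowed are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: a source is (location name, set of fully qualified class names it provides).
Source = Tuple[str, Set[str]]


def resolve(sources: List[Source]) -> Dict[str, str]:
    """Map each class name to the first source, in priority order, that provides it."""
    owner: Dict[str, str] = {}
    for name, classes in sources:
        for cls in sorted(classes):
            # setdefault keeps the earlier owner, so a later source never replaces it.
            owner.setdefault(cls, name)
    return owner


def shadowed(sources: List[Source], source_name: str) -> Set[str]:
    """Classes offered by source_name that were already resolved from an earlier source."""
    owner = resolve(sources)
    provided = next((classes for name, classes in sources if name == source_name), set())
    return {cls for cls in provided if owner[cls] != source_name}


if __name__ == "__main__":
    # custom-jars is listed first because it is given priority.
    sources = [
        ("etc/custom-jars", {"com.example.MySerDe"}),
        ("hive-plugin", {"com.example.MySerDe",
                         "org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream"}),
    ]
    print(resolve(sources))
    print(shadowed(sources, "hive-plugin"))
```

In this model the plugin's copy of com.example.MySerDe is never used, which is the situation the Solution section removes by taking the SerDe jar out of custom-jars.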

Troubleshooting Steps

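  • Review the current Hive plugin and check that all expected classes are in place.
  • Compare the MD5 checksums of the jar files in the Hive plugin and in custom-jars.
  • Run lsof against the Datameer process ID (PID) to list the files the process has open.
  • Note whether any jar files are being loaded from <datameer-install-path>/etc/custom-jars.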
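Some of the checks above can be scripted. The following is a minimal sketch, assuming the jars are plain files in a directory such as <datameer-install-path>/etc/custom-jars and that you pass in the directory holding the Hive plugin's jars (its exact location is an assumption here). It reports each jar's MD5 checksum, the .class entries under a package prefix (defaulting to org/apache/hadoop/hive/, the package named in the error), and any custom jar that is byte-identical to a plugin jar. The function names md5_of, scan_jars and duplicates are hypothetical; a custom SerDe in another package can be found by passing its own prefix.

```python
import hashlib
import zipfile
from pathlib import Path
from typing import Dict, List, Tuple


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_jars(directory: Path,
              prefix: str = "org/apache/hadoop/hive/") -> List[Tuple[str, str, List[str]]]:
    """For each *.jar in directory: (jar name, MD5, .class entries whose path starts with prefix)."""
    results = []
    for jar in sorted(directory.glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            hits = [n for n in zf.namelist() if n.startswith(prefix) and n.endswith(".class")]
        results.append((jar.name, md5_of(jar), hits))
    return results


def duplicates(custom_dir: Path, plugin_dir: Path) -> Dict[str, str]:
    """Map each custom jar to a plugin jar (searched recursively) with the same MD5."""
    plugin = {md5_of(j): j.name for j in plugin_dir.rglob("*.jar")}
    found = {}
    for jar in sorted(custom_dir.glob("*.jar")):
        checksum = md5_of(jar)
        if checksum in plugin:
            found[jar.name] = plugin[checksum]
    return found
```

Any jar in custom-jars that shows matching Hive classes, or that duplicates a plugin jar, is a candidate for the Solution below. The lsof check still has to be run against the live process.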

Solution

Ensure that custom SerDe jar files are not included in <datameer-install-path>/etc/custom-jars. If any custom SerDe jar files are found in the custom-jars path, remove them. Restart the Datameer service for the change to take effect.
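Rather than deleting the jars outright, one option is to move them to a backup directory so the change can be reverted. The sketch below assumes you already know which jar names to remove (for example from the checks in Troubleshooting Steps); the function name quarantine_jars and the backup location are hypothetical. It defaults to a dry run, and the Datameer service still has to be restarted afterwards.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def quarantine_jars(custom_jars: Path, jar_names: Iterable[str], backup_dir: Path,
                    dry_run: bool = True) -> List[str]:
    """Move the named jars from custom_jars into backup_dir.

    With dry_run=True (the default) nothing is moved; the function only reports
    which jars would be moved. Returns the names of jars moved (or to be moved).
    """
    moved = []
    for name in jar_names:
        src = custom_jars / name
        if not src.is_file():
            continue  # skip names that are not present in custom-jars
        if not dry_run:
            backup_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(backup_dir / name))
        moved.append(name)
    return moved
```

Run it once with the default dry_run=True to review the list, then again with dry_run=False.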