Export Job to Hive Dropping Records

Problem

When exporting to Hive, a large number of records is consistently dropped. The same records can be exported directly to HDFS without issue; the drops occur only when exporting to a Hive table, whether new or existing.

Error Message

There is no exact error message; instead, the job completes with warnings. The log snippet below shows the output of a Hive export job that is dropping records.

 INFO [<timestamp>] [MrPlanRunnerV2] (JobExecutionTraceService.java:106) - Copying job execution trace log from /app/datameer/Datameer-<version>-<dist>/build/cache/dfscache/local-job-execution-traces/<jobID> to maprfs:/datalake/corporate/dataliberation/datameer/exportjobs/<configID>/<jobID>/job-execution-trace.log
 INFO [<timestamp>] [JobScheduler worker1-thread-253] (DapJobCounter.java:176) - Job SUCCESS with '1' mr-jobs and following counters:
 INFO [<timestamp>] [JobScheduler worker1-thread-253] (DapJobCounter.java:179) - EXPORT_RECORDS: 1908
 INFO [<timestamp>] [JobScheduler worker1-thread-253] (DapJobCounter.java:179) - EXPORT_DROPPED_RECORDS: 3617

Troubleshooting

Further investigation of the task logs revealed the following nondescript error message:

!message:NullPointerException:
!Error Repeated:>100 times
!stack:java.lang.NullPointerException
        at datameer.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:210)
        at datameer.com.google.common.collect.Lists.newArrayList(Lists.java:142)
        at datameer.das.plugin.hive.RecordObjectInspector.coerceColumnForHive(RecordObjectInspector.java:118)
        at datameer.das.plugin.hive.RecordObjectInspector.getStructFieldsDataAsList(RecordObjectInspector.java:108)
        at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:396)
        at datameer.das.plugin.hive.HiveFileOutputAdapter.write(HiveFileOutputAdapter.java:80)
        at datameer.das.plugin.hive.HiveOutputAdapter.write(HiveOutputAdapter.java:74)
        at datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor$1.computeValue(ExportJob.java:115)
        at datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor$1.computeValue(ExportJob.java:108)
        at datameer.dap.sdk.sequence.Sequence$Singleton.moveToNext(Sequence.java:246)
        at datameer.dap.sdk.sequence.Sequence$14.computeNext(Sequence.java:647)
        at datameer.dap.sdk.sequence.Sequence$Simple.moveToNext(Sequence.java:157)
        ...

In this instance, the cause was identified by isolating the columns being exported to Hive until the offending fields were found. Through this process of elimination, following a binary search pattern, two columns were determined to contain null values.
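The binary-search elimination described above can be sketched as follows. This is an illustration only: export_succeeds is a hypothetical stand-in for re-running the export job with just the given subset of columns configured and checking EXPORT_DROPPED_RECORDS.

```python
def find_offending_columns(columns, export_succeeds):
    """Return the columns that cause the export to drop records.

    Repeatedly halves the candidate set: any half whose export still
    drops records is searched further; halves that export cleanly are
    ruled out. This assumes each offending column fails independently,
    which holds for columns containing null values.
    """
    offending = []
    stack = [list(columns)]
    while stack:
        subset = stack.pop()
        if export_succeeds(subset):
            continue  # no offending column in this subset
        if len(subset) == 1:
            offending.append(subset[0])
            continue
        mid = len(subset) // 2
        stack.append(subset[:mid])
        stack.append(subset[mid:])
    return sorted(offending)


# Example: columns col_b and col_e contain nulls, so any subset
# including either of them fails the simulated export.
bad = {"col_b", "col_e"}
cols = ["col_a", "col_b", "col_c", "col_d", "col_e", "col_f"]
result = find_offending_columns(cols, lambda subset: not bad & set(subset))
# → ['col_b', 'col_e']
```

For n columns with k offenders this takes on the order of k·log(n) export runs rather than n, which matters when each run is a full job.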

Solution

The formulas attached to those columns were updated to include the DENULLIFY wrapper, which resolved the NullPointerException listed above. All records were then exported to Hive as expected.
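As an illustration, the fix amounts to wrapping the column's expression so that null values are replaced with a concrete value before the export runs. The sheet and column names below are hypothetical, and the exact DENULLIFY signature and argument separator may vary by Datameer version; consult the function reference for your release.

    DENULLIFY(#RawData!amount; "")

Any non-null replacement value appropriate to the column's type works; the point is that the Hive serializer never receives a null field.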