SQL Deadlock Observed During Post-processing of a Completed Hadoop Job

Problem

The following error is observed when a job has completed in Hadoop and the Datameer server is performing post-processing on the job internally:

[system] ERROR [2014-01-01 00:00:00.000] [JobScheduler thread-1] (JDBCExceptionReporter.java:234) - Deadlock found when trying to get lock; try restarting transaction
[system] ERROR [2014-01-01 00:00:00.000] [JobScheduler thread-1] (SingleThreadedController.java:121) - Error occurred.
javax.persistence.RollbackException: Error while committing the transaction

This error was observed in Datameer 3.1.15, at times when the HousekeepingService was actively running.
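To check whether a server log contains this signature, the lines can be scanned for the deadlock message. The following is a minimal sketch, assuming the log layout matches the excerpt above; the function name find_deadlocks is hypothetical and is not part of Datameer.

```python
import re

# Layout assumed from the excerpt above:
# [source] LEVEL [timestamp] [thread] (File.java:line) - message
LOG_LINE = re.compile(
    r"^\[(?P<source>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+"
    r"\[(?P<timestamp>[^\]]+)\]\s+\[(?P<thread>[^\]]+)\]\s+"
    r"\((?P<location>[^)]+)\)\s+-\s+(?P<message>.*)$"
)

DEADLOCK_TEXT = "Deadlock found when trying to get lock"


def find_deadlocks(lines):
    """Return (timestamp, thread) for every ERROR line reporting the deadlock."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match["level"] == "ERROR" and DEADLOCK_TEXT in match["message"]:
            hits.append((match["timestamp"], match["thread"]))
    return hits
```

Passing an open log file to find_deadlocks returns one entry per deadlock report. Lines that do not follow the assumed layout, such as the stack trace, are skipped.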

Cause

The cause is a bug in Datameer: a race condition between the JobScheduler and HousekeepingService components.

Solution

A fix is planned for a future release of Datameer. To work around this issue, significantly reduce the housekeeping.max_data_per_request property from its default value of 10,000. This value controls how much data the HousekeepingService sends in a single SQL transaction. Smaller transactions significantly reduce the likelihood of an SQL deadlock.
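The effect of a smaller limit can be illustrated with a short sketch. It assumes the value is a count of records per transaction, which the description above does not confirm, and it uses a hypothetical workload of 25,000 records. It is not Datameer's implementation; it only shows how the same work splits into more, shorter transactions.

```python
def split_into_batches(items, max_per_request):
    """Yield consecutive slices of at most max_per_request items."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    for start in range(0, len(items), max_per_request):
        yield items[start:start + max_per_request]


# Hypothetical workload: 25,000 records awaiting housekeeping.
stale_ids = list(range(25_000))

default_batches = list(split_into_batches(stale_ids, 10_000))
reduced_batches = list(split_into_batches(stale_ids, 50))

# Number of transactions and largest transaction size for each setting.
print(len(default_batches), max(len(b) for b in default_batches))  # 3 10000
print(len(reduced_batches), max(len(b) for b in reduced_batches))  # 500 50
```

With the lower limit each transaction touches far fewer rows, so it generally holds its locks for a shorter time and is less likely to overlap with a competing transaction.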

To enable the work-around with a suggested value of 50:

  1. Stop Datameer.
  2. Add the following parameter to the active properties of the Datameer Server (i.e. live.properties):
    • housekeeping.max_data_per_request=50
  3. Start Datameer.
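Step 2 above can also be scripted. The following is a minimal sketch that sets the property in the text of a properties file, replacing an existing entry if one is present. It assumes entries use the plain key=value form, and the helper name set_property is hypothetical. Run it only while Datameer is stopped.

```python
def set_property(text, key, value):
    """Return properties text with key set to value, replacing any existing entry."""
    new_line = f"{key}={value}"
    out, replaced = [], False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        # Assumption: entries use "key=value"; other separators are not handled.
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop duplicate entries for the same key
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

To apply it, read the contents of live.properties, pass them through set_property(contents, "housekeeping.max_data_per_request", "50"), and write the result back. Running it a second time leaves the file unchanged.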