SQL Deadlock Observed During Post-processing of a Completed Hadoop Job


The following error is observed after a job has completed in Hadoop, while the Datameer server is performing its internal post-processing of the job:

[system] ERROR [2014-01-01 00:00:00.000] [JobScheduler thread-1] (JDBCExceptionReporter.java:234) - Deadlock found when trying to get lock; try restarting transaction
[system] ERROR [2014-01-01 00:00:00.000] [JobScheduler thread-1] (SingleThreadedController.java:121) - Error occurred.
javax.persistence.RollbackException: Error while committing the transaction

This error was observed in Datameer 3.1.15 while the HousekeepingService was actively running.


The cause of this issue is a bug in the code: a race condition between the JobScheduler and HousekeepingService components.


Resolution of this issue is planned for a future release of Datameer. To work around this issue, significantly reduce the housekeeping.max_data_per_request property from its default value of 10,000. This value controls how much data the HousekeepingService sends in a single SQL transaction. Smaller transactions significantly reduce the likelihood of an SQL deadlock.
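
As a rough illustration of why a smaller per-request value helps, the sketch below splits one large cleanup into many short transactions, so each transaction holds its locks only briefly. This is not Datameer's actual HousekeepingService code: the job_data table, the delete_in_batches function, and the use of sqlite3 as a stand-in database are all assumptions made for the example.

```python
import sqlite3


def delete_in_batches(conn, ids, max_per_request):
    """Delete rows in chunks, committing after each chunk.

    Hypothetical helper: 'job_data' is an illustrative table name,
    not part of the Datameer schema.
    Returns the number of transactions that were committed.
    """
    transactions = 0
    for start in range(0, len(ids), max_per_request):
        chunk = ids[start:start + max_per_request]
        # One short transaction per chunk instead of one long one.
        conn.executemany(
            "DELETE FROM job_data WHERE id = ?",
            [(row_id,) for row_id in chunk],
        )
        conn.commit()
        transactions += 1
    return transactions
```

With 120 rows to remove, a limit of 10,000 would produce a single transaction touching every row, while a limit of 50 produces three smaller ones.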

To enable the work-around with a suggested value of 50:

  1. Stop Datameer.
  2. Add the following parameter to the active properties of the Datameer Server (i.e. live.properties):
    • housekeeping.max_data_per_request=50
  3. Start Datameer.
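
If the property change in step 2 needs to be scripted, a minimal sketch along the following lines could be used. The set_property helper is hypothetical and not part of Datameer, and it only handles simple key=value lines rather than the full Java properties syntax. The path to live.properties depends on the installation and must be supplied by the caller.

```python
from pathlib import Path


def set_property(path, key, value):
    """Set key=value in a properties file, replacing any existing entry.

    Simplified sketch: only 'key=value' lines are recognized; comment
    lines starting with '#' or '!' are left untouched.
    """
    file = Path(path)
    lines = file.read_text().splitlines() if file.exists() else []
    new_line = f"{key}={value}"
    result = []
    replaced = False
    for line in lines:
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            # Keep a single entry for the key; drop any duplicates.
            if not replaced:
                result.append(new_line)
                replaced = True
        else:
            result.append(line)
    if not replaced:
        result.append(new_line)
    file.write_text("\n".join(result) + "\n")
```

For example, set_property(path_to_live_properties, "housekeeping.max_data_per_request", 50), where path_to_live_properties is a placeholder for the actual file location. As in the manual steps, run it only while Datameer is stopped, then start Datameer again.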