Hadoop Distribution Upgrades and Datameer

Question

If we upgrade the Hadoop distribution on our cluster, will Datameer stop working, or will it only be degraded because it does not use the new features of the newer Hadoop version?

Answer

Every Datameer package is built for a specific Hadoop distribution. This ensures that the package ships with the same library versions that the distribution itself includes.

For example, if your cluster runs CDH-5.10.0, you need to install the package Datameer-<version>-cdh-5.10.0. Differences in library versions can lead to unpredictable issues with Datameer jobs, and these issues are hard to investigate.

When you upgrade the Hadoop distribution, we strongly recommend upgrading Datameer as well, so that it picks up any changes made in the newer Hadoop version.

We aim to test and package Datameer promptly for all major releases of the main Hadoop distributions.

Suppose you run Datameer-<version>-cdh-5.10.0 on a CDH-5.10.0 cluster and want to upgrade the cluster to CDH-5.12.0. In that case, you need to install the Datameer package built for the new distribution: Datameer-<version>-cdh-5.12.0.

If the names and contents of the main Hadoop libraries did not change between CDH-5.10.0 and CDH-5.12.0, Datameer continues to work as expected. Even so, any difference in library versions can still lead to unpredictable issues.
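One rough way to see whether library names changed between two releases is to compare the jar file names on both sides. The following is a minimal Python sketch, not a Datameer or Cloudera tool. `parse_jars` and `lib_differences` are hypothetical helpers, and the sketch assumes that jars follow an `<artifact>-<version>.jar` naming convention. The file names in the example are illustrative only. The check compares names and version strings only: matching names do not prove that the jar contents are identical.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Assumption: jar names look like "<artifact>-<version>.jar", where the version
# starts with a digit, e.g. "hadoop-common-2.6.0-cdh5.10.0.jar".
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d.*)\.jar$")


def parse_jars(filenames: Iterable[str]) -> Dict[str, str]:
    """Map artifact name -> version for every file name matching JAR_PATTERN."""
    libs: Dict[str, str] = {}
    for filename in filenames:
        match = JAR_PATTERN.match(filename)
        if match:  # files that do not look like versioned jars are skipped
            libs[match.group("name")] = match.group("version")
    return libs


def lib_differences(
    cluster_jars: Iterable[str], datameer_jars: Iterable[str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return artifacts whose versions differ or that exist on only one side.

    Each value is (cluster_version, datameer_version); None means the jar is absent.
    """
    cluster = parse_jars(cluster_jars)
    datameer = parse_jars(datameer_jars)
    diffs: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for name in sorted(cluster.keys() | datameer.keys()):
        cluster_version, datameer_version = cluster.get(name), datameer.get(name)
        if cluster_version != datameer_version:
            diffs[name] = (cluster_version, datameer_version)
    return diffs


if __name__ == "__main__":
    # Illustrative file names: a CDH-5.12.0 cluster vs. a package built for CDH-5.10.0.
    cluster_libs = ["hadoop-common-2.6.0-cdh5.12.0.jar", "guava-11.0.2.jar"]
    datameer_libs = ["hadoop-common-2.6.0-cdh5.10.0.jar", "guava-11.0.2.jar"]
    print(lib_differences(cluster_libs, datameer_libs))
    # prints {'hadoop-common': ('2.6.0-cdh5.12.0', '2.6.0-cdh5.10.0')}
```

An empty result only means that no name or version differences were found among the jars you listed. Installing the package built for the target distribution remains the recommended approach.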

Please note that not every Datameer version is packaged for every Hadoop distribution. You might need to upgrade Datameer to a version that was built and tested against the new Hadoop distribution.

It is also possible to synchronize the libraries without reinstalling or upgrading Datameer. The following article describes this process: How to Synchronize Hadoop Libraries to Datameer with Cloudera.