Datameer HDFS Connections, JDBC Connector, and Performance

Datameer JDBC Driver

The JDBC (Java Database Connectivity) connector opens a connection to a database through the standard Java database API. Unlike the HDFS connector, it is not a "streaming" connection that reads raw data directly; every row passes through the database server. Its convenience and versatility make it popular. A streaming connection (HDFS connector) performs better but is more limited in scope and convenience, which is why performance measurements differ between the JDBC and HDFS connectors.

 

Datameer HDFS Connections

Hive Thrift - Creates a Hive Server connector (using the Thrift protocol) that reads the Hive Metastore (schema, table locations, etc.) in a two-step process, which allows Datameer to then access HDFS directly.

  • Step 1: Only Metadata is transferred
  • Step 2: The files are read directly from HDFS

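Datameer doesn't use the HDFS connector to access Hive Views on HiveServer 2 (HS2). A view is a stored query with no data files of its own in HDFS, so there is nothing for the direct file read in Step 2 to fetch.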

  • In order to access Hive Views on HS2, you must use the Datameer JDBC Connector

 

JDBC Driver Performance, Datameer, and HiveServer 2

Note that the HS2 configuration determines which transport mode Datameer uses: the mode set in Datameer must match the mode HS2 is configured for. An HS2 Overview is available here:

There are two primary modes:

  • HTTP Mode (HTTP) - (Often used with a firewall, slower performance)
    • Jetty server required
  • Binary Mode (TCP/IP) - (Faster performance)
    • TThreadPoolServer from Thrift is required
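
As an illustration of how the two transport modes above differ on the client side, the following sketch builds a Hive JDBC connection URL for each. It is a minimal example, not Datameer's implementation: the class name HiveJdbcUrl, the host hs2.example.com, and the helper method are hypothetical, and the ports (10000, 10001) and the HTTP path cliservice are the usual Hive defaults, which your HS2 configuration may override.

```java
import java.util.Objects;

/** Sketch (hypothetical helper): builds HiveServer2 JDBC URLs for both transport modes. */
public class HiveJdbcUrl {

    /** Mirrors the two values of the HS2 setting hive.server2.transport.mode. */
    public enum TransportMode { BINARY, HTTP }

    /**
     * Returns a jdbc:hive2 URL. In binary mode the plain URL is enough;
     * in HTTP mode the transport mode and HTTP path are appended as
     * session variables. httpPath is ignored in binary mode.
     */
    public static String build(String host, int port, String database,
                               TransportMode mode, String httpPath) {
        Objects.requireNonNull(host, "host");
        Objects.requireNonNull(database, "database");
        Objects.requireNonNull(mode, "mode");
        String base = "jdbc:hive2://" + host + ":" + port + "/" + database;
        if (mode == TransportMode.BINARY) {
            return base;
        }
        Objects.requireNonNull(httpPath, "httpPath");
        return base + ";transportMode=http;httpPath=" + httpPath;
    }

    public static void main(String[] args) {
        // Host name and ports are placeholders; use the values from your HS2 setup.
        System.out.println(build("hs2.example.com", 10000, "default",
                TransportMode.BINARY, null));
        System.out.println(build("hs2.example.com", 10001, "default",
                TransportMode.HTTP, "cliservice"));
    }
}
```

With the Hive JDBC driver on the classpath, the resulting string can be passed to java.sql.DriverManager.getConnection. A URL built for the wrong mode will typically fail to connect, which is why the mode chosen in Datameer has to match the HS2 configuration.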

Note: Improving the performance of Datameer's JDBC connector is difficult because query execution is delegated to HS2. If HS2 itself becomes faster, or the execution framework behind it is changed, the performance of the JDBC connector in Datameer could improve as well.

 

Reference Material

Please refer to our documentation for further information and detailed instructions.

JDBC Generic (Hive Server2)

Datameer v7