Too Many Open Files Error - Datameer Connections to Hadoop Stuck in CLOSE_WAIT State

Problem

After Datameer has been running without error for an extended period, the following warning is written to the conductor.log file:

[anonymous]  WARN [2014-01-01 00:00:00.000] [LeaseRenewer:datameer@hadoop-name-node.datameer.com.com:8020] (LeaseRenewer.java:458) - Failed to renew lease for [DFSClient_NONMAPREDUCE_-451590243_47] for 3222 seconds.  Will retry shortly ...
java.io.IOException: Failed on local exception: java.io.IOException: Too many open files; Host Details : local host is: "datameer-app-host/10.0.0.123"; destination host is: "hadoop-name-node.datameer.com":8020; 
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
	at org.apache.hadoop.ipc.Client.call(Client.java:1351)
	at org.apache.hadoop.ipc.Client.call(Client.java:1300)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at $Proxy97.renewLease(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor258.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at $Proxy97.renewLease(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:499)
	at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:713)
	at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
	at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
	at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
	at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Too many open files
	at sun.nio.ch.IOUtil.initPipe(Native Method)
	at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
	at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:409)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:325)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
	at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
	at org.apache.hadoop.ipc.Client.call(Client.java:1318)
	... 16 more

An administrator on the Datameer application server can count the open network connections in the CLOSE_WAIT state with the command "lsof | grep -c CLOSE_WAIT". In environments affected by this issue, the count of CLOSE_WAIT connections increases steadily over time until the open-file limit is reached for the user that started the conductor.sh script.
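The system-wide count above can also be narrowed to the conductor process itself and compared against the user's open-file limit. The sketch below assumes the process can be found by matching the pattern "conductor"; adjust the pattern for your installation.

```shell
#!/bin/sh
# Sketch: count CLOSE_WAIT sockets held by the conductor process and
# compare against the open-file limit of the user running it.
# The process pattern "conductor" is an assumption; adjust as needed.

count_close_wait() {
  # Reads lsof-style output on stdin, prints the number of CLOSE_WAIT lines.
  grep -c 'CLOSE_WAIT'
}

PID=$(pgrep -f conductor | head -n 1)
if [ -n "$PID" ]; then
  # Per-process view: only the descriptors held by the conductor process.
  echo "CLOSE_WAIT sockets: $(lsof -p "$PID" 2>/dev/null | count_close_wait)"
fi

# The limit the CLOSE_WAIT count is climbing toward (run as the same user
# that started conductor.sh).
echo "open-file limit (ulimit -n): $(ulimit -n)"
```

Run as the user that started conductor.sh, since "ulimit -n" reports that user's own limit.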

Cause

The cause appears to be HDFS-5671, "Fix socket leak in DFSInputStream#getBlockReader".

Solution

To work around this issue, restart the Datameer application; the restart closes all CLOSE_WAIT socket connections.
To resolve this issue permanently, contact Datameer Support and reference DAP-21185.
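Until a permanent fix is in place, some administrators also raise the per-user open-file limit so that the leak takes longer to exhaust descriptors between restarts. This is a sketch only, not a fix for the underlying socket leak; the service account name "datameer" and the limit value are assumptions to adapt to your environment.

```shell
#!/bin/sh
# Interim mitigation sketch (does NOT fix the leak): raise the open-file
# limit for the user that starts conductor.sh.
#
# On Linux with pam_limits, add lines like the following to
# /etc/security/limits.conf (the user "datameer" and the value 65536 are
# assumptions; substitute your service account and a value suited to your
# environment), then log the user out and back in and restart the service:
#
#   datameer  soft  nofile  65536
#   datameer  hard  nofile  65536

# Verify the effective limit as the service user:
ulimit -n
```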