Import Job or Export Job Failure - awstasks.com.jcraft.jsch.JSchException - Connection reset

Problem

An Import Job or Export Job fails in Datameer and the following error message is generated: 

Caused by: java.io.IOException: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset 

For more context, here is an example stack trace from an Export Job, as it appears in the job log:

java.lang.RuntimeException: java.lang.RuntimeException: Failed to generate file for 'RecordStream[sheetName=export,description=datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor@3415aedf]' 
at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:56)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to generate file for 'RecordStream[sheetName=export,description=datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor@3415aedf]'
at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:240)
at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:53)
... 5 more
Caused by: java.lang.RuntimeException: java.io.IOException: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset
at datameer.com.google.common.base.Throwables.propagate(Throwables.java:160)
at datameer.dap.common.graphv2.hadoop.ServerSideContext.finalizeExport(ServerSideContext.java:62)
at datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor.afterMrJobEnd(ExportJob.java:180)
at datameer.dap.common.graphv2.RecordStream.afterJobEnd(RecordStream.java:76)
at datameer.dap.common.graphv2.BaseMrClusterJob.afterJobEnd(BaseMrClusterJob.java:142)
at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:138)
at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:227)
... 6 more
Caused by: java.io.IOException: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset
at datameer.awstasks.ssh.JschRunner.run(JschRunner.java:215)
at datameer.awstasks.ssh.JschRunner.execute(JschRunner.java:222)
at datameer.awstasks.exec.ShellExecutor.execute(ShellExecutor.java:25)
at datameer.dap.hadoop.filesystem.LinuxShellCommandExecutor.deletePath(LinuxShellCommandExecutor.java:52)
at datameer.dap.hadoop.filesystem.ScpFileSystem.delete(ScpFileSystem.java:134)
at datameer.dap.sdk.util.FileOutputAdapter.finalizeExport(FileOutputAdapter.java:342)
at datameer.dap.common.graphv2.hadoop.ServerSideContext.finalizeExport(ServerSideContext.java:60)
... 11 more
Caused by: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset
at awstasks.com.jcraft.jsch.Session.connect(Session.java:558)
at awstasks.com.jcraft.jsch.Session.connect(Session.java:183)
at datameer.awstasks.ssh.JschRunner.createFreshSession(JschRunner.java:369)
at datameer.awstasks.ssh.JschRunner.openSession(JschRunner.java:289)
at datameer.awstasks.ssh.JschRunner.run(JschRunner.java:207)
... 17 more


Cause

This error occurs when the SSH or SFTP server drops the network connection between a Hadoop DataNode and the Import/Export target server.

To identify the root cause of the dropped connection, investigate the SSH or SFTP daemon logs on the Import/Export target server.
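As a sketch of what that investigation might look like, the snippet below searches an sshd log for connections dropped by the daemon. The log excerpt, host names, and paths here are hypothetical stand-ins; on a real target server you would grep the actual authentication log (commonly /var/log/auth.log on Debian/Ubuntu or /var/log/secure on RHEL/CentOS):

```shell
# Hypothetical sshd log excerpt standing in for /var/log/auth.log;
# on the real target server, grep the actual log file instead.
cat > /tmp/sshd_sample.log <<'EOF'
Mar 01 10:02:11 sftp01 sshd[4312]: drop connection #11 from [10.0.0.15]:48122 on [0.0.0.0]:22 past MaxStartups
Mar 01 10:02:11 sftp01 sshd[4313]: Accepted publickey for datameer from 10.0.0.16 port 48123 ssh2
EOF

# Count connections the daemon dropped because a connection limit was hit.
grep -c 'MaxStartups' /tmp/sshd_sample.log   # prints 1
```

A non-zero count for messages like this would point at the concurrent-connection limit described below rather than at a network fault.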

A common cause is a limit on the number of concurrent SSH/SFTP connections to the target server that was temporarily exhausted.
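If the target server runs OpenSSH, this limit is typically governed by the MaxStartups setting (maximum concurrent unauthenticated connections, after which the daemon starts dropping new connections) and MaxSessions (sessions per network connection) in sshd_config. The excerpt below is purely illustrative; the values shown are examples, not recommendations:

```
# /etc/ssh/sshd_config (illustrative excerpt)
# MaxStartups start:rate:full - begin probabilistically dropping new
# unauthenticated connections at 30, drop all of them at 100.
MaxStartups 30:30:100
MaxSessions 50
```

The effective values of a running daemon can be checked on the target server (as root) with `sshd -T | grep -i max`.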


Resolution

If the root cause was a transient condition (e.g., an exhausted connection limit or a network outage), it may be possible to work around the issue by simply re-running the job.

It may also help to reduce the concurrency of the Import Job or Export Job. This can decrease job performance, since fewer concurrent network connections are established. To limit the maximum concurrency, set the following parameter in the Custom Properties of the affected job (the example value is 1):

das.splitting.max-split-count=1