Data Collection for Root Cause Analysis

When troubleshooting a Production Down issue, collect diagnostic information before the environment is restarted if you want to investigate the root cause afterwards.

Even when all of this information is available, it may not be possible to determine a root cause.

Prerequisites

It is assumed that you have prepared the shell commands that speed up these operations, as described in our Installation Guide and in our knowledge base article Setup Bash Shell Aliases.

If it is not already defined, you can create one of the required commands, dmpid, with:

alias dmpid='ps -ef | grep -i "java.*jetty.*datameer" | grep -v grep | tr -s " " | cut -d " " -f2'
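Bash does not expand aliases in non-interactive scripts by default, so a function can be more convenient when the collection steps are scripted. The following is a minimal sketch, not part of the Installation Guide: the function names dm_pids_from_ps and dmpid_checked are hypothetical, and the match pattern is the one used in the alias above. It also refuses to return a PID unless exactly one matching process is found.

```shell
# Hypothetical helper: reads `ps -ef` style text on stdin and prints the PID
# (second column) of every line that matches the Datameer Jetty pattern.
dm_pids_from_ps() {
    grep -i 'java.*jetty.*datameer' | grep -v grep | awk '{print $2}'
}

# Hypothetical function equivalent of the dmpid alias, usable inside scripts.
# Prints the PID only when exactly one matching process is running.
dmpid_checked() {
    pids=$(ps -ef | dm_pids_from_ps)
    count=$(printf '%s\n' "$pids" | awk 'NF {n++} END {print n+0}')
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one Datameer process, found $count" >&2
        return 1
    fi
    printf '%s\n' "$pids"
}
```

In a script, $(dmpid_checked) could then stand in for $(dmpid) in the commands below.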

Items to Collect before Restarting Datameer

    • Gather dmesg and /var/log/messages
    • Collect all logs from the logs/ directory, or at least those updated in the past day or two, and in particular the application log file (conductor.log):
      tar -zcvf rca.tar.gz logs/*.*
    • To list and collect the open files, run lsof on the Datameer environment:
      lsof > lsof.out
    • Additionally, gather the network connections that are in a WAIT state:
      netstat -tonp | grep -i WAIT > netstat.out
    • Force a heap dump
      $JAVA_HOME/bin/jmap -F -dump:format=b,file=heapdump.hprof $(dmpid)
    • If the /dev page is accessible, collect a thread dump from http(s)://<host>:<port>/dev/threaddump
    • Also collect a Java thread dump, if possible, with jstack or with kill -3 (which makes the JVM print the thread dump to its standard output):
      $JAVA_HOME/bin/jstack -l $(dmpid) > jstack.out
      kill -3 $(dmpid)
    • If a heap dump exists from an OutOfMemoryError, collect the heap dump file
    • If a javacore file exists, collect the javacore file
    • Gather an application database dump, if possible.
      mysqldump -h<host> -udap -pdap dap | gzip > Datameer-<version>-<dist>-<date>.sql.gz
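The quick, read-only items from the list above can be gathered in one pass so that a single failing command does not stop the rest. This is only a sketch under stated assumptions: the function names collect, recent_logs, and rca_collect and the output file names are hypothetical, JAVA_HOME is assumed to be set, and the script is assumed to run from the Datameer installation directory so that logs/ resolves. The heap dump and database dump are left out because they can take a long time.

```shell
# Sketch only: function names and output file names are hypothetical.

# Run a command, saving stdout to $1 and stderr to $1.err.
# A failing command is reported but never aborts the collection.
collect() {
    out_file=$1
    shift
    if "$@" > "$out_file" 2> "$out_file.err"; then
        echo "ok: $out_file"
    else
        echo "failed: $out_file (see $out_file.err)" >&2
    fi
    return 0
}

# List files under directory $1 that were modified within the last two days.
recent_logs() {
    find "$1" -type f -mtime -2
}

# Gather the quick items into directory $1; $2 is the Datameer JVM PID (optional).
rca_collect() {
    rca_dir=$1
    pid=${2:-}
    mkdir -p "$rca_dir" || return 1
    collect "$rca_dir/dmesg.out" dmesg
    collect "$rca_dir/lsof.out" lsof
    # Full connection list; it can be filtered for WAIT states afterwards.
    collect "$rca_dir/netstat.out" netstat -tonp
    if [ -n "$pid" ]; then
        collect "$rca_dir/jstack.out" "$JAVA_HOME/bin/jstack" -l "$pid"
    fi
    # Names of recently updated log files, assuming logs/ is in the current directory.
    collect "$rca_dir/recent-logs.txt" recent_logs logs
}
```

From an interactive shell where the alias is defined, this could be invoked as rca_collect /tmp/rca "$(dmpid)". The resulting directory can then be archived together with the logs.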

Further Troubleshooting

Even when all the information above is collected, diagnosing the root cause may require further troubleshooting. If an issue is recurring, Datameer Support may recommend activating Memory Profiling as described in our documentation. Memory Profiling is not recommended for general use and should only be activated at the request of Datameer Support.