Datameer drops records while importing a large file


Hi there,

I am using an SFTP connection to import a file into Datameer. The file is 2 GB and contains 11,008,546 records. After the import, Datameer reports the following details:

  • Records Imported : 6,057,731
  • Data Volume Imported : 203.0 MB
  • Total Sample Size : 5,000
  • Record Throughput : 36,938 Records/sec
  • Dropped Records : 4,950,815
  • Dropped Splits : 0

How can I import all the records into Datameer?




1 comment

  • Konsta Danyliuk

    Hello Lee,

    When Datameer drops records during ingestion, it most likely means that something in the dataset does not fit what the import job expects, e.g. particular records don't match the schema.

    To find out why records were dropped, check the corresponding log file.

    It is usually called error-merged.log and is available on the Job Details page (navigate to JobHistory -> Job ID) in the Errors section at the very bottom. Alternatively, you can download the full Job Trace, which contains this file as well.

    Here are our support articles about job trace:

    How to Collect a Job Trace
    How to Interpret the Contents of a Job Trace
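
As a side note, the counts in the question add up: 6,057,731 imported plus 4,950,815 dropped equals the 11,008,546 records in the file, so every record was either imported or dropped. If the log points to schema mismatches, a quick check outside Datameer can show how many records have an unexpected number of fields. The sketch below is only an illustration and rests on assumptions the thread does not confirm: that the file is delimited text, that the delimiter is a comma, and that the first row defines the expected column count. The function name count_field_mismatches is made up for this example and is not part of Datameer.

```python
import csv
import io
from collections import Counter


def count_field_mismatches(lines, delimiter=",", expected_fields=None):
    """Count records whose field count differs from the expected width.

    Assumption: `lines` is an iterable of text lines from a delimited file.
    If `expected_fields` is not given, the first row sets the expected width.
    Returns (total_records, mismatched_records, Counter of field counts).
    """
    reader = csv.reader(lines, delimiter=delimiter)
    field_counts = Counter()
    total = 0
    mismatched = 0
    for row in reader:
        if expected_fields is None:
            expected_fields = len(row)  # first row defines the schema width
        total += 1
        field_counts[len(row)] += 1
        if len(row) != expected_fields:
            mismatched += 1
    return total, mismatched, field_counts


# Tiny in-memory sample: one short row and one row with an extra field.
sample = io.StringIO("id,name,city\n1,Ann,Oslo\n2,Bob\n3,Cy,Rome,extra\n")
total, mismatched, field_counts = count_field_mismatches(sample)
print(total, mismatched, dict(field_counts))
```

For a real file, open it with open(path, newline="") and pass the file object instead of the sample; the file is read row by row, so a 2 GB input does not need to fit in memory. A field-count mismatch is only one possible cause, and error-merged.log remains the authoritative source for why Datameer dropped a given record.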

