Using the REST API to run a file upload


I have a large set of Excel files, each with many sheets, that are updated weekly. Because of the way permissions work across our servers, we have been unable to get Datameer to work with SFTP, so I'm now exploring file uploads, which have their own unique behavior in Datameer. (We will keep trying to get SFTP working, but the network admins indicate that it is not a high priority for the near future.)

It turns out that using the REST API to "run" a file upload causes Datameer, as far as I can tell, to re-run against the copy of the file that it created in the /user/datameer/fileupload directory on HDFS, rather than re-reading the original source file.

Is there any way to force it to re-upload from the original source file in an automated manner?

Do I have to create a script that manually copies the files into Datameer's HDFS directory and then re-runs the file upload?

Is there any other combination of workarounds that would allow me to automate this process?


Alberto Rodriguez

1 comment

  • Joel Stewart

    I do not recommend manually changing any files in Datameer's HDFS directory. Doing so could corrupt your Datameer instance, because the data in HDFS may no longer match what the Datameer "dap" metastore database expects, and the resulting behavior is unpredictable.

    I strongly recommend resolving the permissions issues with the files first. However, if a scripted approach is required, you could build a solution that still uses HDFS but stays outside Datameer's private directory. For example, the following approach is worth considering if the direct network access cannot be agreed on easily:

    1. Create a directory on HDFS to copy the data files to, e.g. /sftp-replacement-files/
    2. Create a Datameer Connection to HDFS to the new directory created in step 1.
    3. Create an Import Job to collect data from the connection created in step 2.
    4. Create a script that performs the following steps:
       • Copy the files from the SFTP locations to the directory from step 1.
       • Run the Import Job from step 3 to ingest the new data.
       • Delete the files copied from the SFTP locations (or archive them). The import job keeps a consistent copy in Datameer's private folder.
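
The script from step 4 could look roughly like the Python sketch below. It is only an illustration under several assumptions: the `hdfs` command-line client is on the PATH, the import job can be triggered with a POST to Datameer's `/rest/job-execution?configuration=<id>` endpoint using basic authentication (check this against the REST API documentation for your Datameer version), and the host, port, and job ID are hypothetical placeholders. All function names are invented for this example.

```python
import base64
import subprocess
import urllib.request
from pathlib import Path

# Hypothetical placeholders; adjust all three for your environment.
HDFS_STAGING_DIR = "/sftp-replacement-files/"       # directory from step 1
DATAMEER_URL = "http://datameer.example.com:8080"   # assumed host and port
IMPORT_JOB_ID = 42                                  # assumed ID of the import job from step 3


def hdfs_put_command(local_files, hdfs_dir):
    """Build the command that copies local files to HDFS, overwriting old copies."""
    return ["hdfs", "dfs", "-put", "-f", *map(str, local_files), hdfs_dir]


def hdfs_cleanup_command(local_files, hdfs_dir):
    """Build the command that removes the staged copies from HDFS."""
    staged = [hdfs_dir.rstrip("/") + "/" + Path(f).name for f in local_files]
    return ["hdfs", "dfs", "-rm", *staged]


def job_trigger_request(base_url, job_id, user, password):
    """Build the POST request that asks Datameer to run the import job.

    The endpoint path is an assumption; verify it for your Datameer version.
    """
    url = f"{base_url.rstrip('/')}/rest/job-execution?configuration={job_id}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"}
    )


def refresh(local_files, user, password,
            run=subprocess.run, send=urllib.request.urlopen):
    """Stage the files on HDFS, then trigger the import job.

    Returns the HTTP status of the trigger call. `run` and `send` are
    parameters so the function can be exercised without a cluster.
    """
    run(hdfs_put_command(local_files, HDFS_STAGING_DIR), check=True)
    request = job_trigger_request(DATAMEER_URL, IMPORT_JOB_ID, user, password)
    with send(request) as response:
        return response.status
```

A weekly cron job could call `refresh(["/data/report.xlsx"], "user", "password")` with the real file list and credentials. Triggering a job is presumably asynchronous, so the cleanup command from `hdfs_cleanup_command` should only be run once the import job has actually finished; this sketch deliberately leaves that wait out.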

    Does this workaround make sense as a fallback if the network solution, which remains my first recommendation, cannot be reached?
