How to Leverage the REST API with Datameer

Goal

Are you more of a command line aficionado than a user interface promoter? Wish you could navigate Datameer artifacts using the command line? With this guide, you will be able to do a variety of things with Datameer artifacts without the Datameer user interface (UI).

 

Sample data

This guide gives you step-by-step instructions for using part of Datameer's REST API. You can follow along by going to the Datameer App Market and downloading the Flight Delays App.

 

Learn

Find our REST call (GET)

In this example, you will start with the GET command, which retrieves the configuration of a particular Datameer artifact. To run a REST API command from the command line, you need your Datameer username and password as well as the URL of your Datameer instance.

Open up whichever command line application you prefer. Here is the command you will be using:

curl -u <username>:<password> -X GET 'http://<Datameer-serverIP>:<port-number>/rest/import-job/<job-configuration-id>'

 

Fill in the actual username and password for your Datameer instance, along with the Datameer URL and the job configuration ID. You can find the job configuration ID in the Datameer UI: go to your Datameer instance, select the Airports import job, and open the Information Browser to the right of the artifacts. The ID is listed there.

A completed REST call with GET will look something like this:

curl -u admin:admin -X GET 'http://localhost:8080/rest/import-job/23'
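If you would rather script the call than type it, the same request can be assembled in Python. This is only a minimal sketch: the function name build_import_job_request is made up for illustration, and it assumes the server accepts the HTTP Basic authentication that curl -u sends. The request is built but not sent, so the snippet runs without a Datameer instance.

```python
import base64
import urllib.request


def build_import_job_request(host, port, job_id, username, password):
    """Build (but do not send) the GET request that the curl command issues."""
    url = f"http://{host}:{port}/rest/import-job/{job_id}"
    # curl -u sends HTTP Basic authentication: base64 of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Same values as the completed curl example above.
request = build_import_job_request("localhost", 8080, 23, "admin", "admin")
print(request.get_method(), request.full_url)

# To send it against a running Datameer instance (not executed here):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the build step separate from the send step makes it easy to check the URL and header before anything touches the server.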

 

Understanding the return for the GET command

Once you run the command, your return should look like this:

{

  "version": "4.0.2",

  "className": "datameer.dap.common.entity.DataSourceConfigurationImpl",

  "file": {

    "uuid": "df4fe482-d07f-4f53-aa04-b58aac43f594",

    "path": "/Users/admin/Applications/Flight Delays/Resources/Airports.imp",

    "description": "",

    "name": "Airports"

  },

  "pullType": "MANUALLY",

  "minKeepCount": 1,

  "properties": {

    "TextFileFormat": [

      "TEXT"

    ],

    "fileNameTimeRange_mode": [

      "OFF"

    ],

    "fileNameTimeRange_startDate": [

      ""

    ],

    "filter.minAge": [

      ""

    ],

    "filter.maxAge": [

      ""

    ],

    "characterEncoding": [

      "UTF-8"

    ],

    "recordSampleSize": [

      "1000"

    ],

    "escapeCharacter": [

      ""

    ],

    "detectColumnDefinition": [

      "SELECT_PARSE_AUTO"

    ],

    "collectAdditionalFields": [

      "false"

    ],

    "quoteCharacter": [

      "\""

    ],

    "delimiter": [

      ","

    ],

    "csv.max-lines-per-record": [

      "1"

    ],

    "external.store": [

      "false"

    ],

    "filter.page.does.split.creation": [

      "false"

    ],

    "fileType": [

      "CSV"

    ],

    "GenericConfigurationImpl.temp-file-store": [

      "1dad24d5-96c2-4af1-8460-b206f8df3cd2"

    ],

    "incrementalMode": [

      "false"

    ],

    "histogram.generation": [

      "false"

    ],

    "file": [

      "flightdelays/ICAOAirports.csv.zip"

    ],

    "strictQuotes": [

      "false"

    ]

  },

  "hadoopProperties": "",

  "dataStore": {

    "path": "/Users/admin/Applications/Flight Delays/Resources/Examples in S3.dst",

    "uuid": "a61e955c-576d-47b5-b50a-8554403eddbb"

  },

  "errorHandlingMode": "DROP_RECORD",

  "maxLogErrors": 1000,

  "maxPreviewRecords": 5000,

  "notificationAddresses": "",

  "notificationSuccessAddresses": "",

  "fields": [

    {

      "id": 347,

      "pattern": "",

      "acceptEmpty": true,

      "name": "id",

      "origin": "0",

      "valueType": "{\"type\":\"INTEGER\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 348,

      "pattern": "",

      "acceptEmpty": true,

      "name": "ident",

      "origin": "1",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 349,

      "pattern": "",

      "acceptEmpty": true,

      "name": "type",

      "origin": "2",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 350,

      "pattern": "",

      "acceptEmpty": true,

      "name": "name",

      "origin": "3",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 351,

      "pattern": "",

      "acceptEmpty": true,

      "name": "latitude_deg",

      "origin": "4",

      "valueType": "{\"type\":\"FLOAT\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 352,

      "pattern": "",

      "acceptEmpty": true,

      "name": "longitude_deg",

      "origin": "5",

      "valueType": "{\"type\":\"FLOAT\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 353,

      "pattern": "",

      "acceptEmpty": true,

      "name": "elevation_ft",

      "origin": "6",

      "valueType": "{\"type\":\"INTEGER\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 354,

      "pattern": "",

      "acceptEmpty": true,

      "name": "continent",

      "origin": "7",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 355,

      "pattern": "",

      "acceptEmpty": true,

      "name": "iso_country",

      "origin": "8",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 356,

      "pattern": "",

      "acceptEmpty": true,

      "name": "iso_region",

      "origin": "9",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 357,

      "pattern": "",

      "acceptEmpty": true,

      "name": "municipality",

      "origin": "10",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 358,

      "pattern": "",

      "acceptEmpty": true,

      "name": "scheduled_service",

      "origin": "11",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 359,

      "pattern": "",

      "acceptEmpty": true,

      "name": "gps_code",

      "origin": "12",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 360,

      "pattern": "",

      "acceptEmpty": true,

      "name": "iata_code",

      "origin": "13",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 361,

      "pattern": "",

      "acceptEmpty": true,

      "name": "local_code",

      "origin": "14",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 362,

      "pattern": "",

      "acceptEmpty": true,

      "name": "home_link",

      "origin": "15",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 363,

      "pattern": "",

      "acceptEmpty": true,

      "name": "wikipedia_link",

      "origin": "16",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 364,

      "pattern": "",

      "acceptEmpty": true,

      "name": "keywords",

      "origin": "17",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 365,

      "pattern": "",

      "acceptEmpty": false,

      "name": "dasFileName",

      "origin": "fileInfo.fileName",

      "valueType": "{\"type\":\"STRING\"}",

      "include": false,

      "version": 3

    },

    {

      "id": 366,

      "pattern": "",

      "acceptEmpty": false,

      "name": "dasFilePath",

      "origin": "fileInfo.filePath",

      "valueType": "{\"type\":\"STRING\"}",

      "include": false,

      "version": 3

    },

    {

      "id": 367,

      "pattern": "",

      "acceptEmpty": false,

      "name": "dasLastModified",

      "origin": "fileInfo.lastModified",

      "valueType": "{\"type\":\"DATE\"}",

      "include": false,

      "version": 3

    }

  ]

}

 

This return contains the configuration that was set up to import the data into Datameer. If you look closely, you will recognize settings that you see in the import wizard within the Datameer UI:

"version": "4.0.2",

  "className": "datameer.dap.common.entity.DataSourceConfigurationImpl",

  "file": {

    "uuid": "df4fe482-d07f-4f53-aa04-b58aac43f594",

    "path": "/Users/admin/Applications/Flight Delays/Resources/Airports.imp",

    "description": "",

    "name": "Airports"

 

The first part gives you general information about your Datameer instance, the path of the artifact in Datameer, its description (if any), and the name of the artifact.
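As a small illustration, the first part of the return can be read with Python's standard json module. The sample text is a shortened copy of the return shown above, and the summary dictionary is just a convenience for this sketch, not something Datameer produces.

```python
import json

# Shortened copy of the first part of the return shown above.
raw = """{
  "version": "4.0.2",
  "className": "datameer.dap.common.entity.DataSourceConfigurationImpl",
  "file": {
    "uuid": "df4fe482-d07f-4f53-aa04-b58aac43f594",
    "path": "/Users/admin/Applications/Flight Delays/Resources/Airports.imp",
    "description": "",
    "name": "Airports"
  }
}"""

config = json.loads(raw)

# Collect the general information in one place.
summary = {
    "version": config["version"],
    "name": config["file"]["name"],
    "path": config["file"]["path"],
    # The description is an empty string when none was entered.
    "description": config["file"]["description"] or "(none)",
}

for key, value in summary.items():
    print(f"{key}: {value}")
```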

The next portion (below) gives you more details on the configuration of the import, such as the file format, whether a histogram will be generated for this job, CSV settings, record sample size, partitioning, custom Hadoop properties, email notification settings, etc. If you were in the Datameer UI, you would see the same type of settings by right-clicking on the artifact and selecting “Configure”.

"pullType": "MANUALLY",

  "minKeepCount": 1,

  "properties": {

    "TextFileFormat": [

      "TEXT"

    ],

    "fileNameTimeRange_mode": [

      "OFF"

    ],

    "fileNameTimeRange_startDate": [

      ""

    ],

    "filter.minAge": [

      ""

    ],

    "filter.maxAge": [

      ""

    ],

    "characterEncoding": [

      "UTF-8"

    ],

    "recordSampleSize": [

      "1000"

    ],

    "escapeCharacter": [

      ""

    ],

    "detectColumnDefinition": [

      "SELECT_PARSE_AUTO"

    ],

    "collectAdditionalFields": [

      "false"

    ],

    "quoteCharacter": [

      "\""

    ],

    "delimiter": [

      ","

    ],

    "csv.max-lines-per-record": [

      "1"

    ],

    "external.store": [

      "false"

    ],

    "filter.page.does.split.creation": [

      "false"

    ],

    "fileType": [

      "CSV"

    ],

    "GenericConfigurationImpl.temp-file-store": [

      "1dad24d5-96c2-4af1-8460-b206f8df3cd2"

    ],

    "incrementalMode": [

      "false"

    ],

    "histogram.generation": [

      "false"

    ],

    "file": [

      "flightdelays/ICAOAirports.csv.zip"

    ],

    "strictQuotes": [

      "false"

    ]

  },

  "hadoopProperties": "",

  "dataStore": {

    "path": "/Users/admin/Applications/Flight Delays/Resources/Examples in S3.dst",

    "uuid": "a61e955c-576d-47b5-b50a-8554403eddbb"

  },

  "errorHandlingMode": "DROP_RECORD",

  "maxLogErrors": 1000,

  "maxPreviewRecords": 5000,

  "notificationAddresses": "",

  "notificationSuccessAddresses": "",

 
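Note that every entry under "properties" in this portion is a list holding a single string, even for numbers and true/false flags. The sketch below shows one way to flatten those lists and convert the values you need. The helper name first_values is hypothetical, the sample is a shortened copy of the excerpt above, and it assumes each property holds at most one value of interest, as in this return.

```python
import json

# Shortened copy of the "properties" portion shown above.
raw = """{
  "properties": {
    "fileType": ["CSV"],
    "delimiter": [","],
    "characterEncoding": ["UTF-8"],
    "recordSampleSize": ["1000"],
    "histogram.generation": ["false"]
  },
  "maxPreviewRecords": 5000
}"""

config = json.loads(raw)


def first_values(properties):
    """Map each property name to its first list entry (None for an empty list)."""
    return {name: (values[0] if values else None) for name, values in properties.items()}


settings = first_values(config["properties"])

# Values arrive as strings, so convert them explicitly where needed.
sample_size = int(settings["recordSampleSize"])
histogram_enabled = settings["histogram.generation"] == "true"

print(settings["fileType"], repr(settings["delimiter"]), sample_size, histogram_enabled)
```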

The final part of the return shows the columns and column configurations for the import job:

"fields": [

    {

      "id": 347,

      "pattern": "",

      "acceptEmpty": true,

      "name": "id",

      "origin": "0",

      "valueType": "{\"type\":\"INTEGER\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 348,

      "pattern": "",

      "acceptEmpty": true,

      "name": "ident",

      "origin": "1",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

    {

      "id": 349,

      "pattern": "",

      "acceptEmpty": true,

      "name": "type",

      "origin": "2",

      "valueType": "{\"type\":\"STRING\"}",

      "include": true,

      "version": 3

    },

.......................................   

    {

      "id": 367,

      "pattern": "",

      "acceptEmpty": false,

      "name": "dasLastModified",

      "origin": "fileInfo.lastModified",

      "valueType": "{\"type\":\"DATE\"}",

      "include": false,

      "version": 3

    }

  ]

}
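One detail in the "fields" section is easy to miss: "valueType" is not a nested object but a string that itself contains JSON, which is why its quotes are escaped. A minimal sketch of listing the columns therefore decodes that string a second time. The sample below is a three-column subset of the return above.

```python
import json

# Three of the columns from the return above. The raw string keeps the
# backslash escapes exactly as they appear in the JSON text.
raw = r"""{
  "fields": [
    {"id": 347, "name": "id", "origin": "0",
     "valueType": "{\"type\":\"INTEGER\"}", "include": true},
    {"id": 351, "name": "latitude_deg", "origin": "4",
     "valueType": "{\"type\":\"FLOAT\"}", "include": true},
    {"id": 367, "name": "dasLastModified", "origin": "fileInfo.lastModified",
     "valueType": "{\"type\":\"DATE\"}", "include": false}
  ]
}"""

config = json.loads(raw)

# valueType is a JSON document stored inside a string, so decode it again.
columns = [
    (field["name"], json.loads(field["valueType"])["type"], field["include"])
    for field in config["fields"]
]

for name, value_type, included in columns:
    status = "included" if included else "excluded"
    print(f"{name}: {value_type} ({status})")
```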

 

Downloading the return

Now that you understand what this return contains, you will download it with a slightly different GET command so you can make changes to it.

The command looks very similar to the first GET command, except that you redirect the output to a file:

curl -u <username>:<password> -X GET 'http://<Datameer-serverIP>:<port-number>/rest/import-job/<job-configuration-id>' > Airports.json

 

When you run this command, the file is saved in the directory you are currently in on the command line, unless you specify a different directory. For example, you can use the following command to navigate to your Downloads folder first:

cd /Users/<username>/Downloads  
 
 
Once you press Enter in the terminal, you can run the GET command and the file will be saved to this folder. If you would rather not navigate to the folder, specify the full path of the output file in the GET command instead:
 
curl -u admin:admin -X GET 'http://localhost:8080/rest/import-job/3' > /Users/username/Downloads/Airports.json
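The difference between the two approaches (a bare file name versus a full path) can be sketched in Python. The helper save_return is hypothetical, the body is a stand-in for the bytes a real GET call would return, and a temporary directory plays the role of the Downloads folder so the snippet is safe to run anywhere.

```python
import os
import tempfile
from pathlib import Path


def save_return(body, target):
    """Write the response bytes to target, much like the shell's > redirect."""
    path = Path(target)
    path.write_bytes(body)
    return path.resolve()


body = b'{"version": "4.0.2"}'  # stand-in for the bytes returned by the GET call

with tempfile.TemporaryDirectory() as workdir:
    start_dir = os.getcwd()
    os.chdir(workdir)  # plays the role of: cd /Users/<username>/Downloads
    try:
        # A bare file name is resolved against the current directory.
        relative_target = save_return(body, "Airports.json")
        saved_in_cwd = relative_target.parent == Path(workdir).resolve()

        # A full path is used as given, wherever the current directory is.
        os.chdir(start_dir)
        explicit_dir = Path(workdir) / "explicit"
        explicit_dir.mkdir()
        explicit_target = save_return(body, explicit_dir / "Airports.json")
        saved_in_explicit = explicit_target.parent == explicit_dir.resolve()
        saved_text = explicit_target.read_text(encoding="utf-8")
    finally:
        os.chdir(start_dir)

print(saved_in_cwd, saved_in_explicit)
```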
 
 
Editing the JSON file

Now that you have a physical file to edit, you can make changes. You can do this in two different ways:

  1. Through a text editor such as TextWrangler, Smultron, or Sublime.
  2. From the command line.

If you would like to make these changes in a text editor, it is as simple as opening the downloaded file in the application.

If you would like to edit the file from the command line and you are already in the same directory as the file, run the following command:

vi Airports.json

 

If you are not in the same directory, your command will look something like this:

vi /Users/username/Downloads/Airports.json

 

Now you will change a few things. If you are editing with vi on the command line, press “i” to enter insert mode and begin editing.

First, let’s change the record sample size from 1,000 to 5,000. This will increase the sample set that the workbook will use. To do this, find the following setting:

Before change

"recordSampleSize": [

  "1000"

 ], 

 

Change the 1000 to 5000

After change

"recordSampleSize": [

  "5000" 

], 
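The same change can also be made programmatically instead of by hand. The following is a sketch under assumptions: it works on a shortened in-memory copy of the configuration, and it keeps the value as a one-element list of strings because that is the shape shown in the return.

```python
import json

# Shortened stand-in for the contents of Airports.json.
raw = '{"properties": {"recordSampleSize": ["1000"]}, "maxPreviewRecords": 5000}'
config = json.loads(raw)

before = config["properties"]["recordSampleSize"]

# Keep the original shape: a list holding one string, not a bare number.
config["properties"]["recordSampleSize"] = ["5000"]

updated = json.dumps(config, indent=2)
print(updated)

# With the downloaded file, the same steps would read and write Airports.json:
# from pathlib import Path
# config = json.loads(Path("Airports.json").read_text(encoding="utf-8"))
# ... change the value as above ...
# Path("Airports.json").write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Writing the file back through json.dumps may change the spacing and line breaks, but the result is still the same JSON document.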

 

Now, change some of the column names. Find the “fields” section of the JSON. Make sure you are in edit mode, then change the value of “name” for the columns you want to rename.

You can change these names to whatever you wish as long as they start with an alphanumeric character and contain no spaces (e.g., use “ID_Tags” instead of “ID Tags”).
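If you script your renames, the naming rule can be checked before the file is changed. The pattern below is only one reading of that rule (first character alphanumeric, no whitespace anywhere); Datameer may enforce additional restrictions. The helpers is_valid_column_name and rename_column are hypothetical names for this sketch.

```python
import re

# One reading of the rule: an alphanumeric first character, then no whitespace.
NAME_RULE = re.compile(r"[A-Za-z0-9]\S*")


def is_valid_column_name(name):
    """Return True when the whole name satisfies the rule above."""
    return NAME_RULE.fullmatch(name) is not None


def rename_column(fields, old_name, new_name):
    """Rename the first field called old_name, refusing invalid new names."""
    if not is_valid_column_name(new_name):
        raise ValueError(f"invalid column name: {new_name!r}")
    for field in fields:
        if field["name"] == old_name:
            field["name"] = new_name
            return field
    raise KeyError(old_name)


# Two entries in the shape of the "fields" section, reduced to id and name.
fields = [{"id": 347, "name": "id"}, {"id": 348, "name": "ident"}]

rename_column(fields, "ident", "ID_Tags")  # accepted

try:
    rename_column(fields, "id", "ID Tags")  # rejected: contains a space
except ValueError as error:
    print(error)

print([field["name"] for field in fields])
```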

In the example below, the name of the column is changed from "id" to "Identifier".

After changes

"fields": [

    {

      "id": 347,

      "pattern": "",

      "acceptEmpty": true,

      "name": "Identifier",

      "origin": "0",

      "valueType": "{\"type\":\"INTEGER\"}",

      "include": true,

      "version": 3

    },