How to Parse XML File Format


Do you have data in an XML source but unsure the best way to bring it into Datameer? Not sure what to do with it once you get it into a workbook? This guild will show you how to bring your XML data into Datameer and once it is here, how to extract useful information using Datameer's built in formula builder.

Download the sample data to follow along with the example:



Uploading the XML file to Datameer

Open your Datameer instance and select File Upload.

Select the demo data (TicketData.xml) as the file to upload and XML as the File Type.

Click Next.


Now, you need to determine the Record Tag Name and the XPaths for parsing the XML.

For the Record Tag Name, you want to decide what you are most interested in looking at in your data. In this example, you want to look at the ticket content. Type ticket for the XML Record Tag Name because this is the tag used in the XML file.

Next, you need to fill in the XPath. The purpose of XPath is to parse your data into columns. Each XPath will represent one column of your data. If you look at the actual data, you will see various tags within each ticket tag:

  <subject>hello world</subject>
  <score type="integer">0</score>
  <comments type="array">
      <is-public type="boolean">true</is-public>
      <text>This is the first comment.</text>
      <is-public type="boolean">true</is-public>
      <attachments type="array">
      <text>How many comments do i need?</text>
    <group id="product_info">
      <capability name="has_qwerty_keyboard" value="true"/>
      <capability name="pointing_method" value="touchscreen"/>
      <capability name="mobile_browser_version" value="7.11"/>
      <capability name="uaprof" value=""/>
      <capability name="model_name" value="iPAQ 910"/>
      <capability name="brand_name" value="HP"/>
      <capability name="release_date" value="2008_july"/>
    <group id="wml_ui">
      <capability name="softkey_support" value="true"/>
      <capability name="table_support" value="true"/>


You can pull the idsubjectscorecomments, and group. Here is what your XPath should look like:



Each line puts the path for the record. For example, the id tag lives within the ticket tag, so you need to specify this with /ticket/id. The score path is a little different because you want it to grab the text value and not the data type. You do not have to worry about this for the other data since there is only text value.

Once you have this configured, click Next.


Now you will see a preview of your data. The only thing you need to do here is change the column names. This is only to identify them more easily through this tutorial. 

Original Name Changed Name
ticket_comments Comments
ticket_group Group
ticket_id ID
ticket_score Score
ticket_subject Subject

Click Next and then through to the end of the wizard. Save the file as XMLParseExample. The file should automatically run after saving.


Extract data using JSON functions

Add the data into the workbook.

Right click on the source sheet and select Duplicate.

Bring over all columns so you can clean up the data with JSON functions.

Name the duplicated sheet CleanUp


Click in the next empty column to bring up the formula builder. Select the JSON_VALUE function. You will be using this on the Comments column to strip out the list of the comments. The JSON key should be comment.

Name this column Comment.

The workbook should look like this:


Duplicate the CleanUp sheet, but only bring over the ID column and the Comment column.


Now, you will break the comment array up into separate rows. Think of this as doing a reverse GROUPBY - you are exploding out the data.

Click in the next empty column and find the JSON_ELEMENTS function. You will be using this on the Comment column. 


You will notice that now, instead of 2 rows, you now have 4. This means that there is more than one comment in each ticket. 


In the next open column, you will be pulling out the type of comment (whether it is public or private). There are 2 levels of this, so you will be making a nested function.

First, click in the next open column and use JSON_VALUE on the Comment1 column. The JSON Key will be is-public

Name this column IsPublic.


Now, you will nest the function to pull out the TRUE from is-public. In the formula bar, edit the formula and add another JSON_VALUE around the function:



Your results will strip the “true” or “false” out of the JSON content:

In the next 2 columns, you will also use JSON_VALUE.

For the next column, you will pull out the type of comment. This will be from the column Comment1.


Name this column Type. Name the next column Text.


JSON_VALUE uses the provided key to pull out specific data in the JSON string. The keys indicate specifically what data to extract. Your columns should look like this after performing these functions:

Name this sheet TicketExtraction.


You will now duplicate the CleanUp sheet, but only bring over ID and Group.


Now that you have a new sheet, you want to break up the array of data into rows from the Group column. Click in the column next to Group and find JSON_ELEMENTS in the formula builder.

Name the new column GroupRow

This will break up the grouping of each ticket and breaks it down into rows. Your data will look similar to this:


Now, you will extract the data from your GroupRow column using id as the JSON key. Click in the next empty column and find the JSON_VALUE function in the builder just like you did on the previous sheet.

Name this column GroupName as you have extracted the name of each group.


In the next column, you will do the same thing to extract capability.


Name this column Capability. Your new columns will look something like this:


Now that you have a Capability column, you want to extract all capabilities. Click in the next available column and find JSON_ELEMENTS in the formula builder.


This extracts all capabilities into rows. Name column CapabilityRow.


Now you want to extract data from CapabilityRow. Click in the next available column and find JSON_VALUE again. You will now be finding the name of the capabilities.


Name this column CapName.

You will now do the same thing to get the data for each capability in the next column by using the JSON Key value.


Name the column CapValue. Your new columns will look similar to this:


Name this sheet Capabilities.

Now you can save and run your workbook.

You have successfully parsed XML during your import and extracted data using JSON functions!

Further Information

Check out the documentation on Importing XML Files and Using XPath and XML Path Language (XPath).