Adding Column Names automatically (without manually renaming the column names)


How can I insert column names automatically (without renaming each column manually) into an existing Datameer workbook that has no headers? There are more than 200 columns in the workbook.

Brainsphere

4 comments

  • Joel Stewart

    Thank you for the question, Brainsphere. I do not yet understand the situation well enough to make a recommendation. Could you elaborate with an example using just a few columns for reference?

  • Brainsphere

    Thanks Joel for your response.

    The file is sent from an upstream system without headers and has 50+ columns. Some cleaning and parsing of the data in certain columns is also required.

    Example:
    Column A = cola=ABC Systems, Column B = colb=XYZ, Column C = colc=$100, Column D = cold= 9000

    After parsing the text to remove the "col.." text, I get:

    Column A = ABC Systems, Column B = XYZ, Column C = $100, Column D = 9000
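To make the cleaning step above concrete, here is a minimal Python sketch of the same parsing done outside Datameer. It assumes every marker has the shape "col" plus lowercase letters plus "=", as in the example; the names `PREFIX` and `strip_prefix` are hypothetical, and inside a workbook the equivalent would be done with Datameer's own text functions.

```python
import re

# Assumption: markers look like "cola=", "colb=", ... as in the example above.
PREFIX = re.compile(r"^\s*col[a-z]+=\s*")

def strip_prefix(value: str) -> str:
    """Remove a leading 'colX=' marker and any surrounding whitespace."""
    return PREFIX.sub("", value).strip()

# The sample row from the example, before cleaning.
row = ["cola=ABC Systems", "colb=XYZ", "colc=$100", "cold= 9000"]
cleaned = [strip_prefix(v) for v in row]
print(cleaned)  # ['ABC Systems', 'XYZ', '$100', '9000']
```

Values without a marker pass through unchanged, so the function can be applied to every column without checking first.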

     

    The files have a defined schema, but the column names are not present. Also, while uploading the file into Datameer we cannot apply the 'custom schema' option, because the file needs some cleaning before it has the format Column A = ABC Systems, Column B = XYZ, Column C = $100, Column D = 9000.

    Please note that the file could contain data from multiple sources, and the schema can change depending on the source. This is another reason the custom schema option cannot be applied during upload.

    Once the file has been changed to the format
    Column A = ABC Systems, Column B = XYZ, Column C = $100, Column D = 9000,
    the column names then need to be changed to the schema column names, so that the final workbook has Company Name = ABC Systems, Company Address = XYZ, etc.

    Since there are multiple files like this, and each file has 50+ columns, it is not practical to rename the columns manually.

    Would appreciate your insight.

  • Joel Stewart

    Thanks for the additional details. In this circumstance, I think the most scalable approach is to follow these steps: 

    1. Upload the original file as is -- use the default column names from Datameer during this step. 
    2. Perform the transformations as required in a Workbook and run the workbook.
    3. Use the REST API to download a copy of the workbook definition which includes the column names. 
    4. Find and replace the default column names with the desired column names in the resulting JSON file.
    5. Use the REST API again to update the workbook definitions on the Datameer server. 
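Steps 3 to 5 above could be scripted roughly as follows. This is only a sketch under stated assumptions: the endpoint path `/rest/workbook/<id>` with GET and PUT, basic authentication, and the sample JSON layout are all assumptions to verify against the REST API documentation for your Datameer version, and the default column names "A" and "B" are placeholders. The rename is an exact-match replacement on string values, so formulas that mention a column inside a longer string are not touched, and any other value that happens to equal a default column name would be renamed as well.

```python
import base64
import json
import urllib.request

def rename_columns(node, mapping):
    """Return a copy of a parsed JSON value with matching strings renamed.

    Any string value exactly equal to a key of `mapping` is replaced.
    The walk is generic because the workbook JSON layout is not assumed.
    """
    if isinstance(node, dict):
        return {key: rename_columns(value, mapping) for key, value in node.items()}
    if isinstance(node, list):
        return [rename_columns(item, mapping) for item in node]
    if isinstance(node, str):
        return mapping.get(node, node)
    return node

def _request(url, user, password, method="GET", body=None):
    """Send one HTTP request with basic authentication and return the raw body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=body, method=method)
    req.add_header("Authorization", f"Basic {token}")
    if body is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def rename_workbook_columns(base_url, workbook_id, user, password, mapping):
    """Download a workbook definition, rename columns, and upload it again."""
    # Assumed endpoint path; check it against your server's REST documentation.
    url = f"{base_url}/rest/workbook/{workbook_id}"
    definition = json.loads(_request(url, user, password))
    updated = rename_columns(definition, mapping)
    _request(url, user, password, method="PUT", body=json.dumps(updated).encode())
    return updated

# Offline demonstration on a hypothetical definition (not the real Datameer layout).
sample = {"sheets": [{"name": "Sheet1", "columns": [{"name": "A"}, {"name": "B"}]}]}
renamed = rename_columns(sample, {"A": "Company Name", "B": "Company Address"})
print(json.dumps(renamed, indent=2))
```

Keeping one mapping per source system would cover the case where the schema changes with the source: the same script runs against each workbook with the mapping that matches its data.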

    If you would rather not use the REST API and exchange definitions with the server directly, there is an alternative method. It requires downloading and re-uploading the data set, which may be tedious if the data set is large:

    1. Upload the original file as is -- use the default column names from Datameer during this step. 
    2. Perform the transformations as required in a Workbook and run the workbook.
    3. Download the Workbook results.
    4. Re-upload the results, but use the Column Headers feature to supply a separate file that has the column titles in its first line.
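The separate header file in step 4 can be generated rather than typed by hand. The sketch below is a hedged illustration: the `SCHEMAS` dictionary, the source key, and the column names beyond "Company Name" and "Company Address" are hypothetical, and the comma delimiter is an assumption that should match whatever delimiter the re-uploaded data uses.

```python
import csv
import io

def header_line(column_names, delimiter=","):
    """Return one delimited line holding the column titles."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(column_names)
    return buffer.getvalue()

def write_header_file(path, column_names, delimiter=","):
    """Write a file whose first and only line is the list of column titles."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(header_line(column_names, delimiter))

# Hypothetical schemas keyed by source system; extend with the real 50+ names.
SCHEMAS = {
    "source_a": ["Company Name", "Company Address", "Amount", "Code"],
}

line = header_line(SCHEMAS["source_a"])
print(line, end="")  # Company Name,Company Address,Amount,Code
```

Using the `csv` module means a title that itself contains the delimiter is quoted automatically instead of splitting into two columns.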