Approach to unpivot a workbook

3 followers
0
Avatar

Hello Datameer,

We have the following technical question; in which way in Datameer can we convert the columns header (months) in rows. The workbook is already uploaded. 

Example

 

Jan

Feb

Mar

INICIO

126.0

40.0

176.0

DTU/DTI

49.0

141.0

204.0

 

Desired result

 

INICIO

JAN

126.0

INICIO

FEB

40.0

INCIO

MAR

176

DTU/DTI

JAN

49.0

...

 

Pablo Redondo

Official comment

  • Avatar
    Joel Stewart

    I can't think of a function that can be used to perform this directly, however it can be done in a couple of steps. 

    In a sheet, create a keyed list of the values that you'd like to split later. Here is an example function to do that: 

    LIST("Jan: "+T(#Jan); "Feb: "+T(#Feb); "Mar: " + T(#Mar))

    After that, you can create a second sheet and use the EXPAND(#Sheet1!MyList) function to separate the list into one row per list. You may also reference #Sheet1!column1 to copy the original values over in this data too. 

    Then, you may split the contents of the EXPAND data from "Jan: 126.0" into the different columns using the SUBSTR function. 

    Lastly, be sure to convert "126.0" from a STRING back to a FLOAT using the FLOAT() function. 

    0

6 comments

  • Avatar
    Nikhil Kumar

    Joel - thanks for the suggestion. We are running into the following error with the EXPAND function. Any ideas how to workaround this.

     

    Sheet 1, the expand function it´s displaying the values in right order

    Sheet 2, the copy function takes the values for the expand function in the sheet 1 in wrong order

     


    0
  • Avatar
    Joel Stewart

    The row order is not expected to be maintained between different sheets. Hadoop may process the records on different distributed nodes and the order of the output records may differ from the input records. This occurs during a "Shuffle" phase.

    All the rows should still be consistent with respect to their relative column data. 

    If the order of the results is critical to the analysis, you may implement a sort on the sheet to force the order to be preserved. This is generally not recommended because it significantly reduces the performance of the job by reducing the distribution of processing.

    0
  • Avatar
    Nikhil Kumar

    Joel -

    Thanks again! So, I just implemented your technique and Im going to suggest your approach which is LIST("Jan: "+T(#Jan); "Feb: "+T(#Feb); "Mar: " + T(#Mar)). I think the issue with the original approach is that he is doing a group rownumber and that requires a shuffle I'm guessing. With your approach, there should be no shuffle correct?

     

    0
  • Avatar
    Joel Stewart

    The GROUPROWNUMBER function would be tying this solution too closely to the order of the rows. I advise taking a critical look at the expected outcome to verify if the row number itself is required. 

    Without full scope of the workbook that the user is developing, it's difficult to say if there would be any shuffling or not. The best way I can think to answer this question is to recommend building analytics where the order of the rows is not critical. 

    0
  • Avatar
    Nikhil Kumar

    Thanks Joel. I have suggested your approach. I agree that we should divorce ourselves from having t do a group number and also not worry about the order of records. But I have a feeling that they are looking to display the records in the Jan, Feb .... Dec order. I will confirm and get back to you.

    0
Please sign in to leave a comment.