How to generate a unique row number

We do not have any columns that uniquely identify a record in a workbook, so we need to generate a unique row number for every record in the workbook.

Thanks for the help.

Saurabh Agashe

Official comment

  • Saurabh Agashe

    This should help you generate unique row numbers for each record:

    1) Create a new sheet in the workbook.

    2) In the first column, add the function GROUPBY(1).

    3) In the second column, add the function GROUPROWNUMBER().

    4) Add any other columns you need after these first two columns have been set up.
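
    The steps above can be modeled outside Datameer to show why they yield unique numbers. This is a plain-Python sketch, not Datameer code, and number_rows is a hypothetical helper. It assumes GROUPBY(1) places every record under the same constant key and that GROUPROWNUMBER() numbers the rows of each group starting at 1.

```python
from itertools import groupby


def number_rows(records):
    # Hypothetical model of GROUPBY(1) + GROUPROWNUMBER(); not Datameer code.
    # GROUPBY(1): pair every record with the same constant key.
    keyed = [(1, rec) for rec in records]
    numbered = []
    # With a constant key, groupby yields exactly one group holding every record.
    for _key, group in groupby(keyed, key=lambda pair: pair[0]):
        # GROUPROWNUMBER(): position within the group, assumed to start at 1.
        for row_number, (_, rec) in enumerate(group, start=1):
            numbered.append((row_number, rec))
    return numbered


rows = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
for row_number, rec in number_rows(rows):
    print(row_number, rec)
```

    Because the key never changes, there is only one group, so every record receives a distinct number. Grouping by a real column instead would restart the numbering in each group, and the numbers would no longer be unique across the sheet.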

    Note that this approach forces all processing for this worksheet to occur in a single reducer. This is the only way to ensure that the row numbers are consistent, and it may have a significant negative performance impact when generating the results.
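
    Why a constant key ends up on a single reducer can be sketched with a simplified model of hash partitioning. In this model, a record with key k goes to reducer hash(k) % num_reducers. This is an assumption loosely modeled on Hadoop's default partitioner, not Datameer's internals, and the helper names are hypothetical.

```python
from collections import Counter


def partition_for(key, num_reducers):
    # Simplified hash partitioning (assumption): the same key always maps to the same reducer
    return hash(key) % num_reducers


def reducer_load(keys, num_reducers):
    # Count how many records each reducer would receive
    return Counter(partition_for(k, num_reducers) for k in keys)


num_reducers = 8
record_ids = range(1000)

# GROUPBY(1): every record carries the constant key 1, so one reducer receives all of them
print(reducer_load([1 for _ in record_ids], num_reducers))

# A key that varies per record spreads the work across all reducers
print(reducer_load(list(record_ids), num_reducers))
```

    In the first case, adding more reducers does not help because the whole load still lands on one of them. That is the source of the performance cost described in the note.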

    Hope this helps!

6 comments

  • Saurabh Agashe

    Thank you for the solution and the heads-up. Could you please let me know how I can control the number of reducers from the Datameer side?

  • Saurabh Agashe

    Thank you. But this setting will apply to all jobs running during that time, right? Is there a way to restrict these settings to a specific job?

  • Saurabh Agashe

    You can add Custom Hadoop Properties in the save/configure dialog of each individual job, rather than applying properties cluster-wide.