how to solve getting duplicates from this join

2 followers
0
Avatar

Hi everyone,

 

I am having difficulties trying to solve this problem of getting duplicates after a join that I am having. I have tried groupby on the joined data but that doesn't seem to work.

 

I have a table like Table A below that contains a  few months worth of data but for this example I will make a simple example that is similar to my data. 

 

Table A 

Name  statement date   hours

joe          jan 5-18              5

joe         jan 10-18             5

joe         jan  20-18            5

tom        jan 5-18               5

tom       jan 10-18            10

tom      jan  20-18            15

 

Table B  contains groupby function of Table A and a groupsum of the  hours coloumn to get the total hours in the month for that person and a format date column to help group the sum of the total hours in the month.

Table B

 

Name  statement date  total Hours in month

joe          JAN-2018              15

tom       JAN-2018              30

 

Ultimately I want to join the total hours worked column to the name and month of the person in Table A.  Now when I join them I get a massive amount of duplicates and I don't know how to fix this. The result is like below.

 

INNER Joined table A and Table B

 

Name  statement date   hours  total Hours

joe          jan 5-18              5              15

joe         jan 10-18             5              15

joe         jan  20-18            5             15  

joe          jan 5-18              5              15

joe         jan 10-18             5              15

joe         jan  20-18            5             15 

tom        jan 5-18               5             30

tom       jan 10-18            10             30

tom      jan  20-18            15            30

tom        jan 5-18               5             30

tom       jan 10-18            10             30

tom      jan  20-18            15            30

 

I want to outcome to look like this

 

Name  statement date   hours  total Hours in month

joe          jan 5-18              5              15

joe         jan 10-18             5              15

joe         jan  20-18            5             15  

tom        jan 5-18               5             30

tom       jan 10-18            10             30

tom      jan  20-18            15            30

 

How can I fix this?

 

I hope you guy understand what I am trying to explain. 

 

 

thanks 

n

4 comments

  • Avatar
    Joel Stewart

    Can you share the join criteria that you're using in Datameer? Perhaps a screenshot would be helpful to verify the logic and selection of the multiple matching criteria.

    1
  • Avatar
    Joel Stewart

    The join criteria needs to have two criteria, not just one. Specifically, the join should be conducted on the "name" and the "month/year". Since the "month/year" is not a current column in table A, this would need to be added before the join in Datameer can occur. 

    Joining on the two criteria, will eliminate the duplicate matches since they are currently matching every record where the name is the same and not considering the date relationship.

    0
Please sign in to leave a comment.