Dataframes matching

Cin1 · October 26, 2021, 1:40pm

Dear R user,
I have two data frames.

the first contains a column with a list of "macro-codes", a second column with a list of "micro- codes" (these can be repeated across macro-codes), a third column with a weight for each micro-code
the second contains just a column with micro-codes (you can find some of them in the first df).

I need to create a third df with a further column in which i find 1/0 if the micro-code is present or not in the second df. In xls, I would do this with Vlook_up. How can i do it in R?

Any suggestion would be really appreciated,
Thank you in advance for your precious support!

Regards

edgararuiz · October 26, 2021, 1:48pm

Hi @Cin1 , you can do that with a table join. You can take everything from the first table (x) and have it match only those that are common from your second table (y). The command I would use is left_join(), here is the link to the dplyr reference page: Mutating joins — mutate-joins • dplyr

FJCC · October 26, 2021, 2:26pm

I agree that some kind of join is likely to do what you need. If your intention is to filter the data based on the new 1/0 values. you can achieve the filtering in one step with one of the Filter Joins.

Cin1 · October 26, 2021, 5:49pm

Thank you Edgar,

I have tried the following:

df_left_J <- left_join(PTF, ISIN_NC, by = "ID_ISIN")

I suppose I should add the mutate function...However, I see that the obs go from 233k of the starting df (PTF) to 2.009k of the final df (df_left_J). The command increases the number of rows...I just want to maintain the same inital df, with an additional column from df (ISIN_NC).

edgararuiz · October 26, 2021, 5:51pm

Right, that is because there are duplicate ID's possibly on both tables. You'll need to dedup the ID's first.

Personally, I like using

unique1 <- table1 %>%  group_by(id) %>% summarise() %>% ungroup()

That returns a table of unique id's that you can then use to run the match

edgararuiz · October 26, 2021, 5:55pm

Actually, try that with only the second table, and join the original first table to that one. That should allow you to keep everything on the first table, despite dups on that first table

unique2 <- table2 %>%  group_by(id) %>% summarise() %>% ungroup() %>% mutate(match = 1)