Help with dataframe (vlookup)

I have a dataframe that looks like:
ID Text_1 Text_2 Text_3
1 apple orange NA
2 NA banana NA
3 NA NA pear
4 citrus NA grape
5 NA carrot NA

This is basically a lookuptable in order to identify ID that I want to write it into dataframe b

Line Text_1 Text_2 Text_3
1 apple NA NA
2 apple orange NA
3 NA carrot NA
4 citrus NA grape
5 NA NA grape
6 NA NA pear
7 NA NA pear
8 NA carrot NA

I want to update dataframe b to produce:

Line Text_1 Text_2 Text_3 ID
1 apple NA NA
2 apple orange NA 1
3 NA carrot NA 5
4 citrus NA grape 4
5 NA NA grape
6 NA NA pear 3
7 NA NA pear 3
8 NA carrot NA 5

Could anyone help me with the code required to make this work. It has to be exact match when doing the loopup.

Help is highly appreciated

dplyr package has a left_join function you can use

Have tried but can only make it work when i do lookup at one column (i.e. Text_1)

To match on multiple columns, pass a vector of column names to the by parameter of left_join().

library(dplyr, warn.conflicts = FALSE)

id_df <- tribble(~ ID, ~ Text_1, ~ Text_2, ~ Text_3,
                 1, "apple", "orange", NA,
                 2, NA, "banana", NA,
                 3, NA, NA, "pear",
                 4, "citrus", NA, "grape",
                 5, NA, "carrot", NA)

line_df <- tribble(~ Line, ~ Text_1, ~ Text_2, ~ Text_3,
                   1, "apple", NA, NA,
                   2, "apple", "orange", NA,
                   3, NA, "carrot", NA,
                   4, "citrus", NA, "grape",
                   5, NA, NA, "grape",
                   6, NA, NA, "pear",
                   7, NA, NA, "pear",
                   8, NA, "carrot", NA)

left_join(line_df, id_df, by = c("Text_1", "Text_2", "Text_3"))
#> # A tibble: 8 x 5
#>    Line Text_1 Text_2 Text_3    ID
#>   <dbl> <chr>  <chr>  <chr>  <dbl>
#> 1     1 apple  <NA>   <NA>      NA
#> 2     2 apple  orange <NA>       1
#> 3     3 <NA>   carrot <NA>       5
#> 4     4 citrus <NA>   grape      4
#> 5     5 <NA>   <NA>   grape     NA
#> 6     6 <NA>   <NA>   pear       3
#> 7     7 <NA>   <NA>   pear       3
#> 8     8 <NA>   carrot <NA>       5

Created on 2020-05-02 by the reprex package (v0.3.0)

Thanks a lot.

What if line_df looks like this:
line_df <- tribble(~ Line, ~ Text_1, ~ Text_2, ~ Text_3,
1, "apple", "xy", NA,
2, "apple", "orange", "ff",
3, NA, "carrot", "yy",
4, "citrus", "gg", "grape",
5, NA, "tt", "grape",
6, NA, "oo", "pear",
7, NA, NA, "pear",
8, NA, "carrot", NA)

The current code does not work since it has to be exact match.

I still want to produce the same resulting tibble with ID in the last column. The inner join has to take into consideration that I only want to compare lines in id_df for text elements in that data frame that are not empty. This should produce same result, even if some of the NA fields have text. See below

left_join(line_df, id_df, by = c("Text_1", "Text_2", "Text_3"))
#> # A tibble: 8 x 5
#> Line Text_1 Text_2 Text_3 ID
#>
#> 1 1 apple "xy" NA
#> 2 2 apple orange "ff" 1
#> 3 3 carrot "yy" 5

How can this be done?

I'm sorry, I didn't quite follow what you want to do. Could you please state which observations should be matched and which should not?

I want the left join to match only with fields that are not NA in line_df.
I.e. row 2 in line_df should get an ID = 1 since the text "vb" in text 3 should be discarded

id_df <- tribble(~ ID, ~ Text_1, ~ Text_2, ~ Text_3,
                 1, "apple", "orange", NA,
                 2, NA, "banana", NA,
                 3, NA, NA, "pear",
                 4, "citrus", NA, "grape",
                 5, NA, "carrot", NA)

line_df <- tribble(~ Line, ~ Text_1, ~ Text_2, ~ Text_3,
                   1, "apple", "xx", NA,
                   2, "apple", "orange", "vb",
                   3, NA, "carrot", NA,
                   4, "citrus", NA, "grape",
                   5, "ee", NA, "grape",
                   6, NA, NA, "pear",
                   7, "dd", NA, "pear",
                   8, NA, "carrot", NA)

left_join(line_df, id_df, by = c("Text_1", "Text_2", "Text_3"))

I want the left join to produce the same output line_df as you got for each line_df

line_df <- tribble(~ Line, ~ Text_1, ~ Text_2, ~ Text_3, ~ ID,
                   1, "apple", "xx", NA,NA,
                   2, "apple", "orange", "vb",1, 
                   3, NA, "carrot", NA,5, 
                   4, "citrus", NA, "grape",4, 
                   5, "ee", NA, "grape",NA, 
                   6, NA, NA, "pear",3,
                   7, "dd", NA, "pear",3,
                   8, NA, "carrot", NA,5)

Hence, the left join shall only consider values in id_df that are not NA when performing the left join

Tricky. I think you may have to write a custom function for this as this kind of functionality isn't provided out-of-the-box by dplyr.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.