Hello,
I am trying to join two data frames using dplyr. Neither data frame has a unique key column. The closest equivalent of the key column is the dates variable of monthly data. Each df has multiple entries per month, so the dates column has lots of duplicates.
I was able to find a solution from Stack Overflow, but I am having a really difficult time understanding that solution. Can you help me find a simpler solution that is easier for beginner level users to understand?
Here is a simple reproducible example:
df_1 <- data_frame(
dates = c(as.Date("2018-07-01"), as.Date("2018-06-01"), as.Date("2018-06-01"), as.Date("2018-05-01")),
x1 = c(10L, 11L, 12L, 13L),
text1 = c("text a", "text b", "text c", "text d")
)
df_2 <- data_frame(
dates = c(as.Date("2018-07-01"), as.Date("2018-06-01"), as.Date("2018-05-01"), as.Date("2018-04-01")),
x2 = c(20L, 21L, 22L, 23L),
text2 = c("text aa", "text bb", "text cc", "text dd")
)
Notice that rows 2 & 3 in df_1 both refer to "2018-06-01" (i.e. a duplicate in the key column, other columns have different data)
# A tibble: 4 x 3
dates x1 text1
<date> <int> <chr>
1 2018-07-01 10 text a
2 2018-06-01 11 text b
3 2018-06-01 12 text c
4 2018-05-01 13 text d
> df_2
# A tibble: 4 x 3
dates x2 text2
<date> <int> <chr>
1 2018-07-01 20 text aa
2 2018-06-01 21 text bb
3 2018-05-01 22 text cc
4 2018-04-01 23 text dd
If I do a simple left_join, I get this:
> df_combined_1 <- left_join(df_1, df_2, by = "dates")
> df_combined_1
# A tibble: 4 x 5
dates x1 text1 x2 text2
<date> <int> <chr> <int> <chr>
1 2018-07-01 10 text a 20 text aa
2 2018-06-01 11 text b 21 text bb
3 2018-06-01 12 text c 21 text bb
4 2018-05-01 13 text d 22 text cc
I want a joined data frame that is something like this:
df_need <- data_frame(
dates = c(as.Date("2018-07-01"), as.Date("2018-06-01"), as.Date("2018-06-01"), as.Date("2018-05-01")),
x1 = c(10L, 11L, 12L, 13L),
text1 = c("text a", "text b", "text c", "text d"),
x2 = c(20L, 21L, NA, 23L),
text2 = c("text aa", "text bb", NA, "text dd")
)
df_need
> df_need
# A tibble: 4 x 5
dates x1 text1 x2 text2
<date> <int> <chr> <int> <chr>
1 2018-07-01 10 text a 20 text aa
2 2018-06-01 11 text b 21 text bb
3 2018-06-01 12 text c NA <NA>
4 2018-05-01 13 text d 23 text dd
Here is the Stack Overflow solution that seems to match exactly what I am looking for:
Is it possible to create a solution that is (a) a bit easier to understand for beginners (b) uses the purr package or some other tidyverse solution?
Thanks in advance for any comments and guidance.