I have two dataframes, I want to replace the values in df1 using values in df2, by the common id. Is this the correct way? The actual dataset is big, so when I did it like this and checked the result, the values in df1 are not the same as in df2. So I'm wondering how this works. Thanks for your help.
In my file, the row orders are different in df1 and df2, so the resulting value column in df1 is not the same as the value column in df2. How to do it correctly?
If df2 will always have rows with all the ids of df1, you can do it as in the following code. If df1 will sometimes have ids that are not in df2, then you will have to do an extra step to keep the df1 values in those rows. I included extra steps for printing out intermediate results.
set.seed(9334)
df1= data.frame(id=c('1_1','1_2','1_3','2_1'),
max=c(3,5,5,6),
min=c(1,2,3,.1),
value=rnorm(4))
df2= data.frame(id=c('1_1','1_2','1_3','1_4','2_1','2_2'),
value=rnorm(6)*10-3)
df1
#> id max min value
#> 1 1_1 3 1.0 -0.5818039
#> 2 1_2 5 2.0 -0.3141435
#> 3 1_3 5 3.0 0.7452423
#> 4 2_1 6 0.1 -1.1327702
df2
#> id value
#> 1 1_1 -11.7366967
#> 2 1_2 0.3124703
#> 3 1_3 -6.2261151
#> 4 1_4 -2.5779802
#> 5 2_1 -1.4631567
#> 6 2_2 -4.2419756
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df1 <- left_join(df1, df2, by = "id")
#> Warning: Column `id` joining factors with different levels, coercing to
#> character vector
df1
#> id max min value.x value.y
#> 1 1_1 3 1.0 -0.5818039 -11.7366967
#> 2 1_2 5 2.0 -0.3141435 0.3124703
#> 3 1_3 5 3.0 0.7452423 -6.2261151
#> 4 2_1 6 0.1 -1.1327702 -1.4631567
df1 <- select(df1, id, max, min, value = value.y)
df1
#> id max min value
#> 1 1_1 3 1.0 -11.7366967
#> 2 1_2 5 2.0 0.3124703
#> 3 1_3 5 3.0 -6.2261151
#> 4 2_1 6 0.1 -1.4631567