I have some data frames that I want to bind together row-wise. The individual data frames come from Excel files and, because those files contain several columns with the same name, R renames the columns on reading so that the names are unique. This is expected. Rather than provide the Excel files, I can generate equivalent data as follows:
df1 <- data.frame(`A...1` = rnorm(5),
                  `A...2` = rnorm(5),
                  B = rnorm(5))
df2 <- data.frame(`A...1` = rnorm(5),
                  B = rnorm(5))
The two data frames then look like this:
> df1
       A...1      A...2          B
1 -0.2520014  1.6274111  0.6183493
2  0.5152256 -0.9451730  1.1118546
3  0.2925951  1.1416916  0.3140801
4  0.6923316  1.5487287 -1.0121066
5 -1.2140627 -0.3060724  0.3947781
> df2
       A...1          B
1 -0.1887456 -0.9406678
2 -1.0122872 -0.8838331
3  0.2787877 -1.4449848
4 -0.2589786 -0.1398463
5 -1.1378909  0.2089307
>
Now I would like to combine these two data frames row-wise, as shown below. The resultant combined data frame is a bit strange:
> library("tidyverse")
> df <- bind_rows(df1,df2)
New names:
* A...1 -> A
> df
        A...1      A...2          B          A
1  -0.2520014  1.6274111  0.6183493         NA
2   0.5152256 -0.9451730  1.1118546         NA
3   0.2925951  1.1416916  0.3140801         NA
4   0.6923316  1.5487287 -1.0121066         NA
5  -1.2140627 -0.3060724  0.3947781         NA
6          NA         NA -0.9406678 -0.1887456
7          NA         NA -0.8838331 -1.0122872
8          NA         NA -1.4449848  0.2787877
9          NA         NA -0.1398463 -0.2589786
10         NA         NA  0.2089307 -1.1378909
>
The odd thing is that bind_rows() renames the column named "A...1" in the second data frame to "A", so the combined data frame doesn't have the correct data in the correct column.
It seems that bind_rows() is checking the column names of the second data frame (but not the first). bind_cols() has an option (.name_repair) that controls the level of checking of the column names. In bind_rows() it seems as if the following is being done:
> vctrs::vec_as_names(c("A...1","B"),repair="unique")
New names:
* A...1 -> A
[1] "A" "B"
>
Changing the repair option is not possible in bind_rows(), as it has no option equivalent to the one in bind_cols().
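One workaround I can think of, as a sketch only: rename the columns before binding so that no name ends in the "...<number>" pattern that the repair appears to strip. The helper name protect_names() and the "_" replacement are my own choices, not anything provided by dplyr, and whether bind_rows() still performs the repair may depend on the dplyr/vctrs versions installed.

```r
library(dplyr)

set.seed(1)  # only so the sketch is reproducible
df1 <- data.frame(`A...1` = rnorm(5),
                  `A...2` = rnorm(5),
                  B = rnorm(5))
df2 <- data.frame(`A...1` = rnorm(5),
                  B = rnorm(5))

# Hypothetical helper: turn a trailing "...<digits>" suffix into "_<digits>",
# so the name no longer looks like an auto-generated positional suffix
protect_names <- function(df) {
  names(df) <- sub("\\.\\.\\.([0-9]+)$", "_\\1", names(df))
  df
}

# Bind after renaming; both data frames now share the column name "A_1"
combined <- bind_rows(protect_names(df1), protect_names(df2))
print(names(combined))
```

With the names protected in this way, the first "A" column from both data frames should end up in the same column of the result, and only the column missing from df2 is filled with NA.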
So, two questions:
- Should I have done the row-wise binding of the data frames in a different way? I don't have control over the original Excel files, so the names might be manipulated on reading.
- Is this a bug, or at least a shortcoming, of bind_rows()?
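On the first question, another thing I might try is to control the renaming at read time instead. The sketch below only shows a candidate repair function applied to made-up header names; repair_underscore() is a hypothetical name, and the commented-out read_excel() call assumes that its .name_repair argument accepts a function, which I have not tested against my actual files.

```r
# Raw (pre-repair) headers as they might appear in the Excel sheet;
# these values are made up for illustration
raw_names <- c("A", "A", "B")

# Hypothetical repair function: base R's make.unique() with "_" as separator,
# which avoids producing names that end in "...<digits>"
repair_underscore <- function(nms) make.unique(nms, sep = "_")

repaired <- repair_underscore(raw_names)
print(repaired)

# Possible use at read time (not run; "my_file.xlsx" is a placeholder path):
# df <- readxl::read_excel("my_file.xlsx", .name_repair = repair_underscore)
```

Note that make.unique() leaves the first occurrence of a duplicated name unchanged and only suffixes the later ones, so the resulting names differ from the "A...1", "A...2" style.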
Stephen