Hi,
I have spent some time trying to solve this today but not really been able to come up with a solution.
I have some messy spreadsheets (about 20) where the data is divided into a separate spreadsheet for each year and the column naming is not consistent across them, with some not having a column name at all in the spreadsheet and so get assigned "...1" etc after being imported using readxl.
My aim was to create a regular expression which would capture the different column names after being imported and then transform it to a uniform name of "area".
- I would ideally like to match the column named "...1" etc first, then if they are not available, to match to a different string. Turning off the global flag seem to be the way to do this but I cannot find anywhere online which discusses this in R/tidyverse. I am thinking it might not be possible, or, there might be a more clever way of writing the regex which I cannot figure out.
I've tried to produce a minimal reprex below which illustrates the problem. Particularly in year_3, I end up with 2 columns which the "area" name.
All helped appreciated including a simpler way to tackle this problem.
library(tidyverse)
year_1 <- tribble(
~`...1`, ~admissions,
"Hospital 1", 10,
"Hospital 2", 100,
"hospital 3", 200
)
year_2 <- tribble(
~provider_code, ~`...2`, ~admissions,
"H1", "Hospital 1", 20,
"H2", "Hospital 2", 400,
"H3", "hospital 3", 500
)
year_3 <- tribble(
~"Hospital provider code", ~"Commissioning region/Provider", ~admissions,
"H1", "Hospital 1", 350,
"H2", "Hospital 2", 350,
"H3", "hospital 3", 550
)
clean_up_area_column_name <- function(x){
rename({{x}}, area = matches("\\.{3}[0-9]|commissioning region|hospital provider", ignore.case = TRUE))
}
datasets <- list(year_1, year_2, year_3)
map_df(datasets, clean_up_area_column_name)
# A tibble: 9 × 5
area admissions provider_code area1 area2
<chr> <dbl> <chr> <chr> <chr>
1 Hospital 1 10 NA NA NA
2 Hospital 2 100 NA NA NA
3 hospital 3 200 NA NA NA
4 Hospital 1 20 H1 NA NA
5 Hospital 2 400 H2 NA NA
6 hospital 3 500 H3 NA NA
7 NA 350 NA H1 Hospital 1
8 NA 350 NA H2 Hospital 2
9 NA 550 NA H3 hospital 3
<sup>Created on 2022-08-07 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>