Rename several columns at once by shared string

So I'm wondering.. I have a table of several colums that have names consisting of A to H and consecutive numbers and a looong annex xxx, e.g. "A1 xxx", "A2 xxx", ..., "H11 xxx", "H12 xxx" and the first one containing the time. Since I want to adress the columns calculating mean and standard deviation from them, it would be really helpful to rename them to something shorter, so only the "A1" is still standing, whithout the appndix.

I was thinking of using a loop to do it, but I feel there might be a smarter solution using tidyverse/dplyr. Is there a possibility of using the same loop for mean and standard deviation? It would be like "A1" to "A3" and "A4" to "A6", ... to "H10" to "H12" belong together and give data for one mean and standard deviation each.

I was trying to start with

head(data) %>% rename_with(~ substring())

But I'm not sure how to proceed and if "substring" is really what I am looking for.
I'm happy for any suggestions.

Hi @Ishavn13, thank you for your question.

I think you're on the right lines with dplyr::rename_with()- I've used it with stringr::str_extract to get the part of the column name you're interested in, e.g. 'A1'.

data <- tibble::tribble(
  ~`Time (h) 2025-01-24T 14.19.33.188`, ~`A A1 OTR (mmol/L/h)`, ~`A A2 OTR (mmol/L/h)`, ~`A A3 OTR (mmol/L/h)`,
  0.33333, 0.639942, 2.309817, -1.577695, 
  0.66666, 1.188565, 0.487093, 1.854478
)

data_renamed <- 
  data |> 
  dplyr::rename_with(
    .cols = 2:length(names(data)), # select all columns except the first one
    .fn = ~stringr::str_extract(
      string = .x,
      pattern = '\\h([[:alnum:]]+)\\h' # pick out the alphanumeric portion between two spaces, e.g. ' A1 '
    ) |> trimws() # remove the spaces
  )

# see the result
data_renamed
#> # A tibble: 2 × 4
#>   `Time (h) 2025-01-24T 14.19.33.188`    A1    A2    A3
#>                                 <dbl> <dbl> <dbl> <dbl>
#> 1                               0.333 0.640 2.31  -1.58
#> 2                               0.667 1.19  0.487  1.85

Created on 2025-01-31 with reprex v2.1.1

I'm not the best with regex, so had to do some cleanup afterwards using trimws(). Maybe you could find a neater way of doing this?

data<-tibble(
  dates=NA,
  `a1 sds`=sample(1:100, 1000, TRUE),
  `a2 ewew ds`=sample(1:100, 1000, TRUE),
  `a3 sds ewe`=sample(1:100, 1000, TRUE),
  `a4 78 wew sds`=sample(1:100, 1000, TRUE),
  
  
)


data %>% 
  rename_with(~ str_sub(., 1, 2), .cols = starts_with("a")) %>% 
  glimpse()

I fancy using pivot_longer() to get these data into a (potentially) more useable format:

suppressPackageStartupMessages(library(tidyverse))

dat <- tibble::tribble(
  ~`Time (h) 2025-01-24T 14.19.33.188`, ~`A A1 OTR (mmol/L/h)`, 
  ~`A A2 OTR (mmol/L/h)`, ~`A A3 OTR (mmol/L/h)`,
  0.33333, 0.639942, 2.309817, -1.577695, 
  0.66666, 1.188565, 0.487093, 1.854478
)

dat %>% 
  pivot_longer(cols=starts_with("A"),
               names_to="old_group",
               values_to="value") %>% 
  arrange(old_group) %>% 
  rename(time = starts_with("Time")) %>%
  extract(col=old_group, into="new_group", regex='\\h([[:alnum:]]+)\\h') %>% 
  group_by(new_group) %>% 
  summarize(mean_val = mean(value, na.rm=TRUE),
            sd_val = sd(value, na.rm=TRUE))
#> # A tibble: 3 × 3
#>   new_group mean_val sd_val
#>   <chr>        <dbl>  <dbl>
#> 1 A1           0.914  0.388
#> 2 A2           1.40   1.29 
#> 3 A3           0.138  2.43

Created on 2025-02-02 with reprex v2.1.1

1 Like