Rename Columns using Vector of Names

katchamp · February 2, 2024, 5:34pm

I would like some advice on fast ways to rename columns that work within dplyr and are pipeable.

I am going to use iris as an example here but my actual data set is much larger with 120 columns which is why some of the known solutions for renaming columns don't work for me.

If I have the vector v = c('s_length', 's_width', 'p_length', 'p_width', 'species'), I want to replace all the column names of iris with the names in v.

I know the following is very easy and accomplishes what I want

v = c('s_length', 's_width', 'p_length', 'p_width', 'species')
colnames(iris) <- v

but it's not a pipeable statement. Or at least not that I am aware of.

I also know I could do the following

library(dplyr)

# named vector of the form c(new_name = 'old_name')
replace_names = c(s_length = 'Sepal.Length', s_width = 'Sepal.Width', p_length = 'Petal.Length', p_width = 'Petal.Width', species = 'Species')
iris %>%
    rename(all_of(replace_names))

This is pipeable, which is good, but my actual data frame has 120 columns. I don't want to have to manually write out something like replace_names for 120 columns. I know I could easily get a vector of the current names and a vector of the names I want (i.e. v above), but I don't know how to combine them programmatically into the same form as the replace_names vector that could then be fed into rename.

I would be open to any of the following solutions:

Being able to feed in a vector of new names without defining their relationship to the old column names in dplyr
Forcing the construction colnames(data frame)<-vector_new_col_names to be pipeable
Being able to easily construct a vector of the form c(new_col_1_name = old_col_1_name, ...) from a vector of old names and vector of new names to then be passed to rename
Some other solution???

Thanks in advance for your help

AlexisW · February 2, 2024, 7:57pm

You can use setNames() (base) or set_names() (purrr):

v = c('s_length', 's_width', 'p_length', 'p_width', 'species')

iris |>
  setNames(v) |>
  head()
#>   s_length s_width p_length p_width species
#> 1      5.1     3.5      1.4     0.2  setosa
#> 2      4.9     3.0      1.4     0.2  setosa
#> 3      4.7     3.2      1.3     0.2  setosa
#> 4      4.6     3.1      1.5     0.2  setosa
#> 5      5.0     3.6      1.4     0.2  setosa
#> 6      5.4     3.9      1.7     0.4  setosa

Created on 2024-02-02 with reprex v2.0.2
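For completeness, the purrr variant might look like the sketch below. It assumes purrr is installed and uses a copy of v without the stray leading space; set_names() also accepts a function, which is applied to the existing names.

```r
library(purrr)

v <- c("s_length", "s_width", "p_length", "p_width", "species")

# Replace every column name with the element of v in the same position
renamed <- iris |> set_names(v)
names(renamed)

# set_names() also accepts a function that transforms the current names
lowered <- iris |> set_names(tolower)
names(lowered)
```

Both forms match names by position only, so v has to be in the same order as the existing columns.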

You can also make colnames(df) <- v pipeable by wrapping it in an anonymous function:

v = c('s_length', 's_width', 'p_length', 'p_width', 'species')

iris |>
  (\(.x) {colnames(.x) <- v; .x})() |>
  head()
#>   s_length s_width p_length p_width species
#> 1      5.1     3.5      1.4     0.2  setosa
#> 2      4.9     3.0      1.4     0.2  setosa
#> 3      4.7     3.2      1.3     0.2  setosa
#> 4      4.6     3.1      1.5     0.2  setosa
#> 5      5.0     3.6      1.4     0.2  setosa
#> 6      5.4     3.9      1.7     0.4  setosa

Created on 2024-02-02 with reprex v2.0.2
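If the inline anonymous function is hard to read, one option is to give it a name. This is only a sketch: with_colnames() is a hypothetical helper invented here, not a function from base R or dplyr.

```r
# Hypothetical helper (the name is made up for illustration): returns a copy
# of df whose column names are replaced by nm, matched by position
with_colnames <- function(df, nm) {
  stopifnot(length(nm) == ncol(df))  # fail early on a length mismatch
  colnames(df) <- nm
  df
}

v <- c("s_length", "s_width", "p_length", "p_width", "species")

renamed <- iris |> with_colnames(v)
head(renamed)
```

The assignment happens on a local copy inside the function, so iris itself keeps its original names.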

That seems a lot harder, and also unreadable of course.

Actually I am tempted to ask about the context: if you need to rename all these columns based on an external vector, maybe you're asking the wrong questions (of course I have no idea in your particular case). Maybe these 120 columns should have been rows, and the right approach would be to pivot_longer() before renaming anything. Maybe you're reading it from a csv file with bad names, and the column renaming step should be part of a readr call. Maybe you should really modify the existing names with rename_with() rather than replace them all at once.
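As a rough illustration of the rename_with() idea, the sketch below first modifies the existing names with a function and then, as an alternative, hands back a ready-made vector. It assumes dplyr is installed and R 4.1 or later for the |> and \(x) syntax; the cleaning rule (lower-case, dots to underscores) is just an example.

```r
library(dplyr)

# Modify the existing names: lower-case them and replace "." with "_"
cleaned <- iris |>
  rename_with(\(nm) gsub(".", "_", tolower(nm), fixed = TRUE))
names(cleaned)

# The function may also ignore its input and return a prepared vector,
# as long as it supplies exactly one name per selected column
v <- c("s_length", "s_width", "p_length", "p_width", "species")
renamed <- iris |> rename_with(\(nm) v)
names(renamed)
```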

Or maybe I'm just wasting time thinking too much about a trivial problem

nirgrahamuk · February 5, 2024, 10:40am

library(tidyverse)

(orig_names <- names(iris))

(new_names <- str_replace_all(orig_names,
                              fixed("."),
                              "_"))
#if the new_names are in the same order as the original names they replace
# then you can relate them by setting them as names of the orig_names vec
names(orig_names) <- new_names

# see this
orig_names

# use this
rename(head(iris),
       all_of(orig_names))
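Applied to the vector from the original question, the same idea could be sketched as follows. It assumes the new names are listed in the same order as the existing columns; lookup is just an illustrative variable name.

```r
library(dplyr)

old_names <- names(iris)
new_names <- c("s_length", "s_width", "p_length", "p_width", "species")

# Build a named vector of the form c(new_name = "old_name"), paired by position
lookup <- setNames(old_names, new_names)
lookup

# all_of() tells rename() that lookup is an external vector, not a column
renamed <- iris |>
  rename(all_of(lookup))
names(renamed)
```

Because all_of() is strict, rename() raises an error if any of the old names is missing from the data frame, which is a useful check with 120 columns.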

katchamp · February 6, 2024, 3:58pm

Thank you for your many suggestions.

As for the context: I was pulling a slice of data out of a NetCDF file. It was very easy to get an array of the data I wanted as the values of my data frame, plus the vectors of dimension values that would end up being the column and row names. I was just having trouble combining them. And yes, I will eventually pivot longer, but I want to already have the column names associated with the values before I do so.
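A minimal sketch of that workflow, with invented values and labels standing in for the NetCDF slice (nothing here reads a NetCDF file, and time_labels, site_labels and the output column names are assumptions made for illustration):

```r
library(tibble)
library(tidyr)

# Stand-ins for the dimension vectors and the extracted 2 x 3 slice
time_labels <- c("t1", "t2")
site_labels <- c("site_a", "site_b", "site_c")
vals <- matrix(c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5), nrow = 2,
               dimnames = list(time_labels, site_labels))

# The dimnames become the row and column names of the data frame
wide <- as.data.frame(vals)

# Move the row names into a column, then pivot the remaining columns longer
long <- wide |>
  rownames_to_column("time") |>
  pivot_longer(-time, names_to = "site", values_to = "value")
long
```

Attaching the labels as dimnames before converting means the column names travel with the values, so no separate renaming step is needed before pivoting.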
