Help with column based on last column of a list of data frames

Hi All,

I am trying to mutate a column "disp_M" in this sample, where disp_M should be computed from the last column whose name is a variation of disp_M: disp_M_1 in one data frame of the list, disp_M_2 in another, disp_M_3 in yet another, and many more variables in the real data. I would like to use whichever of these columns comes last in each data frame, but I am not sure how to do that. Any help will be appreciated.

# List name based on Column Name
library(tidyverse)

# Example data
df <- mtcars%>%
  select(everything(), -c(wt:carb)) %>%
  rename("disp_M_1" = disp, "disp_M_2" = hp, "disp_M_3" = drat ) %>%
  mutate(cyl = recode(cyl, "4" = "cyl 4", "6" = "cyl 6", "8" = "cyl 8"))

cyl_4 <- df %>%
  filter(cyl == "cyl 4") %>%
  select(everything(), -disp_M_2, -disp_M_3)

cyl_6 <- df %>%
  filter(cyl == "cyl 6")  %>%
  select(everything(), -disp_M_3)

cyl_8 <- df %>%
  filter(cyl == "cyl 8")

# List of dataframes
cyl <- list(cyl_4, cyl_6, cyl_8)

# Group by Cylinders 
groups <- c("cyl 4", "cyl 6", "cyl 8") 
names(cyl) = groups

# Creating Column disp_M
# Looking to add disp_M such that it gets added in all dataframes within list using last variation in disp_M
disp <- function(x){
  x <- x %>%
    mutate(disp_M = disp_M_1 * 2) # for Cyl 4
  # It should be based on disp_M_2 for Cyl 6 as that is the last column with variation in disp_M name
  # and so on
}

cyl <- map(cyl, ~disp(.))

Hello, this solution uses a custom function with map, to access the last column in every data frame within the list. I'm then multiplying the last column by two. This should work if the last column in every data frame is the one you want to multiply by 2.

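# Multiply the last column of each data frame by 2; [[ ]] returns a plain vector for data frames and tibbles alike
cyl <- map(cyl, function(x) x %>% mutate(disp_M = x[[ncol(x)]] * 2))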

Thanks @bcavinee !
This is a great alternative. I am still looking for a solution that works when these columns can be anywhere in the data frame. That would help avoid errors when the data frame is huge, which is the case for my real data.

map(cyl, \(x){
  nx <- names(x)
  last_disp_M <- nx[startsWith(nx, "disp_M")] |> tail(n = 1)
  mutate(x, disp_M = !!sym(last_disp_M) * 2)
})

Thanks @nirgrahamuk !
This is great and is exactly what I was looking for.

Can you please explain 2 things?

  • What does the backslash in \(x) mean in the map call, and how can we write the same thing without it?
  • How can we replace |> with %>% here?

Thanks for helping with the above. I have used simple functions so far and have used only pipes %>%.

The backslash is shorthand for the keyword function; \(x) is the modern R way (R 4.1.0 and later) of writing anonymous functions. purrr had its own syntax based on the tilde formula ~, but its authors no longer advocate that and recommend the form I showed instead.

|> is directly replaceable by %>% here.
|> is the modern pipe built into R (since 4.1.0); %>% is the earlier pipe from magrittr, used throughout dplyr and the tidyverse.
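
To make the two answers above concrete, here is a minimal sketch of the same call written both ways. It assumes R 4.1.0 or later for the first variant, and `dfs` is a small made-up list standing in for `cyl`, not data from this thread.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the `cyl` list from the question
dfs <- list(
  a = data.frame(id = 1:2, disp_M_1 = c(10, 20)),
  b = data.frame(id = 1:2, disp_M_1 = c(10, 20), disp_M_2 = c(1, 2))
)

# Variant A: backslash shorthand and native pipe (needs R >= 4.1.0)
res_a <- map(dfs, \(x) {
  nx <- names(x)
  last_disp_M <- nx[startsWith(nx, "disp_M")] |> tail(n = 1)
  mutate(x, disp_M = !!sym(last_disp_M) * 2)
})

# Variant B: the `function` keyword and the magrittr pipe
res_b <- map(dfs, function(x) {
  nx <- names(x)
  last_disp_M <- nx[startsWith(nx, "disp_M")] %>% tail(n = 1)
  mutate(x, disp_M = !!sym(last_disp_M) * 2)
})

# Both variants should give the same list of data frames
identical(res_a, res_b)
```

In this sketch, element `a` is doubled from `disp_M_1` and element `b` from `disp_M_2`, because that is the last column whose name starts with "disp_M" in each data frame.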

Thank you @nirgrahamuk !

@nirgrahamuk I'm fairly new to using map functions and have not stumbled across !!sym. Looks like I need to read up on tidy evaluation, great solution!
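
For anyone else new to `!!sym()`: `sym()` turns a string into a symbol and `!!` injects that symbol into the `mutate()` expression, so the column is looked up by its name. The sketch below uses a made-up data frame `toy`; the `.data[[ ]]` form shown second is a commonly used alternative and an addition here, not something proposed in the replies above.

```r
library(dplyr)

# Hypothetical toy data; the target column is deliberately not the last one
toy <- data.frame(disp_M_1 = c(1, 2), disp_M_2 = c(5, 6), other = c("a", "b"))

nx <- names(toy)
last_disp_M <- tail(nx[startsWith(nx, "disp_M")], n = 1)  # the string "disp_M_2"

# sym() builds the symbol disp_M_2 and !! injects it, so this behaves
# like writing mutate(toy, disp_M = disp_M_2 * 2) by hand
via_sym <- mutate(toy, disp_M = !!sym(last_disp_M) * 2)

# The .data pronoun looks the column up directly from its string name
via_data <- mutate(toy, disp_M = .data[[last_disp_M]] * 2)

identical(via_sym, via_data)
```

One caveat with the prefix match: once `disp_M` itself has been added, it also starts with "disp_M", so running the same step a second time on the result would pick up `disp_M` rather than `disp_M_2`.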
