Hi all,
I'm sorry for the basic question. I'm just struggling with something that should be simple. Say I have the the data frame "Test" that originally has three fields: Col1, Col2, Col3.
I want to create new columns based on each of the original columns. The values in each row of the new columns would specify whether the corresponding value in the matching row on the original column is above or below the initial column's median. So, for example, in the image below, Col4 is based on Col1. Col5 is based on Col2. Col6 based on Col3.
It's quite easy to perform this function on a single column and output a single column:
But if I'm performing this same operation over 50 columns, writing out/copy-paste and editing the code can be tedious and inefficient. I should mention that I am hoping to add the new columns to the data frame, not create another data frame. Additionally, there are about 200 other fields in the data frame that will not have this function performed on them (so I can't just use a mutate_all). And the columns are not uniformly named (my examples above are just examples, not the actual dataset) so I'm not able to find a pattern for mutate_at. Maybe there is a way to manually pass a list of column names to the mutate command?
There must be an easy and elegant way to do this. If anyone could help, that would be amazing.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- tibble::tibble(col1 = sample(letters, 10),
x = sample(letters, 10),
y = runif(10),
zzz = runif(10))
cols_to_mutate_on = which(colnames(df) %in% c('y', 'zzz'))
df <- df %>%
mutate_at(cols_to_mutate_on, .funs = list(relative_to_median = function(x) {
case_when(x < median(x) ~ 'below',
x == median(x) ~ 'at',
TRUE ~ 'above')
}))
df
#> # A tibble: 10 x 6
#> col1 x y zzz y_relative_to_median zzz_relative_to_median
#> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 g o 0.483 0.437 below below
#> 2 u p 0.436 0.911 below above
#> 3 y r 0.107 0.162 below below
#> 4 s e 0.206 0.797 below above
#> 5 o m 0.744 0.103 above below
#> 6 z d 0.784 0.0999 above below
#> 7 d c 0.0530 0.639 below above
#> 8 t a 0.787 0.0995 above below
#> 9 p t 0.507 0.444 above above
#> 10 e g 0.720 0.726 above above
@valeri I can't thank you enough! This is the exact solution I was looking for. It also means that I can now easily run functions across those columns at different break points.
Thank you! You've saved me a tremendous amount of time and introduced me to a much better method.
If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it: