Is there a way to modify each value in a data frame variable by dividing it by the mean of that variable? I'm trying to use mutate_all and hitting a wall in my understanding. I've tried a few approaches and nothing seems to work.
I want to do something like this:
sw_height_mass <- starwars %>%
select(height, mass)
starwars_means <- starwars %>%
select(height, mass) %>%
mutate_all(mean, na.rm = TRUE) %>%
rename(hmean = height,
mmean = mass)
starwars_what_i_want <-
bind_cols(sw_height_mass, starwars_means) %>%
mutate(new_height = height / hmean,
new_mass = mass / mmean) %>%
select(5:6)
starwars_what_i_want
Ideally with less code, I imagine something like this...
starwars %>%
select(height, mass) %>%
mutate_all((. / mean), na.rm = TRUE)
But my imagination doesn't match reality and I'm not figuring out what will work. My actual data has a lot more variables.
Thanks for any consideration.
You should look at purrr::map()
Here is your code:
library(tidyverse)
starwars %>%
select(height, mass) %>%
map_dfc(~ . / mean(., na.rm = T))
Another method might be to use tidyr::spread and tidyr::gather.
library(tidyverse)
starwars %>%
select(name, height, mass) %>% # Optional (only here to make output clearer)
gather(key, value, height, mass) %>% # Go from wide:long data
group_by(key) %>% # Groups are `height` and `mass` (see: `gather` args)
mutate(value = value / mean(value, na.rm = T)) %>% # Divide by group mean
spread(key, value)
#> # A tibble: 87 x 3
#> name height mass
#> <chr> <dbl> <dbl>
#> 1 Ackbar 1.03 0.853
#> 2 Adi Gallia 1.06 0.514
#> 3 Anakin Skywalker 1.08 0.863
#> 4 Arvel Crynyd NA NA
#> 5 Ayla Secura 1.02 0.565
#> 6 Bail Prestor Organa 1.10 NA
#> 7 Barriss Offee 0.952 0.514
#> 8 BB8 NA NA
#> 9 Ben Quadinaros 0.935 0.668
#> 10 Beru Whitesun lars 0.946 0.771
#> # ... with 77 more rows
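For anyone on a recent tidyr: gather and spread have been superseded by pivot_longer and pivot_wider. Here is a rough sketch of the same reshape-then-divide idea with the newer verbs. The object name `scaled_long` and the column names "key" and "value" are just choices for this illustration, and this assumes tidyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

scaled_long <- starwars %>%
  select(name, height, mass) %>%
  # Go from wide to long: one row per (name, variable) pair
  pivot_longer(c(height, mass), names_to = "key", values_to = "value") %>%
  # One group per original column
  group_by(key) %>%
  # Divide each value by its group (column) mean
  mutate(value = value / mean(value, na.rm = TRUE)) %>%
  ungroup() %>%
  # Back to wide: one column per original variable
  pivot_wider(names_from = key, values_from = value)

scaled_long
```

Unlike spread, pivot_wider keeps rows in order of first appearance rather than sorting them by name, so the row order may differ from the output above.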
Thanks @prosoitos.
You helped me learn a bit more about the whole '~ .' notation. Pointed me in a good direction. Huge help!!
Applying that, I see these two are equivalent...
starwars %>%
select(height, mass) %>%
mutate_all(~ . / mean(., na.rm = TRUE))
starwars %>%
select(height, mass) %>%
map_dfc(~ . / mean(., na.rm = T))
Thanks again.
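A side note for later readers: in dplyr 1.0.0 and newer, mutate_all is superseded by across(). A minimal sketch of the same operation in that style might look like the following (the name `scaled` is only for illustration, and this assumes dplyr >= 1.0.0):

```r
library(dplyr)

scaled <- starwars %>%
  select(height, mass) %>%
  # Apply the same lambda to every selected column:
  # divide each value by that column's mean, ignoring NAs
  mutate(across(everything(), ~ .x / mean(.x, na.rm = TRUE)))

scaled
```

Swapping everything() for a narrower selection such as where(is.numeric) should let this run on a data frame with mixed column types without a prior select().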
Thanks @torvaney. Nice approach. Thanks for the commenting / documentation.
They are equivalent here since the input is a data frame. The map solution is much more general in that it would work for any list as input. (And you can play with map, map_dbl, map_dfc, etc. to get various classes as output.)
That said, if you are staying in a data frame framework, staying within dplyr makes sense.
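To illustrate the point about lists, here is a small sketch using a plain named list rather than a data frame. The list `measurements` and its values are made up for this example.

```r
library(purrr)

# Hypothetical input: a plain list whose elements have different lengths,
# so it could not be a data frame
measurements <- list(a = c(1, 2, 3), b = c(10, NA, 30, 40))

# map() returns a list: each vector divided by its own mean
scaled_list <- map(measurements, ~ .x / mean(.x, na.rm = TRUE))

# map_dbl() returns a named numeric vector: one mean per element
means <- map_dbl(measurements, mean, na.rm = TRUE)

scaled_list
means
```

mutate_all would not accept this input since it is not a data frame, while map does not care about the container beyond it being a list or vector.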