Leon
January 22, 2019, 8:12pm
1
Okay, so I'm looking for a tidy solution to the following:
Given a tibble d
:
library('tidyverse')
set.seed(733744)
n = 10
d = tibble(x1 = rnorm(n), x2 = rnorm(n),
y1 = rnorm(n), y2 = rnorm(n),
z1 = rnorm(n), z2 = rnorm(n))
I.e.
> d
# A tibble: 10 x 6
x1 x2 y1 y2 z1 z2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1.40 -1.59 0.0458 -0.558 0.484 0.794
2 1.24 0.124 -0.0210 -1.57 -0.234 -2.30
3 -0.234 -1.93 0.804 0.845 -1.90 0.00116
4 0.549 1.12 -0.221 -0.421 0.169 1.11
5 0.633 -0.140 0.00652 -0.200 0.202 1.12
6 -0.257 0.963 -1.86 -0.208 0.237 0.544
7 0.283 -0.152 1.47 0.423 0.747 0.518
8 2.31 -1.46 -0.908 0.603 -0.506 0.850
9 -0.616 0.165 0.651 -0.0481 -1.05 -0.619
10 0.559 -1.18 0.878 -1.19 -1.91 1.02
I want to calculate the row-wise sd()
across columns x
and y
, but NOT z
. The following works, but is not tidy IMHO:
# Equivalent to rowMeans, but for sd
rowSds = function(x){ return(apply(x, 1, sd)) }
# Calculate sd across columns, where the column name contain x or y
d %>% mutate(sd_xy = d %>% select(matches("x|y")) %>% rowSds)
So I am looking for something like this, but without having to write out the variable names:
d %>% rowwise %>% mutate(sd_xy = c(x1, x2, y1, y2) %>% sd) %>% ungroup
There must be some NSE way to do this elegantly?
I do believe there is a better way than that...
d %>%
select(matches("x|y")) %>%
rowwise %>%
do(data.frame(., sd_xy = sd(unlist(.))))
It appears easier to use apply here (at least for me)!
Leon
January 23, 2019, 7:41am
3
Thanks for input, but still not really that tidy
1 Like
too bad! I'm probably not enough familiar with tidy meaning
last attempt based on that
d %>% mutate(sd_xy = pmap_dbl(select(., matches("x|y")), lift_vd(sd)))
thank you for the question, I'm discovering some interesting stuff
4 Likes
So here is a method that I would consider "tidy" in terms of it being all of tidyverse
functions, but it is certainly more verbose than your current solution. However, if you define it in a function as shown then it is not too bad, IMO:
library('tidyverse')
set.seed(733744)
n = 10
d = tibble(x1 = rnorm(n), x2 = rnorm(n),
y1 = rnorm(n), y2 = rnorm(n),
z1 = rnorm(n), z2 = rnorm(n))
rowwise_sd <- function(data, ...){
my_cols <- enquos(...)
data %>%
group_by(row = row_number()) %>%
nest() %>%
mutate(row_sd = map(data, ~{
.x %>%
dplyr::select(!!!my_cols) %>%
gather(key = col, value = value) %>%
summarize(sd = sd(value))
})) %>%
unnest() %>%
select(-row)
}
d %>% rowwise_sd(x1, x2, y1, y2)
#> # A tibble: 10 x 7
#> x1 x2 y1 y2 z1 z2 sd
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1.40 -1.59 0.0458 -0.558 0.484 0.794 1.25
#> 2 1.24 0.124 -0.0210 -1.57 -0.234 -2.30 1.16
#> 3 -0.234 -1.93 0.804 0.845 -1.90 0.00116 1.30
#> 4 0.549 1.12 -0.221 -0.421 0.169 1.11 0.710
#> 5 0.633 -0.140 0.00652 -0.200 0.202 1.12 0.382
#> 6 -0.257 0.963 -1.86 -0.208 0.237 0.544 1.16
#> 7 0.283 -0.152 1.47 0.423 0.747 0.518 0.688
#> 8 2.31 -1.46 -0.908 0.603 -0.506 0.850 1.69
#> 9 -0.616 0.165 0.651 -0.0481 -1.05 -0.619 0.525
#> 10 0.559 -1.18 0.878 -1.19 -1.91 1.02 1.11
d %>% rowwise_sd(matches("x|y"))
#> # A tibble: 10 x 7
#> x1 x2 y1 y2 z1 z2 sd
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1.40 -1.59 0.0458 -0.558 0.484 0.794 1.25
#> 2 1.24 0.124 -0.0210 -1.57 -0.234 -2.30 1.16
#> 3 -0.234 -1.93 0.804 0.845 -1.90 0.00116 1.30
#> 4 0.549 1.12 -0.221 -0.421 0.169 1.11 0.710
#> 5 0.633 -0.140 0.00652 -0.200 0.202 1.12 0.382
#> 6 -0.257 0.963 -1.86 -0.208 0.237 0.544 1.16
#> 7 0.283 -0.152 1.47 0.423 0.747 0.518 0.688
#> 8 2.31 -1.46 -0.908 0.603 -0.506 0.850 1.69
#> 9 -0.616 0.165 0.651 -0.0481 -1.05 -0.619 0.525
#> 10 0.559 -1.18 0.878 -1.19 -1.91 1.02 1.11
Created on 2019-01-23 by the reprex package (v0.2.0).
4 Likes
Leon
January 23, 2019, 6:40pm
6
This... This is why this is an awesome forum! Thanks for the discussion @lemairev and @tbradley
I'm going with @lemairev 's suggestion as a solution, which I definitely think is tidy - Excellent!
1 Like
Leon
Closed
January 30, 2019, 6:40pm
7
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.