Leon
May 10, 2018, 7:12am
1
A bit tricky - Input needed
Given a tibble my_dat
created like so:
# Simple function for generating data
mk_dat = function(m, d){
d %>% rnorm %>% list %>% rep(m) %>% return
}
# Dimensions of example data
n_row = 5
n_col = 3
depth = 10
# Create tibble example
my_dat = tibble(1:n_row) %>% select
for( var_name in letters[1:n_col] ){
my_dat = my_dat %>% mutate(!!var_name := mk_dat(m = n_row, d = depth))
}
I.e. basically a tensor disguised as a tibble:
> my_dat
# A tibble: 5 x 3
a b c
<list> <list> <list>
1 <dbl [10]> <dbl [10]> <dbl [10]>
2 <dbl [10]> <dbl [10]> <dbl [10]>
3 <dbl [10]> <dbl [10]> <dbl [10]>
4 <dbl [10]> <dbl [10]> <dbl [10]>
5 <dbl [10]> <dbl [10]> <dbl [10]>
I want to create a new variable, which is the rowwise and elementwise mean, such that the first element of the first list in the new variable will be something like:
mean(c(my_dat$a[[1]][1], my_dat$b[[1]][1], my_dat$c[[1]][1]))
and the second element in the first list will be:
mean(c(my_dat$a[[1]][2], my_dat$b[[1]][2], my_dat$c[[1]][2]))
etc...
I could code something complicated, but am interested in input to a more clever solution?
Leon
May 10, 2018, 10:23am
2
I.e. more clever as in more tidy - Perhaps using map()
somehow?
Leon
May 10, 2018, 2:31pm
3
Ok, so suffering from a complete lack of patience, I think I solved it, though it's not too pretty:
# Create dummy test data
test_dat = 1:150 %>% split(sort(. %% 15)) %>% unname
# Define tibble with dummy data
my_dat = tibble(a = test_dat[1:5],
b = test_dat[6:10],
c = test_dat[11:15])
# Do magic
my_dat_aug = my_dat %>%
mutate(d = my_dat %>% select(a, b, c) %>% map(unlist) %>% bind_cols %>%
rowMeans %>% split(sort(. %% nrow(my_dat))) %>% unname)
...and check if it worked:
> test_dat %>% head
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] 11 12 13 14 15 16 17 18 19 20
[[3]]
[1] 21 22 23 24 25 26 27 28 29 30
[[4]]
[1] 31 32 33 34 35 36 37 38 39 40
[[5]]
[1] 41 42 43 44 45 46 47 48 49 50
[[6]]
[1] 51 52 53 54 55 56 57 58 59 60
> my_dat
# A tibble: 5 x 3
a b c
<list> <list> <list>
1 <int [10]> <int [10]> <int [10]>
2 <int [10]> <int [10]> <int [10]>
3 <int [10]> <int [10]> <int [10]>
4 <int [10]> <int [10]> <int [10]>
5 <int [10]> <int [10]> <int [10]>
> my_dat_aug
# A tibble: 5 x 4
a b c d
<list> <list> <list> <list>
1 <int [10]> <int [10]> <int [10]> <dbl [10]>
2 <int [10]> <int [10]> <int [10]> <dbl [10]>
3 <int [10]> <int [10]> <int [10]> <dbl [10]>
4 <int [10]> <int [10]> <int [10]> <dbl [10]>
5 <int [10]> <int [10]> <int [10]> <dbl [10]>
> my_dat_aug %>% pull(d)
[[1]]
[1] 51 52 53 54 55 56 57 58 59 60
[[2]]
[1] 61 62 63 64 65 66 67 68 69 70
[[3]]
[1] 71 72 73 74 75 76 77 78 79 80
[[4]]
[1] 81 82 83 84 85 86 87 88 89 90
[[5]]
[1] 91 92 93 94 95 96 97 98 99 100
Does that do what you want?
library(tidyverse)
# Simple function for generating data
mk_dat = function(m, d){
d %>% rnorm %>% list %>% rep(m) %>% return
}
# Dimensions of example data
n_row = 5
n_col = 3
depth = 10
# Create tibble example
my_dat = tibble(1:n_row) %>% select
for( var_name in letters[1:n_col] ){
my_dat = my_dat %>% mutate(!!var_name := mk_dat(m = n_row, d = depth))
}
my_dat %>%
dplyr::mutate(means = purrr::pmap(., function(a, b, c){
purrr::pmap_dbl(list(a, b, c), ~mean(..1, ..2, ..3))
}))
#> # A tibble: 5 x 4
#> a b c means
#> <list> <list> <list> <list>
#> 1 <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]>
#> 2 <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]>
#> 3 <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]>
#> 4 <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]>
#> 5 <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]>
Created on 2018-05-10 by the reprex package (v0.2.0).
I'm not sure if that's the best idea to do it this way. rowMeans
supports more than 2 dimensions, so you should probably try to do it that way, unless you absolutely have to stay with tibbles.
2 Likes
Leon
May 10, 2018, 6:43pm
5
Unfortunately not, but I have converged on this, which will:
map_means = function(x, na_rm = TRUE){
if( !is_tibble(x) ){ stop("tibble expected") }
n_col = length(x[[1]][[1]])
x %>% unnest %>% rowMeans(na.rm = na_rm) %>%
matrix(ncol = n_col, byrow = TRUE) %>% apply(1, list) %>% map(1) %>%
return
}