FCL
October 23, 2021, 6:59pm
1
Hello,
I have a data frame as in the following example. I wish to calculate the differences in the column values between observations/ rows (all combinations).
my_df <- tibble(a=runif(5), b=runif(5), c=runif(5))
> my_df
# A tibble: 5 x 3
a b c
<dbl> <dbl> <dbl>
1 0.0513 0.267 0.846
2 0.614 0.683 0.937
3 0.230 0.700 0.0651
4 0.671 0.110 0.901
5 0.424 0.520 0.817
I have tried the code below which gives me only the difference between subsequent rows; I want to have all combinations: row2 - row1; row3 - row1; row4 - row1, row5- row1, row3 - row2, row4 - row2, and so on...
Also, the code I wrote does not seem the best to me (!), although it outputs the result I wish, but not for all possible combinations!
my_diff <- as.data.frame(diff(as.matrix(my_df)))
> my_diff
a b c
1 0.5623574 0.41522579 0.09165630
2 -0.3837289 0.01755953 -0.87209740
3 0.4407068 -0.58982681 0.83540813
4 -0.2463205 0.40943495 -0.08358985
I appreciate if someone could provide help in solving my question, if possible a using tidy verse options.
Thanks.
Hello,
Here is one of many possible solutions. This seems to be relatively easy with the combination of combn and apply.
library(tidyverse)
test <- tibble(a=runif(5), b=runif(5), c=runif(5)) %>% as.data.frame()
test
#> a b c
#> 1 0.7360352 0.9480512 0.5935539
#> 2 0.7244965 0.7372150 0.5043489
#> 3 0.4677090 0.8161523 0.3011784
#> 4 0.3436353 0.6341290 0.9938101
#> 5 0.4223668 0.2589678 0.9968452
cols <- combn(names(test), 2, paste, collapse = "-")
cols <- c(cols, sub("(.)-(.)", "\\2-\\1", cols))
test[cols] <- t(apply(test, 1, function(x) {
out <- combn(x, 2, function(x) x[1] - x[2])
c(out, -out)
}))
test
#> a b c a-b a-c b-c b-a
#> 1 0.7360352 0.9480512 0.5935539 -0.21201602 0.1424813 0.3544974 0.21201602
#> 2 0.7244965 0.7372150 0.5043489 -0.01271844 0.2201477 0.2328661 0.01271844
#> 3 0.4677090 0.8161523 0.3011784 -0.34844325 0.1665306 0.5149739 0.34844325
#> 4 0.3436353 0.6341290 0.9938101 -0.29049372 -0.6501748 -0.3596811 0.29049372
#> 5 0.4223668 0.2589678 0.9968452 0.16339892 -0.5744784 -0.7378773 -0.16339892
#> c-a c-b
#> 1 -0.1424813 -0.3544974
#> 2 -0.2201477 -0.2328661
#> 3 -0.1665306 -0.5149739
#> 4 0.6501748 0.3596811
#> 5 0.5744784 0.7378773
Created on 2021-10-23 by the reprex package (v2.0.0)
My apologies. I did not see you want row combinations and someone was nice enough to point it out for me.
I am taking your original dataframe and transposing it as you will see with test_2
from there we have 5 rows and I name them 1 to 5. And then from there I apply the same steps. Let me know if this works for you.
library(tidyverse)
test <- tibble(a=runif(5), b=runif(5), c=runif(5)) %>% as.data.frame()
test
#> a b c
#> 1 0.29827961 0.2847710 0.49713224
#> 2 0.01627554 0.6714382 0.03150204
#> 3 0.77252409 0.9607009 0.37498908
#> 4 0.87203985 0.6141838 0.39510871
#> 5 0.17926325 0.5988435 0.38927007
test_2 <- test %>% t() %>% as.data.frame()
colnames(test_2) <- paste(c(1:length(letters[1:ncol(test_2)])), sep="")
cols <- combn(names(test_2), 2, paste, collapse = "-")
cols <- c(cols, sub("(.)-(.)", "\\2-\\1", cols))
test_2[cols] <- t(apply(test_2, 1, function(x) {
out <- combn(x, 2, function(x) x[1] - x[2])
c(out, -out)
}))
test_2
#> 1 2 3 4 5 1-2 1-3
#> a 0.2982796 0.01627554 0.7725241 0.8720398 0.1792633 0.2820041 -0.4742445
#> b 0.2847710 0.67143820 0.9607009 0.6141838 0.5988435 -0.3866672 -0.6759299
#> c 0.4971322 0.03150204 0.3749891 0.3951087 0.3892701 0.4656302 0.1221432
#> 1-4 1-5 2-3 2-4 2-5 3-4
#> a -0.5737602 0.1190164 -0.7562485 -0.85576431 -0.16298771 -0.09951576
#> b -0.3294128 -0.3140724 -0.2892628 0.05725439 0.07259473 0.34651714
#> c 0.1020235 0.1078622 -0.3434870 -0.36360667 -0.35776803 -0.02011963
#> 3-5 4-5 2-1 3-1 4-1 5-1 3-2
#> a 0.59326084 0.692776596 -0.2820041 0.4742445 0.5737602 -0.1190164 0.7562485
#> b 0.36185749 0.015340342 0.3866672 0.6759299 0.3294128 0.3140724 0.2892628
#> c -0.01428098 0.005838646 -0.4656302 -0.1221432 -0.1020235 -0.1078622 0.3434870
#> 4-2 5-2 4-3 5-3 5-4
#> a 0.85576431 0.16298771 0.09951576 -0.59326084 -0.692776596
#> b -0.05725439 -0.07259473 -0.34651714 -0.36185749 -0.015340342
#> c 0.36360667 0.35776803 0.02011963 0.01428098 -0.005838646
Created on 2021-10-23 by the reprex package (v2.0.0)
1 Like
FCL
October 23, 2021, 8:28pm
4
Thank you @GreyMerchant . I understood your approach but still must digest some parts of your code.
As I understand it, according to your suggestion, I will still need transform test_2
back to the format as in test
so that I can further work on it. How could I do that?
In fact I hoped a more tidy/ straight forward approach existed, but I really appreciate your help
FCL
October 23, 2021, 9:11pm
5
@GreyMerchant I applied your proposed solution and it is working for me. Transposing back was quite straight forward.
But still would appreciate suggestions for a more tidy approach in R!
FCL
October 23, 2021, 10:03pm
6
Someone kindly proposed an answer I find satisfying and also eliminates duplicates.
Still wonder if there exists something even more tidy!
Thanks.
my_df <- tibble(a=runif(5), b=runif(5), c=runif(5))
# Generating the sequence to calculate the combinations
seq1 <- seq(1,nrow(my_df))
seq2 <- seq1
# Generating the Combinations
Combinations <- expand.grid(seq1, seq2)
# Removing the dupilicate Combinations
Combinations <- Combinations[which(Combinations$Var2 < Combinations$Var1),]
# Performing the subtraction
result <- my_df[Combinations$Var1,] - my_df[Combinations$Var2,]
FCL
October 23, 2021, 10:32pm
7
Another approach.
A step by step approach:
library(tidyverse)
set.seed(1)
my_df <- tibble(a=runif(5), b=runif(5), c=runif(5)) %>%
mutate(ID = row_number())
all_combs <- combn(1:nrow(my_df), 2)
my_df_1 <- my_df %>%
slice(all_combs[1,])
my_df_2 <- my_df %>%
slice(all_combs[2,])
combined <- data.frame(comparison = paste0(my_df_1$ID, "-", my_df_2$ID),
difference = my_df_1[,1:3] - my_df_2[,1:3])
which gives:
comparison difference.a difference.b difference.c
1 1-2 -0.10661524 -0.04628558 0.02941782
2 1-3 -0.30734470 0.23759189 -0.48104827
3 1-4 -0.64269913 0.26927564 -0.17812914
4 1-5 0.06382673 0.83660341 -0.56386685
5 2-3 -0.20072946 0.28387748 -0.51046609
6 2-4 -0.53608389 0.31556122 -0.20754697
7 2-5 0.17044197 0.88288900 -0.59328467
8 3-4 -0.33535443 0.03168375 0.30291913
9 3-5 0.37117143 0.59901152 -0.08281857
10 4-5 0.70652586 0.56732777 -0.38573770
FCL
October 23, 2021, 11:48pm
8
UPDATE from a kind user:
library(tidyverse)
set.seed(1)
my_df <- tibble(a=runif(5), b=runif(5), c=runif(5))
gives:
# A tibble: 5 x 3
a b c
<dbl> <dbl> <dbl>
1 0.266 0.898 0.206
2 0.372 0.945 0.177
3 0.573 0.661 0.687
4 0.908 0.629 0.384
5 0.202 0.0618 0.770
And from there:
my_df %>%
mutate(ID = row_number()) %>%
slice(as.numeric(t(combn(1:nrow(.), 2)))) %>%
mutate(group = rep(1:(n()/2), 2)) %>%
group_by(group) %>%
summarize(comparison = paste0(ID[2], "-", ID[1]),
across(c(a, b, c), ~ .[2] - .[1])) %>%
select(-group)
which gives:
# A tibble: 10 x 4
comparison a b c
<chr> <dbl> <dbl> <dbl>
1 2-1 0.107 0.0463 -0.0294
2 3-1 0.307 -0.238 0.481
3 4-1 0.643 -0.269 0.178
4 5-1 -0.0638 -0.837 0.564
5 3-2 0.201 -0.284 0.510
6 4-2 0.536 -0.316 0.208
7 5-2 -0.170 -0.883 0.593
8 4-3 0.335 -0.0317 -0.303
9 5-3 -0.371 -0.599 0.0828
10 5-4 -0.707 -0.567 0.386
1 Like
This definitely seems like the cleanest way I like it.
system
Closed
October 31, 2021, 10:10am
10
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.