Hi all,
I am trying to refurbish some old code (written before the tidyverse).
The goal is to create pairwise combinations of several yeast strains, starting from a character vector with the 8 strain names. I have used `expand.grid()` to generate a data frame with all 64 (8 by 8) possible pairwise combinations of strains, one combination per row.
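For comparison, tidyr also provides `expand_grid()`, a tidyverse counterpart of `expand.grid()`. Here is a minimal sketch, assuming tidyr >= 1.0.0 is installed. It returns a tibble directly and does not convert character vectors to factors. It also varies the *first* column slowest, so the row order differs from `expand.grid()`. As a result, the member of each A-B / B-A pair that later gets flagged as duplicated would change.

```r
library(tidyr)

strains <- c("AMH", "BAN", "BED", "BPL", "BTT", "CMP", "CPI", "CQC")

# expand_grid() returns a tibble and keeps the columns as character
# (expand.grid() turns them into factors by default)
p_strains <- expand_grid(s1 = strains, s2 = strains)

nrow(p_strains)  # 8 * 8 = 64 rows
head(p_strains)  # s2 varies fastest: (AMH, AMH), (AMH, BAN), ...
```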
Ideally, I want to flag whether a pair may be redundant (e.g. A-B and B-A), so for each row I sort the strain names lexically and then check which rows are duplicated.
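Because each pair has exactly two members, sorting a row amounts to taking the smaller and the larger name. A minimal sketch of that idea uses the vectorised base R functions `pmin()`/`pmax()` in place of a row-wise `apply()`. This is a swapped-in technique, not the approach used below. It assumes the strain columns are character rather than factor, hence `stringsAsFactors = FALSE`.

```r
library(dplyr)

strains <- c("AMH", "BAN", "BED", "BPL", "BTT", "CMP", "CPI", "CQC")

# stringsAsFactors = FALSE keeps s1/s2 as character, which pmin()/pmax()
# can compare element-wise (they are not meant for unordered factors)
p_strains <- expand.grid(s1 = strains, s2 = strains, stringsAsFactors = FALSE) %>%
  as_tibble()

# For a two-element row, "sort the names" is (smaller, larger), so the
# row-wise sort becomes two vectorised calls; duplicated() on a data frame
# then compares those (lo, hi) rows
p_strains <- p_strains %>%
  mutate(
    is_identical  = s1 == s2,
    is_duplicated = duplicated(data.frame(lo = pmin(s1, s2), hi = pmax(s1, s2)))
  )

# 36 first occurrences (28 distinct mixed pairs + 8 self pairs), 28 redundant
count(p_strains, is_duplicated)
```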
Since I am using `apply()` with `MARGIN = 1`, the resulting matrix is transposed compared to the original one (each row's sorted pair ends up as a column). That is why I have to transpose it back before looking for duplicated pairs.
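The transposition comes from how `apply()` simplifies its result: when the function returns a vector of length n for each row, those vectors are stored as the columns of an n × (number of rows) matrix. Here is a small base R illustration with a made-up 3 × 2 matrix of letters standing in for strain names:

```r
# One pair of names per row: (B, A), (A, D), (C, B)
m <- matrix(c("B", "A", "C",
              "A", "D", "B"), ncol = 2)

dim(m)                       # 3 x 2
sorted <- apply(m, 1, sort)  # each row's sorted pair is stored as a column
dim(sorted)                  # 2 x 3
t(sorted)                    # back to one sorted pair per row: (A, B), (A, D), (B, C)
duplicated(t(sorted))        # duplicated() on a matrix compares whole rows
```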
I was wondering whether this could be done more simply using tidyverse syntax.
So far I have not found a better way to do it than the code below.
```r
library(tidyverse)

strains <- c("AMH", "BAN", "BED", "BPL", "BTT", "CMP", "CPI", "CQC")

# make all pairs of strains
p_strains <- expand.grid(s1 = strains, s2 = strains) %>% as_tibble()

# find redundant pairs (e.g. A-B and B-A)
is_pair_dup <- apply(p_strains, 1, sort) %>% # sort strains alphabetically within each row
  t() %>% duplicated() # transpose back, then find duplicated rows

# annotate redundant pairs and "self pairs" (i.e. pairs composed of the same strain)
p_strains <- p_strains %>%
  mutate(is_identical = s1 == s2, is_duplicated = is_pair_dup)

# Tidyverse version with chaining
tidy_strains <- expand.grid(s1 = strains, s2 = strains) %>%
  as_tibble() %>%
  mutate(
    is_identical = s1 == s2,
    # QUESTION: can I do the following in a more straightforward way (e.g. with c_across())?
    is_duplicated = apply(., 1, sort) %>% t() %>% duplicated()
  )

# check that both methods return identical tibbles
identical(p_strains, tidy_strains)
#> [1] TRUE
```
Created on 2022-10-03 with reprex v2.0.2
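On the `c_across()` question in the code, one possible sketch builds a sorted key for each row and then calls `duplicated()` on that key. It assumes dplyr >= 1.0.0, where `rowwise()` and `c_across()` were introduced. `pair_key` is a hypothetical helper column introduced here, and the `-` separator assumes strain names never contain a hyphen. Because `rowwise()` evaluates the expression once per row, it is likely to be slower than a vectorised approach on large inputs.

```r
library(dplyr)

strains <- c("AMH", "BAN", "BED", "BPL", "BTT", "CMP", "CPI", "CQC")

tidy_strains <- expand.grid(s1 = strains, s2 = strains, stringsAsFactors = FALSE) %>%
  as_tibble() %>%
  rowwise() %>%
  # c_across() collects the current row's s1 and s2 into one vector, so sort()
  # works per row; the sorted names are pasted into a single key (hypothetical
  # helper column pair_key)
  mutate(pair_key = paste(sort(c_across(c(s1, s2))), collapse = "-")) %>%
  ungroup() %>%
  mutate(
    is_identical  = s1 == s2,
    is_duplicated = duplicated(pair_key)
  )

# Row 2 is (BAN, AMH) and row 9 is (AMH, BAN): same key, second one flagged
tidy_strains[c(2, 9), ]
```

Since the row order is the same as in the `expand.grid()` version above, the `is_duplicated` flags should match the `apply()`-based ones.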