How can I get a table of combinations of variables?

I have a data frame and, for each row, I want the combination of columns whose value is 1. For example, row one has g and row two has d and f. I then want a table of all these combinations of variables; for example, we have 1 of the combination d,f, and so on.

       a     b     c     d     e     m     f     g     h     i
   
 1     0     0     0     0     0     0     0     1     0     0
 2     0     0     0     1     0     0     1     0     0     0
 3     0     0     0     1     0     0     1     0     0     0
 4     0     1     0     1     1     1     1     1     0     0
 5     0     1     0     1     1     1     1     1     0     0
 6     0     1     0     1     1     1     1     1     0     0
 7     0     1     0     1     1     1     1     1     0     0
 8     0     1     0     1     1     1     1     1     0     0
 9     0     1     0     1     1     1     1     1     0     0
10     1     0     0     0     0     1     0     1     0     0

I'm afraid your requirements haven't been made clear.
The simplest interpretation is that you want to remove duplicate records, so that what remains are the unique combinations that have been observed. That can be done with dplyr::distinct(). If you want a count of how many rows each combination represents, use count() with the grouping options.
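
To make the two options concrete, here is a minimal sketch on a small made-up table. The data frame `toy` and the result names `unique_combos` and `combo_counts` are hypothetical and only for illustration.

```r
library(dplyr)

# Hypothetical toy data: three 0/1 indicator columns
toy <- data.frame(
  a = c(0, 0, 1, 0),
  b = c(1, 1, 0, 1),
  c = c(0, 0, 1, 1)
)

# Unique combinations that were observed (duplicate rows removed)
unique_combos <- distinct(toy)

# Number of rows that share each combination of a, b and c
combo_counts <- count(toy, a, b, c, name = "n")
```

Here `distinct()` keeps one copy of each row pattern, while `count()` also reports how many rows carried that pattern.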

No, I only want to get the combination of columns that are 1. For example, we have 1 of the combination d,f in row 2.

Do you mean this?

DF <- data.frame(a = c(0,0,1,0), b = c(0,0,0,1), c = c(1,0,0,1), d = c(1,1, 0, 1))
DF
#>   a b c d
#> 1 0 0 1 1
#> 2 0 0 0 1
#> 3 1 0 0 0
#> 4 0 1 1 1
library(tidyr)
library(dplyr)

DFlng <- DF %>% mutate(RowNum = row_number()) %>% #add a column of row numbers
  pivot_longer(a:d ) %>% #put the table in long form
  filter(value == 1) #select rows with 1
DFlng
#> # A tibble: 7 x 3
#>   RowNum name  value
#>    <int> <chr> <dbl>
#> 1      1 c         1
#> 2      1 d         1
#> 3      2 d         1
#> 4      3 a         1
#> 5      4 b         1
#> 6      4 c         1
#> 7      4 d         1

Created on 2020-02-17 by the reprex package (v0.3.0)
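
If the goal is the frequency of each combination, the long table above could be collapsed back to one label per row and then tallied. This is only a sketch under that assumption; the names `combos`, `combination` and `frequency` are mine, not from the thread. Note that rows containing no 1 at all are dropped by the filter step.

```r
library(dplyr)
library(tidyr)

DF <- data.frame(a = c(0, 0, 1, 0), b = c(0, 0, 0, 1),
                 c = c(1, 0, 0, 1), d = c(1, 1, 0, 1))

# One label per row: the names of the columns that equal 1
combos <- DF %>%
  mutate(RowNum = row_number()) %>%
  pivot_longer(a:d) %>%
  filter(value == 1) %>%
  group_by(RowNum) %>%
  summarise(combination = paste(name, collapse = ","), .groups = "drop")

# How many rows share each label
combo_freq <- count(combos, combination, name = "frequency")
```

With this DF every row has a different set of columns equal to 1, so each combination gets a frequency of 1.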

Thanks.
In this tibble, RowNum 1 is c,d, and I want to get the frequency of this combination.

library(tidyverse)
DF <- tibble::tribble(
                              ~a, ~b, ~c, ~d, ~e,
                               0,  0,  1,  1,  1,
                               0,  0,  0,  1,  0,
                               1,  0,  0,  0,  0,
                               0,  1,  1,  1,  1,
                               1,  0,  0,  0,  0,
                               0,  0,  1,  1,  1,
                               0,  0,  1,  1,  1,
                              )

DF %>% group_by_all %>% count()
# A tibble: 4 x 6
# Groups:   a, b, c, d, e [4]
      a     b     c     d     e     n
  <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1     0     0     0     1     0     1
2     0     0     1     1     1     3
3     0     1     1     1     1     1
4     1     0     0     0     0     2

?

No. For example, in your tibble, row 2 should be c,d,e with a frequency of one, and row 3 should be b,c,d,e with a frequency of one. We want the frequency of every combination of letters.

The first entry in the initial tibble is 00111, i.e. cde. Its frequency is not 1, it is 3, and that can be seen in my result table in the 2nd row, where the pattern is 00111 and n = 3.

No, what I have in mind is something like this tibble:

      a     b     c     d     e     n    letter
  <dbl> <dbl> <dbl> <dbl> <dbl> <int>   <chr>
1     0     0     0     1     0     1    d
2     0     0     1     1     1     3    c,d,e
3     0     1     1     1     1     1    b,c,d,e
4     1     0     0     0     0     2    a

and each of these combinations of letters occurs once in this tibble.

library(tidyverse)

DF <- tibble::tribble(
  ~a, ~b, ~c, ~d, ~e,
  0, 0, 1, 1, 1,
  0, 0, 0, 1, 0,
  1, 0, 0, 0, 0,
  0, 1, 1, 1, 1,
  1, 0, 0, 0, 0,
  0, 0, 1, 1, 1,
  0, 0, 1, 1, 1,
)

h <- names(DF)

# stringrep1: proof of concept, specifying a, b, c etc. manually
conv <- function(x) {
  ifelse(DF[[x]], x, "")
}
DF$stringrep1 <- paste0(
  conv("a"),
  conv("b"),
  conv("c"),
  conv("d"),
  conv("e")
)


convx <- function(x) {
  res <- list()
  for (y in x)
  {
    res[[y]] <- paste0("conv('", y, "')")
  }
  res
}

##stringrep2 passing header names only 
DF$stringrep2 <- eval(parse(text = paste0(
  "paste0(",
  paste0(
    convx(h),
    collapse = ","
  ),
  ")"
)))

DF2 <- DF %>%
  group_by_all() %>%
  count(name = "frequency_of_combination") 

> DF2
# A tibble: 4 x 8
# Groups:   a, b, c, d, e, stringrep1, stringrep2 [4]
a     b     c     d     e stringrep1 stringrep2 frequency_of_combination
<dbl> <dbl> <dbl> <dbl> <dbl> <chr>      <chr>                         <int>
1     0     0     0     1     0 d          d                                 1
2     0     0     1     1     1 cde        cde                               3
3     0     1     1     1     1 bcde       bcde                              1
4     1     0     0     0     0 a          a                                 2
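
The same labels can also be built without eval(parse()). The following is a sketch of an alternative, not the method used above: it turns the table into a logical matrix and pastes the matching column names row by row. The names `cols`, `is_one`, `combination` and `combo_freq` are hypothetical.

```r
library(dplyr)

DF <- tibble::tribble(
  ~a, ~b, ~c, ~d, ~e,
  0, 0, 1, 1, 1,
  0, 0, 0, 1, 0,
  1, 0, 0, 0, 0,
  0, 1, 1, 1, 1,
  1, 0, 0, 0, 0,
  0, 0, 1, 1, 1,
  0, 0, 1, 1, 1
)

cols <- names(DF)

# TRUE wherever a cell equals 1
is_one <- as.matrix(DF[cols]) == 1

# For each row, paste the names of the columns that are TRUE
DF$combination <- apply(is_one, 1, function(r) paste(cols[r], collapse = ","))

# Frequency of each label
combo_freq <- count(DF, combination, name = "frequency_of_combination")
```

A row with no 1 at all would get an empty string as its label here.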

That is correct for the combinations of letters, but the frequency is wrong. In the 4 rows we have 4 combinations of letters: d in the 1st row, cde in the 2nd, bcde in the 3rd, and a in the 4th. Each of these combinations has a frequency of 1 across the rows.

So, are you asking for the string length? That is, 'cde' gives 3?

Hi @saso_008. You can do it like this.

library(tidyverse)

df <- sample(0:3, 100, replace = TRUE) %>%
  matrix(ncol = 10) %>%
  as.data.frame() %>%
  `colnames<-`(letters[1:10])

df
#>    a b c d e f g h i j
#> 1  3 1 2 1 3 3 0 2 0 2
#> 2  3 1 0 3 0 0 1 3 1 3
#> 3  0 1 3 3 2 0 1 1 0 1
#> 4  0 1 0 1 2 2 0 1 2 2
#> 5  1 1 3 2 0 1 2 3 1 2
#> 6  2 0 1 2 3 3 2 1 0 1
#> 7  0 1 2 0 0 2 3 1 3 3
#> 8  2 0 2 3 1 1 0 0 0 2
#> 9  2 3 3 3 1 3 1 3 0 0
#> 10 1 1 1 2 0 0 1 0 0 1

df %>%
  rownames_to_column() %>%
  gather(col, value, -rowname) %>%
  filter(value == 1) %>%
  group_by(rowname) %>%
  summarise(letter = paste(col, collapse = ", ")) %>%
  arrange(as.numeric(rowname))
#> # A tibble: 10 x 2
#>    rowname letter       
#>    <chr>   <chr>        
#>  1 1       b, d         
#>  2 2       b, g, i      
#>  3 3       b, g, h, j   
#>  4 4       b, d, h      
#>  5 5       a, b, f, i   
#>  6 6       c, h, j      
#>  7 7       b, h         
#>  8 8       e, f         
#>  9 9       e, g         
#> 10 10      a, b, c, g, j

Created on 2020-02-18 by the reprex package (v0.3.0)
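
If the aim is to see which combinations repeat, one possible extension is to tally the labels with count(). The sketch below applies this to the 0/1 table from the original question, typed in by hand; whether this is the result that was wanted is an assumption, and the name `combo_freq` is hypothetical.

```r
library(dplyr)
library(tidyr)
library(tibble)

# The table from the original question, entered row by row
df <- matrix(
  c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
    0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
    0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
    rep(c(0, 1, 0, 1, 1, 1, 1, 1, 0, 0), 6),  # rows 4 to 9 are identical
    1, 0, 0, 0, 0, 1, 0, 1, 0, 0),
  ncol = 10, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c", "d", "e", "m", "f", "g", "h", "i"))
) %>%
  as.data.frame()

combo_freq <- df %>%
  rownames_to_column() %>%
  gather(col, value, -rowname) %>%
  filter(value == 1) %>%
  group_by(rowname) %>%
  summarise(letter = paste(col, collapse = ", ")) %>%
  count(letter, name = "frequency", sort = TRUE)  # rows per combination
```

On this data the combination b, d, e, m, f, g covers six rows, d, f covers two, and g and a, m, g cover one row each.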

Elegant solution, thanks!

Thanks @rsytong, but I still can't get the combinations of letters that repeat.

@saso_008. I don't understand what you mean. Can you give some sample data and the expected result?
