How to count and print detected features using an R package?

Hi,

I am working with an R data frame that has features (Gene IDs) as row names and samples as columns. I would like to count, for each column, how many values pass a cut-off. The small example below counts values >= 1 across all columns. In addition to this cut-off, I want to count for the following cut-offs:
value >= 5
value >= 10
value >= 50

Then I want to print the counts for all cut-offs in one table. This will help me plot a grouped barplot for comparison.

Is there a way to do this with a data manipulation package such as dplyr or another tidyverse package? For now, I can only think of repeating the same steps for each cut-off value one by one. Can this be handled in a few simple steps?

Thank you,

Toufiq

library(tidyverse)

## Input data
# Output of dput(Data), assigned to Data so the example runs as-is
Data <- structure(list(
  S1 = c(0L, 0L, 0L, 0L, 0L, 0L, 11L, 15L, 19L, 0L, 100L, 50L, 10L, 100L, 50L),
  S2 = c(0L, 0L, 2L, 3L, 4L, 0L, 12L, 16L, 20L, 23L, 1000L, 50L, 10L, 50L, 50L),
  S3 = c(1L, 0L, 9L, 0L, 0L, 0L, 13L, 17L, 21L, 0L, 100000L, 40L, 10L, 100000L, 50L),
  S4 = c(1L, 0L, 9L, 0L, 0L, 0L, 14L, 18L, 22L, 0L, 22L, 60L, 10L, 0L, 100000L)
), class = "data.frame", row.names = c(
  "Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5", "Gene_6", "Gene_7", "Gene_8",
  "Gene_9", "Gene_10", "Gene_11", "Gene_12", "Gene_13", "Gene_14", "Gene_15"
))
Data
#>          S1   S2     S3     S4
#> Gene_1    0    0      1      1
#> Gene_2    0    0      0      0
#> Gene_3    0    2      9      9
#> Gene_4    0    3      0      0
#> Gene_5    0    4      0      0
#> Gene_6    0    0      0      0
#> Gene_7   11   12     13     14
#> Gene_8   15   16     17     18
#> Gene_9   19   20     21     22
#> Gene_10   0   23      0      0
#> Gene_11 100 1000 100000     22
#> Gene_12  50   50     40     60
#> Gene_13  10   10     10     10
#> Gene_14 100   50 100000      0
#> Gene_15  50   50     50 100000

## Count
Data.v1 <- 
  Data %>%
  gather(x, value, S1:S4) %>%
  group_by(x) %>%
  tally(value >= 1)

dput(Data.v1)
#> structure(list(x = c("S1", "S2", "S3", "S4"), n = c(8L, 12L, 10L, 9L)),
#>   class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))

Data.v1
#> # A tibble: 4 × 2
#>   x         n
#>   <chr> <int>
#> 1 S1        8
#> 2 S2       12
#> 3 S3       10
#> 4 S4        9


Created on 2023-02-09 with reprex v2.0.2

Does this do what you want? It uses purrr. Also, gather() has been superseded by pivot_longer(), so I used that.

library(tidyverse)
greater_than_values <- c(1, 5, 10, 50)

# output in a list
map(set_names(greater_than_values),
    ~Data %>%
      pivot_longer(S1:S4, names_to = "x", values_to = "value") %>%
      group_by(x) %>%
      tally(value >= .x) # where the number gets passed in
    )

# in one big table
map_df(set_names(greater_than_values),
    ~Data %>%
      pivot_longer(S1:S4, names_to = "x", values_to = "value") %>%
      group_by(x) %>%
      tally(value >= .x), # where the number gets passed in
    .id = "greater_than"
)

First output:

# $`1`
# # A tibble: 4 x 2
#   x         n
#   <chr> <int>
# 1 S1        8
# 2 S2       12
# 3 S3       10
# 4 S4        9
#
# $`5`
# # A tibble: 4 x 2
#   x         n
#   <chr> <int>
# 1 S1        8
# 2 S2        9
# 3 S3        9
# 4 S4        8

# then the rest ......

Second output:

# # A tibble: 16 x 3
#    greater_than x         n
#    <chr>        <chr> <int>
#  1 1            S1        8
#  2 1            S2       12
#  3 1            S3       10
#  4 1            S4        9
#  5 5            S1        8
#  6 5            S2        9
#  7 5            S3        9
#  8 5            S4        8
#  9 10           S1        8
# 10 10           S2        9
# 11 10           S3        8
# 12 10           S4        7
# 13 50           S1        4
# 14 50           S2        4
# 15 50           S3        3
# 16 50           S4        2
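
As a possible alternative, the same counts can also be computed in base R without reshaping: comparing a data frame with a number gives a logical matrix, and colSums() then counts the TRUE values per sample. This is only a sketch and not the purrr approach posted above. The names `cutoffs` and `counts` are made up for illustration, and `Data` is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data from the question so the snippet is self-contained
Data <- data.frame(
  S1 = c(0, 0, 0, 0, 0, 0, 11, 15, 19, 0, 100, 50, 10, 100, 50),
  S2 = c(0, 0, 2, 3, 4, 0, 12, 16, 20, 23, 1000, 50, 10, 50, 50),
  S3 = c(1, 0, 9, 0, 0, 0, 13, 17, 21, 0, 100000, 40, 10, 100000, 50),
  S4 = c(1, 0, 9, 0, 0, 0, 14, 18, 22, 0, 22, 60, 10, 0, 100000),
  row.names = paste0("Gene_", 1:15)
)

cutoffs <- c(1, 5, 10, 50)  # hypothetical name for the cut-off vector

# Data >= k is a logical matrix; colSums() counts the TRUE values per sample.
# sapply() binds the per-cut-off results into a samples x cut-offs matrix.
counts <- sapply(cutoffs, function(k) colSums(Data >= k))
colnames(counts) <- paste0(">=", cutoffs)

counts
```

The result is a matrix with one row per sample and one column per cut-off, so it is a wide version of the combined table rather than the long one returned by map_df().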

@williaml perfect, thank you very much. The second method (map_df(), with all cut-offs in one table) works and is what I was looking for. This resolves my query.

Just wanted to point out that the code below does not work:

# output in a list
map(set_names(greater_than_values),
    ~Data %>%
      pivot_longer(S1:S4, names_to = "x", values_to = "value") %>%
      group_by(x) %>%
      tally(value >= .x) # where the number gets passed in
    )

Thanks. It works for me. What error do you get?


@williaml I relaunched RStudio and ran the same code. It works now. Not sure why it wasn't running yesterday. Thank you.
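
For the grouped barplot mentioned in the original question, something along these lines might work. It is a rough sketch using base R graphics rather than ggplot2, with the counts copied by hand from the combined table in the answer. The object names `counts` and `bar_mids` are only illustrative.

```r
# Counts copied from the combined table (rows = cut-offs, columns = samples)
counts <- rbind(
  ">=1"  = c(S1 = 8, S2 = 12, S3 = 10, S4 = 9),
  ">=5"  = c(S1 = 8, S2 = 9,  S3 = 9,  S4 = 8),
  ">=10" = c(S1 = 8, S2 = 9,  S3 = 8,  S4 = 7),
  ">=50" = c(S1 = 4, S2 = 4,  S3 = 3,  S4 = 2)
)

# beside = TRUE draws one cluster of bars per column (sample),
# with one bar per row (cut-off) inside each cluster
bar_mids <- barplot(
  counts,
  beside = TRUE,
  legend.text = rownames(counts),
  args.legend = list(title = "Cut-off", x = "topright"),
  xlab = "Sample",
  ylab = "Number of genes passing cut-off"
)
```

Keeping the cut-offs as rows in the order 1, 5, 10, 50 also keeps the bars in numeric order. If the long table is plotted with ggplot2 instead, the greater_than column is character, so it would probably need converting to a factor with that level order first.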
