I'm working with scRNA-seq data. I have a data matrix where the rows are genes and the columns are cells; each entry is the read count for a given gene in a given cell.
Several gene names begin with 392 (392XXXX), and I want to label every cell that has at least 1 count in at least one of these 392XXXX genes.
Let's say the data frame is called data.frame. I know it should be something like this:
if data.frame["392*", ] > 0
then data.frame$Label = True
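You can do much better than an if statement: R has the vectorized functions rowSums() and colSums(), which compute the sum of every row or every column in one call. For example, the rows that have any counts are the rows whose sum is > 0. Since your cells are columns, you would restrict the matrix to the 392 genes and keep the cells whose colSums() is > 0.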
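A minimal base-R sketch of that idea on an ordinary (dense) data frame, assuming the layout described in the question (gene names as row names, cells as columns). The gene and cell names are made up for illustration. Because each label belongs to a cell rather than a gene, it ends up in a separate named vector instead of a column of the counts table.

```r
# Hypothetical toy data: 4 genes (rows) x 4 cells (columns)
df <- data.frame(
  cell1 = c(0, 0, 5, 0),
  cell2 = c(1, 0, 2, 0),
  cell3 = c(0, 3, 0, 1),
  cell4 = c(0, 0, 0, 4),
  row.names = c("3920001", "3920002", "Actb", "Cd3e")
)

# TRUE for the genes whose name starts with "392"
is_392 <- grepl("^392", rownames(df))

# Sum the 392 genes within each cell; drop = FALSE guards against the
# subset collapsing to a vector
has_392 <- colSums(df[is_392, , drop = FALSE]) > 0
has_392
#> cell1 cell2 cell3 cell4
#> FALSE  TRUE  TRUE FALSE
```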
There is a catch: if you are working with scRNA-seq data, your counts are likely stored in a sparse matrix, which stores only the nonzero entries so that the zeros don't take any memory. To work with that format, you will need to load the Matrix package (with a capital M), which provides rowSums() and colSums() methods for sparse matrices.
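If your counts are currently an ordinary data frame (like data.frame in the question), one way to get a sparse matrix is as.matrix() followed by Matrix(..., sparse = TRUE). This is a sketch with made-up values; how much memory it saves on real data depends on how many zeros the table contains.

```r
library(Matrix)

# Hypothetical dense counts table: genes in rows, cells in columns
df <- data.frame(
  cell1 = c(0, 0, 5, 0),
  cell2 = c(1, 0, 2, 0),
  cell3 = c(0, 3, 0, 1),
  row.names = c("3920001", "3920002", "Actb", "Cd3e")
)

# as.matrix() keeps the row and column names; sparse = TRUE forces a
# sparse representation that stores only the nonzero entries
sparse_counts <- Matrix(as.matrix(df), sparse = TRUE)

# number of stored (nonzero) entries
nnzero(sparse_counts)
#> [1] 5
```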
library(Matrix)
# generate example data
set.seed(2)
counts <- Matrix(matrix(rbinom(12,5,.1),
nrow = 4,
dimnames = list(letters[1:4],LETTERS[1:3])))
counts
#> 4 x 3 sparse Matrix of class "dgCMatrix"
#> A B C
#> a . 2 .
#> b 1 2 .
#> c . . .
#> d . 1 .
# see the sums
rowSums(counts)
#> a b c d
#> 2 3 0 1
# subset matrix
counts[rowSums(counts) > 0 ,]
#> 3 x 3 sparse Matrix of class "dgCMatrix"
#> A B C
#> a . 2 .
#> b 1 2 .
#> d . 1 .
# list the names of the rows with a nonzero sum
rownames(counts)[rowSums(counts) > 0]
#> [1] "a" "b" "d"
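Applied to the question itself (cells in columns), the sums have to run down the columns, over the 392 genes only. A minimal sketch with made-up gene and cell names, assuming the counts are a sparse Matrix with gene names as row names. The labels go into a separate per-cell data frame, since a gene-by-cell matrix has no per-cell column to hold a Label.

```r
library(Matrix)

# Hypothetical sparse counts: 4 genes (rows) x 4 cells (columns)
counts <- Matrix(
  matrix(c(0, 0, 5, 0,   # cell1
           1, 0, 2, 0,   # cell2
           0, 3, 0, 1,   # cell3
           0, 0, 0, 4),  # cell4
         nrow = 4,
         dimnames = list(c("3920001", "3920002", "Actb", "Cd3e"),
                         paste0("cell", 1:4))),
  sparse = TRUE
)

# Rows (genes) whose name starts with "392"
is_392 <- grepl("^392", rownames(counts))

# Per-cell total over the 392 genes; > 0 means at least 1 count in at
# least one 392 gene. drop = FALSE keeps a matrix if only one gene matches.
has_392 <- colSums(counts[is_392, , drop = FALSE]) > 0

# One label per cell, stored outside the counts matrix
cell_labels <- data.frame(cell = colnames(counts),
                          Label = has_392,
                          row.names = NULL)
cell_labels
#>    cell Label
#> 1 cell1 FALSE
#> 2 cell2  TRUE
#> 3 cell3  TRUE
#> 4 cell4 FALSE
```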