How to label metadata of data frame based on given condition

I'm working with scRNA-seq data. I have a data matrix where the rows are gene names and the columns are cells; each entry is the read count for a given gene in a given cell.

I have a bunch of gene names that begin with 392XXXX and I want to label all the cells that have at least 1 count in at least one of the 392XXXX genes.

Let's say the data frame is called data.frame. I think it should be something like this:

if data.frame["392*", ] > 0
then data.frame$Label = True

Something like this, but I don't know the syntax.

Any help greatly appreciated!

Thanks

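You can do much better: R has a function called rowSums(), which computes the sum of all values in each row, so you just want the rows whose sum is > 0. Since counts are never negative, a sum above zero means there is at least one count. If your cells are in columns, as in your description, colSums() does the same thing per column.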

There is a catch: if you are working with scRNA-Seq data, your counts are likely stored in a sparse matrix (so that the zeros don't take any memory). To work with that format, you will need to load the Matrix package (with a capital M).
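To give a feel for the memory point, here is a rough sketch comparing a dense and a sparse copy of the same matrix. The 1000 x 1000 size and the number of nonzero entries are made-up values for illustration, not anything taken from real scRNA-Seq data.

```r
library(Matrix)

set.seed(1)
# Hypothetical 1000 x 1000 count matrix with at most 1000 nonzero entries
dense <- matrix(0, nrow = 1000, ncol = 1000)
dense[sample(length(dense), 1000)] <- 1

# Same values, stored without the zeros
sparse <- Matrix(dense, sparse = TRUE)

# Compare the memory footprint of the two representations
print(object.size(dense), units = "MB")
print(object.size(sparse), units = "MB")

# rowSums() gives the same totals for both once Matrix is loaded
all.equal(rowSums(dense), rowSums(sparse))
```

The sparse copy should come out far smaller, and the row sums are the same either way.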

library(Matrix)

# generate example data
set.seed(2)
counts <- Matrix(matrix(rbinom(12,5,.1),
                        nrow = 4,
                        dimnames = list(letters[1:4],LETTERS[1:3])))

counts
#> 4 x 3 sparse Matrix of class "dgCMatrix"
#>   A B C
#> a . 2 .
#> b 1 2 .
#> c . . .
#> d . 1 .

# see the sums
rowSums(counts)
#> a b c d 
#> 2 3 0 1

# subset matrix
counts[rowSums(counts) > 0, ]
#> 3 x 3 sparse Matrix of class "dgCMatrix"
#>   A B C
#> a . 2 .
#> b 1 2 .
#> d . 1 .

# list the names of the rows with a nonzero sum
# (with cells in columns, use colSums() and colnames() instead)
rownames(counts)[rowSums(counts) > 0]
#> [1] "a" "b" "d"

Created on 2022-03-19 by the reprex package (v2.0.1)
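Applied to the layout described in the question (genes in rows, cells in columns), the same idea might look like the sketch below. The gene names, cell names, and the cell_meta data frame are hypothetical, and it assumes the genes of interest are exactly those whose row name starts with "392".

```r
library(Matrix)

# Hypothetical toy data: genes in rows, cells in columns.
# Gene and cell names are made up for illustration.
counts <- Matrix(
  matrix(c(0, 1, 0, 3,
           0, 0, 0, 2,
           2, 0, 0, 0),
         nrow = 4,
         dimnames = list(c("3920001", "3920002", "ACTB", "GAPDH"),
                         c("cell1", "cell2", "cell3"))),
  sparse = TRUE
)

# Rows whose gene name starts with "392"
is_target <- grepl("^392", rownames(counts))

# Per-cell total over those genes; drop = FALSE keeps a matrix
# even when only a single gene matches
target_totals <- colSums(counts[is_target, , drop = FALSE])

# One row of metadata per cell; Label is TRUE when the cell has
# at least one count in at least one of the 392 genes
cell_meta <- data.frame(cell = colnames(counts),
                        Label = unname(target_totals > 0))
cell_meta
```

Here Label is TRUE for cell1 and cell3 and FALSE for cell2. Because counts cannot be negative, checking that the column sum is above zero is equivalent to checking for at least one count in any of the selected genes.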

Note that if you're using {Seurat}, you may prefer to use the subset() function.
