Hi,
I have a data frame (25000 rows × 4 columns) with gene symbols as row names and samples as columns. Roughly 75% of the row names appear to be single gene symbols (for instance, RFC2, HSPA6, PAX8). However, some rows contain multiple symbols separated by a forward slash or by dots (for instance, DDR1.....MIR4640, MIR5193.....UBA7, LINC00152.....LOC101930489).
Is it possible to count, in R, the total number of rows whose name contains multiple symbols?
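One way to approach this is a regular expression with `grepl()`, applied to the row names. The sketch below uses example names from the question and assumes that a separator is either a forward slash or a run of two or more dots. Matching any single dot would be too loose: a name such as `MAPK1.1` (which looks like a de-duplicated row name rather than two symbols) would also be counted.

```r
# Example row names taken from the question
x <- c("RFC2", "HSPA6", "PAX8", "DDR1.....MIR4640", "MIR5193.....UBA7",
       "LINC00152.....LOC101930489", "MAPK1.1")

# Naive check: any dot or slash. This also matches "MAPK1.1".
naive <- grepl("[./]", x)

# Stricter check (assumption): a separator is a slash or a run of two or
# more dots, so a single-dot suffix such as ".1" is not counted.
multi <- grepl("/|\\.{2,}", x)

sum(naive)  # 4
sum(multi)  # 3
x[multi]    # the three multi-symbol names
```

If a single dot can also act as a separator in the full data set, the pattern would need to be adjusted, for example by first checking how the single-dot names were created.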
Example of the input data:
```r
dput(data.matrix_v2)
structure(list(GSM647547 = c(0.776, 1.916, 1.004, 1.2, 1.008,
0.805, 0.851, 1.082, 2.02, 1.03, 1.024, 1.043, 0.941, 1.215,
1.109, 1.138, 1.007, 1.244, 1.254, 0.995), GSM647552 = c(1.004,
1.741, 0.968, 1.276, 1.126, 1.772, 1.318, 1.067, 0.341, 0.88,
1.288, 0.958, 1.354, 1.939, 1.65, 1.738, 1.058, 0.827, 0.925,
1.122), GSM647553 = c(0.96, 1.4, 0.437, 1.19, 1.092, 0.872, 0.821,
1.042, 0.426, 0.949, 1.08, 0.92, 1.107, 1.543, 1.18, 1.053, 0.971,
0.663, 1.091, 1.146), GSM647565 = c(1.358, 1.207, 1.254, 1.068,
1.043, 0.757, 0.999, 1.254, 1.055, 1.025, 1.036, 1.383, 1.035,
1.174, 1.271, 0.958, 1.158, 1.571, 1.509, 1.026)), class = "data.frame",
row.names = c("DDR1.....MIR4640",
"RFC2", "HSPA6", "PAX8", "GUCA1A", "MIR5193.....UBA7", "THRA",
"PTPN21", "CCL5", "CYP2E1", "EPHB3", "ESRRA", "CYP2A6", "SCARB1",
"TTLL12", "LINC00152.....LOC101930489", "WFDC2", "MAPK1", "MAPK1.1",
"ADAM32"))
```
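Applied to the example data frame, the count can be wrapped in a small helper. `count_multi_symbol_rows()` is a hypothetical name introduced here for illustration, not an existing function, and its `sep_pattern` argument defaults to an assumed separator rule: a forward slash or two or more consecutive dots.

```r
# Example data rebuilt from the dput() output in the question (values copied verbatim)
data.matrix_v2 <- structure(list(GSM647547 = c(0.776, 1.916, 1.004, 1.2, 1.008,
  0.805, 0.851, 1.082, 2.02, 1.03, 1.024, 1.043, 0.941, 1.215,
  1.109, 1.138, 1.007, 1.244, 1.254, 0.995), GSM647552 = c(1.004,
  1.741, 0.968, 1.276, 1.126, 1.772, 1.318, 1.067, 0.341, 0.88,
  1.288, 0.958, 1.354, 1.939, 1.65, 1.738, 1.058, 0.827, 0.925,
  1.122), GSM647553 = c(0.96, 1.4, 0.437, 1.19, 1.092, 0.872, 0.821,
  1.042, 0.426, 0.949, 1.08, 0.92, 1.107, 1.543, 1.18, 1.053, 0.971,
  0.663, 1.091, 1.146), GSM647565 = c(1.358, 1.207, 1.254, 1.068,
  1.043, 0.757, 0.999, 1.254, 1.055, 1.025, 1.036, 1.383, 1.035,
  1.174, 1.271, 0.958, 1.158, 1.571, 1.509, 1.026)), class = "data.frame",
  row.names = c("DDR1.....MIR4640",
  "RFC2", "HSPA6", "PAX8", "GUCA1A", "MIR5193.....UBA7", "THRA",
  "PTPN21", "CCL5", "CYP2E1", "EPHB3", "ESRRA", "CYP2A6", "SCARB1",
  "TTLL12", "LINC00152.....LOC101930489", "WFDC2", "MAPK1", "MAPK1.1",
  "ADAM32"))

# Hypothetical helper: counts rows whose name contains a separator.
# Assumption: a separator is a forward slash or a run of two or more dots,
# so a name such as "MAPK1.1" is treated as a single symbol.
count_multi_symbol_rows <- function(df, sep_pattern = "/|\\.{2,}") {
  sum(grepl(sep_pattern, rownames(df)))
}

n_multi <- count_multi_symbol_rows(data.matrix_v2)
cat("Rows with multiple gene symbols:", n_multi, "\n")
```

The same call should work unchanged on the full 25000-row data frame. To inspect which rows matched, `rownames(data.matrix_v2)[grepl("/|\\.{2,}", rownames(data.matrix_v2))]` returns their names.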
Expected output: the total number of rows whose name contains multiple gene symbols.
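If some rows can contain more than two symbols, splitting the names and counting the pieces gives a per-row symbol count as well as the total. This is a sketch using base R's `strsplit()` and `lengths()`, under the assumption that a separator is a forward slash or two or more consecutive dots.

```r
# Row names from the example data in the question
rn <- c("DDR1.....MIR4640", "RFC2", "HSPA6", "PAX8", "GUCA1A",
        "MIR5193.....UBA7", "THRA", "PTPN21", "CCL5", "CYP2E1", "EPHB3",
        "ESRRA", "CYP2A6", "SCARB1", "TTLL12", "LINC00152.....LOC101930489",
        "WFDC2", "MAPK1", "MAPK1.1", "ADAM32")

# Assumed separator rule: a slash or two or more consecutive dots.
# strsplit() returns one character vector per name; lengths() counts the pieces.
n_symbols <- lengths(strsplit(rn, "/|\\.{2,}"))

# Number of rows by symbol count, then the total with more than one symbol
table(n_symbols)
sum(n_symbols > 1)
```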
Thank you,
Toufiq