Hi everyone,
I'm a master's student in biochemistry and currently struggling with an R exercise.
I'm a complete beginner in R and just need an experienced eye to tell me what I'm doing wrong (my teacher won't give any feedback).
Here’s the question:
Load the gene expression dataset "dataset155.Rdata" and answer the following.
- A gene is considered "active" if its expression level is strictly above its theoretical upper tercile.
- A gene is considered "inactive" if its expression level is strictly below its theoretical lower tercile.
- Determine how many genes have a standard deviation ≤ 0.25.
- Among those genes, determine how many are active in at least 58 experiments.
- Among the genes identified in question 2, determine how many are inactive in fewer than 44 experiments.
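To make the definitions concrete, this is how I read "theoretical tercile". It is my assumption, not something the exercise states: I take it to be the 1/3 or 2/3 quantile of a normal distribution with the gene's own mean and standard deviation. A minimal sketch on one made-up gene (the vector `expr` and its parameters are hypothetical, not taken from the dataset):

```r
# Assumption: "theoretical tercile" = quantile of a normal distribution
# fitted with the gene's own empirical mean and standard deviation.
set.seed(1)                              # toy data, not the real dataset
expr <- rnorm(100, mean = 5, sd = 0.2)   # one hypothetical gene, 100 experiments

upper <- qnorm(2/3, mean = mean(expr), sd = sd(expr))  # theoretical upper tercile
lower <- qnorm(1/3, mean = mean(expr), sd = sd(expr))  # theoretical lower tercile

n_active   <- sum(expr > upper)   # experiments strictly above the upper tercile
n_inactive <- sum(expr < lower)   # experiments strictly below the lower tercile
```

If the exercise meant the empirical terciles instead, `quantile(expr, c(1/3, 2/3))` would be the call to use, and the counts could differ.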
And here’s my code:
Q1
sum(apply(dataset, 2, sd) <= 0.25)
Q2
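# Keep only the genes (columns) whose standard deviation is <= 0.25
donnéesok <- which(apply(dataset, 2, sd) <= 0.25)
donnéesok <- dataset[, donnéesok]
# Empirical mean and standard deviation of each remaining gene
moyemp <- apply(donnéesok, 2, mean)
sdemp <- apply(donnéesok, 2, sd)
# Theoretical upper tercile of each gene, assuming a normal distribution
terciletheosup <- qnorm(2/3, mean = moyemp, sd = sdemp)
# TRUE wherever an expression value is strictly above its gene's upper tercile
comparaison <- sweep(donnéesok, 2, terciletheosup, ">")
sum(apply(comparaison, 2, sum) >= 58)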
Q3
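# Keep only the genes that are active in at least 58 experiments
indice <- apply(comparaison, 2, sum) >= 58
donnéesok <- donnéesok[, indice]
moyemp <- apply(donnéesok, 2, mean)
sdemp <- apply(donnéesok, 2, sd)
# Theoretical lower tercile of each gene, assuming a normal distribution
terciletheoinf <- qnorm(1/3, mean = moyemp, sd = sdemp)
# TRUE wherever an expression value is strictly below its gene's lower tercile
comparaison <- sweep(donnéesok, 2, terciletheoinf, "<")
sum(apply(comparaison, 2, sum) < 44)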
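In case it helps to reproduce the logic without my file, here is a sketch of the same pipeline on simulated data. Everything about the toy matrix is made up (its size, its values, and the thresholds 30 and 40, which I scaled down arbitrarily). I'm also assuming that genes are in columns and experiments in rows, and that "question 2" refers to the "active in at least 58 experiments" subset. I'm not sure either assumption holds for the real dataset.

```r
# Toy stand-in for the real data: 100 experiments (rows) x 6 genes (columns).
# The orientation is an assumption; with genes in rows, MARGIN would be 1.
set.seed(42)
toy <- matrix(rnorm(600, mean = 5, sd = rep(c(0.1, 0.2, 0.5), each = 200)),
              nrow = 100, ncol = 6)

# Step 1: genes with standard deviation <= 0.25
keep   <- apply(toy, 2, sd) <= 0.25
low_sd <- toy[, keep, drop = FALSE]   # drop = FALSE keeps a matrix even if 1 gene is left

# Theoretical terciles per gene, assuming a normal distribution
gene_mean <- colMeans(low_sd)
gene_sd   <- apply(low_sd, 2, sd)
upper <- qnorm(2/3, mean = gene_mean, sd = gene_sd)
lower <- qnorm(1/3, mean = gene_mean, sd = gene_sd)

# Number of experiments in which each gene is active / inactive
n_active   <- colSums(sweep(low_sd, 2, upper, ">"))
n_inactive <- colSums(sweep(low_sd, 2, lower, "<"))

# Step 2: genes active in at least 30 experiments (hypothetical threshold)
often_active <- n_active >= 30
sum(often_active)

# Step 3: among those, genes inactive in fewer than 40 experiments (hypothetical threshold)
sum(often_active & n_inactive < 40)
```

Checking `dim(dataset)` against the expected number of genes and experiments should show whether `MARGIN = 2` is the right choice for the real data.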
At least one of my answers is wrong, but I don't know which step is responsible.
Thanks for any help!