Hello, I'm quite new to analysing data with R and would be thankful for any advice.
I am currently trying to remove outliers from my already normalized dataset.
The dataset contains 17 genes in nine samples, with treatment and without (= control), and at least four measurements per combination, so I want to check for outliers using Q1, Q3 and the IQR.
A value should be treated as an outlier if it is < Q1 - (1.5*IQR) or > Q3 + (1.5*IQR), with the IQR (interquartile range) calculated in R for each measurement group (each having 4 measurements).
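To make the rule concrete, here is a minimal sketch in base R. The helper name `is_outlier`, the parameter `k` and the example values are made up for illustration and are not part of my data; `quantile()` is used with its default method.

```r
# Hypothetical helper (illustration only): TRUE where a value lies outside
# the k * IQR fences computed from the vector itself.
is_outlier <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - k * iqr | x > q[[2]] + k * iqr
}

# Made-up measurement group of four values; only the last one is far away.
flags <- is_outlier(c(1.0, 1.1, 1.2, 9.0))
print(flags)
```

With only four values per group the quartiles are interpolated, so the fences depend on the quantile method R uses.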
As output I would like the original dataframe without the outliers (ideally shown as empty cells or as NA).
My main problem is how to do this automatically for all sample-gene-treatment combinations (otherwise there would be 306 calculation steps: 17 genes * 2 treatments * 9 samples).
To simplify, I reduced my dataframe to only two genes, with and without treatment, for all samples.
At the moment, the filtered dataframe loses rows not only for the outliers identified in one sample and gene, but in every gene and treatment group (please have a look at the code). Unfortunately, I'm not able to find the mistake.
Thanks a lot!
CODE:
#ddCT_Gen = reduced dataframe including only 2 genes (Gen1 and Gen2)
#Treatment = without or with treatment
#Ko = Control sample
#M1-M8 = sample measurements
#so far n = 4
ddCT_Gen
Gen Treatment Ko M1 M2 M3 M4 M5 M6 M7 M8
1 Gen1 Gen1_without 1 1.8 0.8 1.1 0.9 0.9 0.9 0.8 1.1
2 Gen1 Gen1_without 1 1.4 1.0 2.2 1.2 1.5 1.5 1.3 1.5
3 Gen1 Gen1_without 1 1.3 0.8 0.8 0.8 1.0 0.8 0.8 1.1
4 Gen1 Gen1_without 1 1.3 2.0 1.1 1.4 1.2 1.3 1.2 0.5
5 Gen1 Gen1_with 1 1.0 0.8 1.0 0.7 0.4 0.8 0.9 0.9
6 Gen1 Gen1_with 1 1.1 0.8 1.0 0.8 0.8 0.8 0.6 0.6
7 Gen1 Gen1_with 1 1.3 1.8 0.9 1.5 1.5 0.6 0.8 0.7
8 Gen1 Gen1_with 1 1.0 1.2 1.0 0.8 0.9 0.5 1.2 0.6
9 Gen2 Gen2_without 1 1.5 0.6 1.1 0.6 0.7 0.7 0.7 0.9
10 Gen2 Gen2_without 1 2.0 2.6 2.2 2.3 2.9 1.8 2.8 2.3
11 Gen2 Gen2_without 1 1.9 1.4 1.2 1.0 1.0 0.9 1.3 1.3
12 Gen2 Gen2_without 1 1.5 2.3 1.4 2.4 2.3 1.7 1.6 1.9
13 Gen2 Gen2_with 1 1.0 1.0 0.7 0.7 0.4 0.8 1.0 1.1
14 Gen2 Gen2_with 1 0.8 0.7 0.8 0.7 0.7 0.6 0.6 0.6
15 Gen2 Gen2_with 1 1.0 0.9 0.9 1.0 0.9 0.8 0.9 0.6
16 Gen2 Gen2_with 1 1.1 0.9 1.0 1.0 1.0 0.8 0.8 0.8
#calculate Q1, Q3, IQR to identify outliers in M1 for Gen1 without treatment
library(dplyr) # provides filter() and select()
Gen1_without <- select(filter(ddCT_Gen, Treatment == "Gen1_without"), c("M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"))
Q1_Gen1_without_M1 <- quantile(Gen1_without$M1, 0.25)
Q3_Gen1_without_M1 <- quantile(Gen1_without$M1, 0.75)
IQR_Gen1_without_M1 <- IQR(Gen1_without$M1)
#identify outliers in M1 for Gen1 without treatment
no_outliers_Gen1_without_M1 <- subset(ddCT_Gen, Gen1_without$M1 > (Q1_Gen1_without_M1 - 1.5 * (IQR_Gen1_without_M1)) & Gen1_without$M1 < (Q3_Gen1_without_M1 + 1.5 * (IQR_Gen1_without_M1)))
no_outliers_Gen1_without_M1
Gen Treatment Ko M1 M2 M3 M4 M5 M6 M7 M8
2 Gen1 Gen1_without 1 1.4 1.0 2.2 1.2 1.5 1.5 1.3 1.5
3 Gen1 Gen1_without 1 1.3 0.8 0.8 0.8 1.0 0.8 0.8 1.1
4 Gen1 Gen1_without 1 1.3 2.0 1.1 1.4 1.2 1.3 1.2 0.5
6 Gen1 Gen1_with 1 1.1 0.8 1.0 0.8 0.8 0.8 0.6 0.6
7 Gen1 Gen1_with 1 1.3 1.8 0.9 1.5 1.5 0.6 0.8 0.7
8 Gen1 Gen1_with 1 1.0 1.2 1.0 0.8 0.9 0.5 1.2 0.6
10 Gen2 Gen2_without 1 2.0 2.6 2.2 2.3 2.9 1.8 2.8 2.3
11 Gen2 Gen2_without 1 1.9 1.4 1.2 1.0 1.0 0.9 1.3 1.3
12 Gen2 Gen2_without 1 1.5 2.3 1.4 2.4 2.3 1.7 1.6 1.9
14 Gen2 Gen2_with 1 0.8 0.7 0.8 0.7 0.7 0.6 0.6 0.6
15 Gen2 Gen2_with 1 1.0 0.9 0.9 1.0 0.9 0.8 0.9 0.6
16 Gen2 Gen2_with 1 1.1 0.9 1.0 1.0 1.0 0.8 0.8 0.8
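
One thing that may explain the output above: the condition is built from the 4-row `Gen1_without`, but it is passed to `subset()` on the 16-row `ddCT_Gen`, so R recycles the four logical values across all rows and drops every fourth row. A minimal sketch of one possible way to mask outliers per group instead is shown here. It uses base R only; the helper `mask_outliers` and the small `demo` data frame are made up for illustration, and grouping by `Treatment` alone is an assumption based on the reduced dataframe.

```r
# Hypothetical helper (illustration only): replace values outside the
# k * IQR fences with NA and return a vector of the same length.
mask_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x[x < q[[1]] - k * iqr | x > q[[2]] + k * iqr] <- NA
  x
}

# Made-up stand-in for ddCT_Gen: two groups, two measurement columns.
demo <- data.frame(
  Treatment = rep(c("Gen1_without", "Gen1_with"), each = 4),
  M1 = c(1.0, 1.1, 1.2, 9.0, 1.0, 1.1, 1.3, 1.0),
  M2 = c(0.8, 1.0, 0.8, 2.0, 0.8, 0.8, 1.8, 1.2)
)

m_cols <- c("M1", "M2")
cleaned <- demo
# ave() applies the function within each Treatment group and returns the
# values in the original row order, so no rows are lost.
cleaned[m_cols] <- lapply(demo[m_cols], function(col) {
  ave(col, demo$Treatment, FUN = mask_outliers)
})
print(cleaned)
```

For the full dataframe, `m_cols` would presumably become `paste0("M", 1:8)` and `demo` would be replaced by `ddCT_Gen`. Because each column is processed within each group, all 17 * 2 * 9 combinations are handled in one call and the result keeps the original shape, with NA where a value was flagged.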