Hi all,
I have three data frames (p.D, p.G and test) with 900K rows.
I want to fill columns C1 to C5 in test according to the conditions below. This is the code I used:
a = Sys.time()
for( i in c(1:nrow(test))){ # For each Combination in test
  mm = c()
  for( k in c(2:ncol(p.G))){ # For each marker in p.G
    mm[k-1] = ifelse( is.na(p.G[ which( p.G$GEN == test$P2[i]), k]) &
        is.na(p.G[ which( p.G$GEN == test$P1[i]), k]) , 3, # Both missing
      ifelse( is.na(p.G[ which( p.G$GEN == test$P1[i]), k]) &
        (p.G[ which( p.G$GEN == test$P2[i]), k] == 0 |
        p.G[ which( p.G$GEN == test$P2[i]), k] == 2) , 3.5, # One is missing one is Hom
      ifelse( is.na(p.G[ which( p.G$GEN == test$P1[i]), k]) &
        p.G[ which( p.G$GEN == test$P2[i]), k] == 1 , 1.5, # One is missing one is Het
      ifelse( is.na(p.G[ which( p.G$GEN == test$P2[i]), k]) &
        (p.G[ which( p.G$GEN == test$P1[i]), k] == 0 |
        p.G[ which( p.G$GEN == test$P1[i]), k] == 2) , 3.5, # One is missing one is Hom
      ifelse( is.na(p.G[ which( p.G$GEN == test$P2[i]), k]) &
        p.G[ which( p.G$GEN == test$P1[i]), k] == 1 , 1.5, # One is missing one is Het
      ifelse(p.G[ which( p.G$GEN == test$P1[i]), k] == 0 &
        p.G[ which( p.G$GEN == test$P2[i]), k] == 2, 0, # AA + BB
      ifelse(p.G[ which( p.G$GEN == test$P1[i]), k] == 2 &
        p.G[ which( p.G$GEN == test$P2[i]), k] == 0, 0, # BB + AA
      ifelse(p.G[ which( p.G$GEN == test$P1[i]), k] == 0 &
        p.G[ which( p.G$GEN == test$P2[i]), k] == 0, 10, # BB + BB
      ifelse(p.G[ which( p.G$GEN == test$P1[i]), k] == 2 &
        p.G[ which( p.G$GEN == test$P2[i]), k] == 2, 10, # AA + AA
      ifelse(p.G[ which( p.G$GEN == test$P1[i]), k] == 1 &
        p.G[ which( p.G$GEN == test$P2[i]), k] == 2, 1, # Het + Hom A
      ifelse(p.G[ which( p.G$GEN == test$P1[i]), k] == 1 &
        p.G[ which( p.G$GEN == test$P2[i]), k] == 0, 1, # Het + Hom B
      ifelse(p.G[ which( p.G$GEN == test$P1[i]), k] == 0 &
        p.G[ which( p.G$GEN == test$P2[i]), k] == 1, 1, # Het + Hom B
      ifelse(p.G[ which( p.G$GEN == test$P1[i]), k] == 2 &
        p.G[ which( p.G$GEN == test$P2[i]), k] == 1, 1, # Het + Hom A
      ifelse(p.G[ which( p.G$GEN == test$P1[i]), k] == 1 &
        p.G[ which( p.G$GEN == test$P2[i]), k] == 1, 2, # Het + Het
      NA))))))))))))))
  }
  test$C1[i] = mean(mm)                                # mean score over all markers
  test$C2[i] = length(which(mm == 10))                 # same homozygote in both parents
  test$C3[i] = length(which(mm == 0))                  # opposite homozygotes
  test$C4[i] = length(which(mm == 3 | mm == 3.5 | mm == 1.5)) # at least one parent missing
  test$C5[i] = p.D[test$row[i], test$col[i]]
  if(i %% 1000 == 0){
    # difftime() with explicit units, so the " mins" label is always correct
    print(paste0(round((i / nrow(test)) * 100, 3),
      "% (i = ", i,") completed in ",
      round(difftime(Sys.time(), a, units = "mins"), 2), " mins"))
  }
}
return(test)
}
Here the numbers 0, 1 and 2 mean:
0 = missing
1 = het (the two letters differ, e.g. A/T, unlike A/A)
2 = homo (same letters, e.g. A/A or T/T)
My only issue is speed; the code itself does what I need. I have run it for more than 12 hours and it has still not finished. Is there any way to speed it up? Any help would be highly appreciated.
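One direction that might help, offered as a minimal sketch and not a tested replacement: the nested ifelse chain depends only on the two genotype codes, so it could be written as a 4 x 4 lookup table and applied to all pairs and markers at once with matrix indexing. The function name score_pairs and the table score are hypothetical. The sketch assumes that GEN is the first column of p.G, that GEN values are unique, that every P1 and P2 occurs in GEN, that the marker columns hold only 0, 1, 2 or NA, and that p.D is entirely numeric.

```r
# Hypothetical lookup table: rows = P1 code, columns = P2 code,
# in the order 0, 1, 2, NA (NA is mapped to index 4 below).
score <- matrix(c(
  10,   1,   0,   3.5,  # P1 = 0
  1,    2,   1,   1.5,  # P1 = 1
  0,    1,   10,  3.5,  # P1 = 2
  3.5,  1.5, 3.5, 3     # P1 = NA
), nrow = 4, byrow = TRUE)

# Hypothetical helper: fills C1..C5 without the two nested loops.
score_pairs <- function(p.G, test, p.D) {
  # Marker columns only (assumes column 1 is GEN)
  G <- as.matrix(p.G[, -1, drop = FALSE])

  # Turn genotype codes 0/1/2/NA into table indices 1/2/3/4
  idx <- G + 1
  idx[is.na(idx)] <- 4

  # Look up each parent's row once, instead of once per marker
  r1 <- match(test$P1, p.G$GEN)
  r2 <- match(test$P2, p.G$GEN)
  i1 <- idx[r1, , drop = FALSE]
  i2 <- idx[r2, , drop = FALSE]

  # One score per (pair, marker); rows = pairs, columns = markers
  mm <- matrix(score[cbind(as.vector(i1), as.vector(i2))],
               nrow = nrow(test))

  test$C1 <- rowMeans(mm)
  test$C2 <- rowSums(mm == 10)
  test$C3 <- rowSums(mm == 0)
  test$C4 <- rowSums(mm == 3 | mm == 3.5 | mm == 1.5)
  # Assumes p.D is numeric and test$row / test$col index into it
  test$C5 <- as.matrix(p.D)[cbind(test$row, test$col)]
  test
}
```

It would be called as test <- score_pairs(p.G, test, p.D). The intermediate matrix has nrow(test) rows and one column per marker, so with 900K rows it may need to be processed in chunks of rows to fit in memory.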
Thanks in advance