Issue: I have a large summary-statistics data set of 15 million SNPs with Z-scores that was shared by a former colleague (whom I unfortunately cannot reach by email). I would like to reduce the 15 million rows to just the statistically significant SNPs so I can cross-check them against a list of two thousand SNPs of interest.
I would really appreciate any suggestions or guidance on the following:
Is there a package that can convert Z-scores to P-values?
Are there packages that would help me efficiently filter out the significant SNPs and let me compare two columns from two different data frames?
This looks like it should work, but have you run the rest of the exercise by a biostatistician? It sounds a bit strange to someone outside the bio field.
You should be able to just filter the data and then do a merge() or inner_join(), I think.
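A minimal sketch of that filter-then-join idea, using base R only. The data frame names, the column names (SNP, P), the toy values, and the 5e-8 cutoff are all assumptions for illustration; it also assumes a P column has already been derived from the Z-scores.

```r
# Hypothetical stand-in for the 15-million-row summary statistics table.
# Column names SNP and P are assumptions, not taken from the real file.
sumstats <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  P   = c(1e-9, 0.69, 4e-9, 0.23),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the list of ~2,000 SNPs of interest.
snps_of_interest <- data.frame(
  SNP = c("rs3", "rs4", "rs9"),
  stringsAsFactors = FALSE
)

# Assumed significance cutoff; use whatever threshold your analysis calls for.
alpha <- 5e-8

# Step 1: keep only the rows whose p-value is below the cutoff.
significant <- sumstats[sumstats$P < alpha, ]

# Step 2: inner join on the SNP identifier, so only SNPs present
# in both tables survive.
overlap <- merge(significant, snps_of_interest, by = "SNP")

print(overlap)
```

With dplyr loaded, dplyr::inner_join(significant, snps_of_interest, by = "SNP") should give the same rows. For a table this large, filtering before joining keeps the join small.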
Please pardon my questions if they seem silly, as I am still learning.
This is the first time I have seen the "set.seed" function! From what I understood from my web search, this is done to make sure I get the same P-values every time this code is run on my data set?
Why set n=100? Does this have something to do with the normal distribution and the 68-95-99.7 rule?
Yes, set.seed(x) ensures you get the same random numbers each time.
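A small sketch of what that means in practice. The seed value 42 and the variable names are arbitrary choices for illustration.

```r
# Same seed, same draws.
set.seed(42)
first <- rnorm(5)

set.seed(42)
second <- rnorm(5)

identical(first, second)  # TRUE: resetting the seed replays the same numbers

# Drawing again without resetting the seed continues the stream,
# so these values differ from the first five.
third <- rnorm(5)
```

The seed only matters when random numbers are generated. Converting an existing column of Z-scores to P-values involves no randomness, so it gives the same result every time without set.seed.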
I just chose n=100 to get a lot of random numbers for z. You would not use my set.seed, nor my z <- rnorm(100,0,1). Your z is from your column of z scores.
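For the conversion itself, no extra package should be needed: pnorm() from base R gives the normal tail probability. The sketch below assumes two-sided P-values are wanted and uses a short made-up vector z in place of your real column.

```r
# Hypothetical z-scores; in practice replace this with your own column,
# for example z <- your_data$Z (column name is an assumption).
z <- c(-2.5, 0, 1.96, 5.5)

# Two-sided p-value: probability of a standard normal value
# at least as far from zero as the observed z, in either direction.
p_two_sided <- 2 * pnorm(-abs(z))

# One-sided (upper tail) p-value, if that is what the analysis needs.
p_upper <- pnorm(z, lower.tail = FALSE)

print(data.frame(z = z, p_two_sided = p_two_sided, p_upper = p_upper))
```

Both lines are vectorised, so they apply to a column of 15 million values in one call.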