From Z-scores to P-value

Tut009 · August 10, 2020, 5:08pm

Hello,

Issue: I have a large summary-statistics data set of 15 million SNPs with Z-scores that was shared by a former colleague (whom I unfortunately cannot reach via email). I would like to distill the 15 million rows of SNPs down to just the statistically significant ones so I can cross-check them against a set of two thousand of interest.

I would really appreciate any suggestions or guidance on the following:

Is there a package that can convert Z-scores to P-values?
Are there packages that would help me efficiently filter out the significant SNPs and let me compare two columns from two different data.frame files?

jrkrideau · August 10, 2020, 6:16pm

This looks like it should work, but have you run the rest of the exercise by a biostatistician? It sounds a bit strange to someone outside the bio field.

You should be able to just filter the data and then do a merge() or dplyr::inner_join(), I think.
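
A minimal sketch of that filter-then-join idea in base R, on made-up data. The data frame names `gwas` and `interest`, the columns `snp` and `z`, and the 0.05 cutoff are all assumptions for illustration, not the layout of the real files. merge() matches rows on the key column, so the two tables do not need to be in the same order.

```r
# Hypothetical summary statistics: one row per SNP with its z-score
gwas <- data.frame(
  snp = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  z   = c(0.3, -5.9, 2.1, 6.2, -1.0)
)

# Hypothetical list of SNPs of interest, deliberately in a different order
interest <- data.frame(snp = c("rs4", "rs9", "rs2", "rs3"))

# Two-sided p-value for each z-score
gwas$p <- 2 * pnorm(-abs(gwas$z))

# Filter: keep rows below the chosen cutoff (0.05 is only an example)
sig <- subset(gwas, p < 0.05)

# Inner join on the shared key: only SNPs present in both tables survive
hits <- merge(sig, interest, by = "snp")
hits
```

With dplyr loaded, `dplyr::inner_join(sig, interest, by = "snp")` should give the same rows.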

fcas80 · August 10, 2020, 6:59pm

How about this, building on jrkrideau:

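set.seed(1)                          # make the simulated z-scores reproducible
z <- rnorm(100, 0, 1)                # stand-in for a real column of z-scores
p <- pnorm(z, lower.tail = FALSE)    # right-tailed test
df <- data.frame(z, p)               # one row per z-score with its p-value
df2 <- subset(df, p < .05)           # keep only the significant rows
df2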
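
One thing worth raising with a biostatistician is whether a one-sided or a two-sided test fits this data. A small sketch comparing the two on made-up z-scores (the values are illustrative only): a right-tailed p-value never flags a large negative z, while the two-sided version treats both tails the same.

```r
# Made-up z-scores covering both tails
z <- c(-3.2, -1.0, 0.0, 1.7, 2.5)

p_right <- pnorm(z, lower.tail = FALSE)  # P(Z >= z), right-tailed
p_two   <- 2 * pnorm(-abs(z))            # P(|Z| >= |z|), two-sided

# Side-by-side view; note the first row, where only p_two is small
data.frame(z, p_right, p_two)
```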

jrkrideau · August 10, 2020, 8:20pm

Tidier than mine but can we assume the two files are in the same order?

Tut009 · August 11, 2020, 5:48pm

Thanks for your input and the link; will be trying out the calculation today.

Yes, I need to follow up with a biostatistician, since the person who produced this data is no longer reachable!

Tut009 · August 11, 2020, 6:20pm

Please pardon my questions if they seem silly, as I am in the process of learning.

This is the first time I have seen the "set.seed" function! From what I understood from my web search, it is used to make sure I get the same P-values every time this code is run on my data set?

Why set n=100? Does this have something to do with the normal distribution (the 68-95-99.7 rule)?

Thank you so much for your time!

fcas80 · August 11, 2020, 6:33pm

Yes, set.seed(x) ensures you get the same random numbers each time.
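
A quick sketch of what that means in practice (the variable names are just for illustration). Resetting the seed replays the same random draws, while pnorm() involves no randomness at all, so p-values computed from a fixed column of z-scores do not depend on the seed.

```r
set.seed(1)
a <- rnorm(3)            # three random draws

set.seed(1)              # same seed again
b <- rnorm(3)            # replays exactly the same draws

c_next <- rnorm(3)       # no reset: continues the stream, so new values

identical(a, b)          # the reseeded draws match
identical(a, c_next)     # the continued draws do not

# pnorm() is deterministic: same input, same p-value, with or without a seed
identical(pnorm(1.96, lower.tail = FALSE), pnorm(1.96, lower.tail = FALSE))
```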

I just chose n=100 to get a lot of random numbers for z. You would not use my set.seed, nor my z <- rnorm(100,0,1). Your z is from your column of z scores.
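
To make that concrete, a rough sketch with a hypothetical data frame `sumstats` holding a `z` column (both names are assumptions). With millions of rows it can be simpler to compare |z| with a critical value than to store a p-value for every row. The 5e-8 cutoff is the conventional genome-wide threshold and appears here only as an example; the right cutoff for this data is worth confirming with a biostatistician.

```r
# Hypothetical stand-in for the real summary statistics
sumstats <- data.frame(
  snp = paste0("rs", 1:6),
  z   = c(-6.1, 0.4, 5.2, 5.6, -2.0, 7.3)
)

alpha <- 5e-8                                   # assumed two-sided cutoff
z_cut <- qnorm(alpha / 2, lower.tail = FALSE)   # critical |z|, roughly 5.45

# Selects the same rows as filtering on 2 * pnorm(-abs(z)) < alpha
sig <- sumstats[abs(sumstats$z) > z_cut, ]
sig
```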
