# From Z-scores to P-values

Hello,

Issue: I have a large summary-statistics data set of 15 million SNPs with Z-scores that was shared by a former colleague (whom I unfortunately cannot reach via email). I would like to reduce the 15 million rows of SNPs to just the statistically significant ones so I can cross-check them against a list of two thousand SNPs of interest.

I would really appreciate any suggestions or guidance on the following:

1. Is there a package that can convert Z-scores to P-values?

2. Are there packages that would help me efficiently keep only the significant SNPs and compare two columns from two different data frames?

This looks like it should work, but have you run the rest of the exercise by a biostatistician? It sounds a bit strange to someone outside the bio field.

You should be able to just filter the data and then do a merge() or dplyr::inner_join(), I think.

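```r
set.seed(1)                          # make the random draws reproducible
z <- rnorm(100, 0, 1)                # 100 simulated Z-scores
p <- pnorm(z, lower.tail = FALSE)    # right-tailed test
df <- data.frame(z, p)
df2 <- subset(df, p < .05)           # keep rows significant at the 0.05 level
df2
```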
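The code above uses a right-tailed test, which treats strongly negative Z-scores as non-significant. If the direction of the effect does not matter, a two-sided p-value may be what is wanted; which one is appropriate here is an assumption to confirm with whoever knows how the Z-scores were produced. A minimal sketch comparing the two on made-up Z-scores:

```r
# Made-up Z-scores, for illustration only
z <- c(-3.2, -1.0, 0, 1.96, 2.5)

# Right-tailed p-value: P(Z >= z)
p_right <- pnorm(z, lower.tail = FALSE)

# Two-sided p-value: P(|Z| >= |z|)
p_two <- 2 * pnorm(-abs(z))

data.frame(z, p_right, p_two)
```

For z = -3.2 the right-tailed p-value is close to 1, while the two-sided p-value is small, so the choice of test changes which rows survive the filter.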

Tidier than mine but can we assume the two files are in the same order?
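On the ordering question: merge() matches rows on a shared key column rather than on row position, so the two data frames do not need to be in the same order. A small sketch, assuming both files have an SNP identifier column (called `snp` here; the column names and values are hypothetical):

```r
# Hypothetical data, deliberately in different row orders
sig <- data.frame(snp = c("rs3", "rs1", "rs7"),
                  p   = c(0.001, 0.03, 0.0004))
interest <- data.frame(snp  = c("rs1", "rs2", "rs3"),
                       gene = c("A", "B", "C"))

# Keep only SNPs present in both data frames, matched on the key column
both <- merge(sig, interest, by = "snp")
both

# Alternative: flag which significant SNPs are in the list of interest
in_list <- sig$snp %in% interest$snp
in_list
```

If the identifier columns have different names in the two files, merge() also accepts `by.x` and `by.y`.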

Thanks for your input and the link; will be trying out the calculation today.

Yes, I need to follow up with a biostatistician, since the one who produced this data is no longer reachable!

Please pardon my questions if they seem silly, as I am still learning.

This is the first time I have seen the "set.seed" function! From what I understood from my web search, it is there to make sure I get the same P-values every time this code is run on my data set?

Why set n=100? Does this have something to do with the 68-95-99.7 rule for the normal distribution?

Thank you so much for your time!

Yes, set.seed(x) ensures you get the same random numbers each time.
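A quick way to see this for yourself (a small sketch; the variable names are arbitrary):

```r
set.seed(1)
a <- rnorm(3)

set.seed(1)
b <- rnorm(3)

# Same seed, so the two draws are the same numbers
identical(a, b)

# Without resetting the seed, the generator moves on to new numbers
third <- rnorm(3)
identical(a, third)
```

Note that pnorm() involves no randomness, so p-values computed from a fixed column of Z-scores are the same on every run, with or without a seed.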

I just chose n=100 to get a lot of random numbers for z. You would not use my set.seed, nor my z <- rnorm(100,0,1). Your z is from your column of z scores.
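Applied to real data, that might look like the sketch below. The data frame `sumstats` and its column names `snp` and `z` are stand-ins, not the actual names in your file, and the test is right-tailed as in my example (a two-sided version would use `2 * pnorm(-abs(z))`).

```r
# Hypothetical stand-in for the real summary statistics;
# the column names `snp` and `z` are assumptions
sumstats <- data.frame(
  snp = c("rs1", "rs2", "rs3", "rs4"),
  z   = c(0.3, 2.1, -1.2, 4.5)
)

# Right-tailed p-value computed from the existing Z-score column
sumstats$p <- pnorm(sumstats$z, lower.tail = FALSE)

# Keep only the rows below the chosen threshold
sig <- subset(sumstats, p < 0.05)
sig
```

With millions of SNPs tested at once, a cutoff much stricter than 0.05 is usually needed; 5e-8 is a common genome-wide convention, but the right threshold for this data set is a question for the biostatistician.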
