I have gene expression values for 10,000 genes (A1 to A10,000) across 100 samples (S97 to S197). I also have weight values for all 100 samples, some of which are NA.
I would like to run a linear regression of y (weight) on the expression of each of the 10,000 genes individually, across all 100 samples, and output the p-value for each gene. The goal is to identify genes whose expression correlates with the weight of the sample.
I also want to drop genes whose expression is zero across all samples. How can I run independent linear regressions on the 10,000 genes and get the p-values as a table?
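As a rough sketch of the procedure described above, assuming Python with NumPy and SciPy (no language is specified here): the function name `per_gene_pvalues` and the layout of the inputs (a genes-by-samples matrix plus a weight vector with NaN standing in for NA) are hypothetical, chosen only for illustration. Samples with a missing weight are excluded from every fit, and a gene that is constant over the remaining samples gets a NaN p-value, which is an assumption of this sketch rather than a stated requirement.

```python
import numpy as np
from scipy import stats


def per_gene_pvalues(expr, weight, gene_names):
    """Regress weight on each gene separately and return (gene, p-value) rows.

    expr:       array-like of shape (n_genes, n_samples), hypothetical layout
    weight:     array-like of shape (n_samples,), NaN marks a missing value
    gene_names: sequence of n_genes labels
    """
    expr = np.asarray(expr, dtype=float)
    weight = np.asarray(weight, dtype=float)

    # Samples without a weight cannot contribute to any regression.
    keep = ~np.isnan(weight)
    y = weight[keep]

    table = []
    for name, row in zip(gene_names, expr):
        # Drop genes whose expression is zero in every sample.
        if np.all(row == 0):
            continue
        x = row[keep]
        # Assumption: a gene that is constant over the usable samples
        # has no defined slope, so its p-value is reported as NaN.
        if np.all(x == x[0]):
            table.append((name, float("nan")))
            continue
        # Simple linear regression; pvalue tests the null of zero slope.
        fit = stats.linregress(x, y)
        table.append((name, float(fit.pvalue)))
    return table


if __name__ == "__main__":
    # Tiny made-up input, only to show the call; not real data.
    demo_expr = [[1, 2, 3, 9, 5.5], [0, 0, 0, 0, 0], [2, 1, 4, 3, 3]]
    demo_weight = [1, 2, 3, float("nan"), 5]
    for gene, p in per_gene_pvalues(demo_expr, demo_weight, ["A1", "A2", "A3"]):
        print(f"{gene}\t{p:.4g}")
```

With 10,000 separate tests, the raw p-values would normally need a multiple-testing correction before any gene is called significant.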
Some lines of my input data look like: