Help for using linear mixed model for RNA-seq data in R

Hello,

I have a gene expression matrix from RNA-seq as

 gene Individual1 Individual2
 Gene1  77.9        89.7
 Gene2  67.5        14.8

From RNA-seq data, I performed normalization which is represented by the values as 77.9 etc. In the matrix, genes are in rows, columns have samples with FPKM values for each gene across the samples. I want to run a linear mixed effect model to calculate p-vlaues for each gene across all the samples. I want to run a linear mixed model like:

Model = lmer(FPKM ~ (1|gene), data=X)

I am getting an error as there is no variable called FPKM. However, the FPKM values are represented in a matrix for each individual in the columns for each gene. I need help to run the model to generate p-values for each gene across the samples. I will appreciate any help. Thank you!

X must have a variable with the name FPKM

Thank you! I have the FPKM matrix as shown above. How can I make a variable named FPKM representing an array of FPKM values.

The matrix should be converted to a data frame, with individuals as rows and genes as columns. The gene name should be used to indicate the FPKM value. The gene is the fixed effect and the individual is the random effect.

Thank you! I have converted the data frame with individuals as rows and genes as columns. Can you please help in using gene name as FPKM value to run the model with gene as fixed effect and individual as random effect.

1 Like

suppressPackageStartupMessages({library(dplyr)
  library(lme4)})
# synthetic data set
set.seed(37)
basket <- seq(70.1,79.9,.1)
gene <- seq(1:10) %>% paste0("gene",.)
FPKM <- sample(basket,100,replace = TRUE)
data.frame(gene,FPKM) %>% mutate(gene = as.factor(gene)) -> X
head(X)
#>    gene FPKM
#> 1 gene1 75.4
#> 2 gene2 74.7
#> 3 gene3 72.4
#> 4 gene4 75.0
#> 5 gene5 70.3
#> 6 gene6 73.6
Model = lmer(FPKM ~ (1|gene), data=X)
Model
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: FPKM ~ (1 | gene)
#>    Data: X
#> REML criterion at convergence: 490.2234
#> Random effects:
#>  Groups   Name        Std.Dev.
#>  gene     (Intercept) 0.4158  
#>  Residual             2.7858  
#> Number of obs: 100, groups:  gene, 10
#> Fixed Effects:
#> (Intercept)  
#>       74.91

Created on 2020-09-18 by the reprex package (v0.3.0)

Thank you @technocrat! I have a question about basket variable. I understand it represents FPKM values. Can you please tell which order the values are put such as all values for Gene1 then Gene2 (row-wise)? My real dataset is much bigger and I am making a variable representing all FPKM values to use in above code.

basket holds values in 0.1 increments between in the closed interval [70.1,79.9] to be sampled 100 times with replacement to create the column FPKM. gene is constructed from the closed interval [1,10] with the string gene prepended. When bound as a data frame with FPKM the value recycles and the result is as shown.

To convert from the existing structure

suppressPackageStartupMessages({library(dplyr)
                                library(tidyr)})
X <- data.frame(gene = c("Gene1","Gene2"),
                Individual1 = c(77.9,89.7),
                Individual2 = c(67.5,14.8)
                )
pivot_longer(X,cols = starts_with("I")) %>% select(-name)
#> # A tibble: 4 x 2
#>   gene  value
#>   <chr> <dbl>
#> 1 Gene1  77.9
#> 2 Gene1  67.5
#> 3 Gene2  89.7
#> 4 Gene2  14.8

Created on 2020-09-18 by the reprex package (v0.3.0)