How to extract p-values of regression stored as list

amare · April 22, 2020, 8:35am

Hey guys,
I've been trying for a while to regress between the first observation and the rest of the 199 observations
I used the lappy function and the regression result is stored as a list in the environment.
Here is the code I am using right now.
myre1 <- apply(2:ncol(muscle), function(x) lm(muscle[,1] ~ muscle[,x], data = muscle))
myre2 <- lapply(muscle[,-1], function(x) lm(muscle$GIR ~ x))

to extract the coefficient

myre3 <- lapply(2:ncol(muscle), function(x) coefficients(lm(muscle[,1] ~ muscle[,x], data = muscle)))
myre4 <- lapply(muscle[,-1], function(x) coefficients(lm(muscle$GIR ~ x)))

However, none of them would answer my question. My aim is to get only the list of p_values as a data frame and determine how many observations are less than 0.05.

any help would be highly appreciated!
Best,
Amare myre1 myre2

nirgrahamuk · April 22, 2020, 10:34am

im not sure what it means to regress from one observation to others.
If we were using iris dataset. what would it mean to 'regress' the first observation of Petal.Length on the other observations of it ? or is your data structured very differently ?
examples of what you are doing would be helpful. screenshots are not helpful.
examples using common builtin datasets to r (like iris, mtcars etc) are good
also you can put your own data inside dput() function and get a shareable representation.

amare · April 22, 2020, 11:07am

Thanks for your replay!
Here I put for you as an example what I want to do.

BMI <- c(30, 23,27,35,12, 10,17,24,21,15,16)
g1 <- c(35,2,3,4,5,6,7, 10,12,41,76)  
g2 <- c(20,2,7,2,8,5,5,3,7,2,12)
g3 <- c(5,0,4,5,2,4,8,9,20,1,11)
data <- data.frame(BMI, g1, g2, g3)

# Here I did simple regression between BMI and the rest of the columns using lapply function()
# my gole is to know how many of the variables are less than 0.05 and take export them as data frame in csv
re <- lapply(2:ncol(data), function(x) lm(data[,1] ~ data[,x], data = data))

Thank you!

nirgrahamuk · April 22, 2020, 11:25am

there may be more elegant ways, but i came up with this

BMI <- c(30, 23,27,35,12, 10,17,24,21,15,16)
g1 <- c(35,2,3,4,5,6,7, 10,12,41,76)  
g2 <- c(20,2,7,2,8,5,5,3,7,2,12)
g3 <- c(5,0,4,5,2,4,8,9,20,1,11)
data <- data.frame(BMI, g1, g2, g3)

library(purrr)
re <- map(2:ncol(data),
          ~ lm(data[,1] ~ data[,.x], data = data)) 

re2 <- map(re,
           ~summary(.)$coefficients[,4])

re3 <- map2_dfr(.x = names(data)[2:ncol(data)],
            .y = re2,
            .f = ~tibble(variablename=.x,
                         pval_intercept=.y[1],
                         pval_var=.y[2],
                         ))

siddharthprabhu · April 22, 2020, 12:02pm

The broom package from the tidyverse is designed for exactly this task; where you need the text output of your model to be converted into a neat data frame.

library(tidyverse)
library(broom)

data <- data.frame(BMI = c(30, 23, 27, 35, 12, 10, 17, 24, 21, 15, 16),
                   g1 = c(35, 2, 3, 4, 5, 6, 7, 10, 12, 41, 76), 
                   g2 = c(20, 2, 7, 2, 8, 5, 5, 3, 7, 2, 12), 
                   g3 = c(5, 0, 4, 5, 2, 4, 8, 9, 20, 1, 11))

data %>% 
  pivot_longer(-BMI) %>% 
  group_split(name) %>% 
  set_names(nm = map(., ~ first(.x$name))) %>% 
  map(~ tidy(lm(BMI ~ value, data = .))) %>% 
  map(~ filter(., p.value < 0.05))
#> $g1
#> # A tibble: 1 x 5
#>   term        estimate std.error statistic   p.value
#>   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
#> 1 (Intercept)     22.0      3.15      6.98 0.0000647
#> 
#> $g2
#> # A tibble: 1 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)     19.8      4.00      4.94 0.000801
#> 
#> $g3
#> # A tibble: 1 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)     20.6      3.79      5.43 0.000416

^{Created on 2020-04-22 by the reprex package (v0.3.0)}

Note: This is kind of a deep cut tidyverse solution so apologies if you were looking for a base R solution.

amare · April 22, 2020, 2:04pm

Dear Nirgrahamuk,
Thank you so much! Indeed its workout.
However, how I should approach if I replace BMI by categorical data like sex

gender <- c("m","m", "m", "m", "m", "f","f", "f", "f","f","f" )
g1 <- c(35,2,3,4,5,6,7, 10,12,41,76)
g2 <- c(20,2,7,2,8,5,5,3,7,2,12)
g3 <- c(5,0,4,5,2,4,8,9,20,1,11)
data <- data.frame(gender, g1, g2, g3)

Thanks so much!

amare · April 22, 2020, 2:10pm

Thank you so much! It works.
Could you also show if we could change the continues data (BMI) to categorical (sex)?
Many thanks!

nirgrahamuk · April 22, 2020, 2:18pm

you would need gender as a numeric, not factor so

data$gender <- ifelse(data$gender=="f",1L,0L)
#or 
data <- mutate(data,
               gender=ifelse(data$gender=="f",1L,0L))

amare · April 22, 2020, 11:47pm

Hi,
Your example worked out and but do you think I can save the result in CSV format?
Thanks!

siddharthprabhu · April 23, 2020, 5:01am

We could use bind_rows() to collapse the list of data frames into one like this:

library(tidyverse)
library(broom)

data <- data.frame(BMI = c(30, 23, 27, 35, 12, 10, 17, 24, 21, 15, 16),
                   g1 = c(35, 2, 3, 4, 5, 6, 7, 10, 12, 41, 76), 
                   g2 = c(20, 2, 7, 2, 8, 5, 5, 3, 7, 2, 12), 
                   g3 = c(5, 0, 4, 5, 2, 4, 8, 9, 20, 1, 11))

output <- data %>% 
  pivot_longer(-BMI) %>% 
  group_split(name) %>% 
  set_names(nm = map(., ~ first(.x$name))) %>% 
  map(~ tidy(lm(BMI ~ value, data = .))) %>% 
  map(~ filter(., p.value < 0.05)) %>% 
  bind_rows(.id = "var")

output
#> # A tibble: 3 x 6
#>   var   term        estimate std.error statistic   p.value
#>   <chr> <chr>          <dbl>     <dbl>     <dbl>     <dbl>
#> 1 g1    (Intercept)     22.0      3.15      6.98 0.0000647
#> 2 g2    (Intercept)     19.8      4.00      4.94 0.000801 
#> 3 g3    (Intercept)     20.6      3.79      5.43 0.000416

^{Created on 2020-04-23 by the reprex package (v0.3.0)}

Then you can use either base::write.csv() or readr::write_csv() to export the object output to a CSV file.

amare · April 23, 2020, 8:03am

Thanks so much again!.
But in my actual data, each row are repeating. You may see it in the picture.

Capture|436x199 Capture12

here is the code
write.csv(IMATData %>%
pivot_longer(-c("category", "gender","Tissue","age")) %>%
group_split(name) %>%
set_names(nm = map(., ~ first(.x$name))) %>%
map(~ tidy(lm(age~ value, data = .))) %>%
map(~ filter(., p.value < 0.05)) %>%
bind_rows(.id = "var"),file = "D:/Users/amare\df.csv")
Thanks!

Actually I have to learn this amazing package.

siddharthprabhu · April 23, 2020, 8:27am

@amare I don't understand. The output of simple linear regression contains p-values for the intercept and the independent variable so you should have 2 values resulting from each model if their p-values are < 0.05. This is exactly what we see in the output.

If you're not interested in the intercept, you'll have to modify the formula for your linear model to lm(age ~ value - 1, data = .).

amare · April 23, 2020, 8:31am

Thank you so much! I get it! it was help full talking to you and other staffs in the platform!

amare · April 23, 2020, 10:51am

Hello Dear,
excuse for taking your time but I am wondering what If the categorical variable is holding more than two levels like ("A", "B"; "C"; "D").

How I should proceed If I want to see the prediction of g1, g2, and g3 on the response variable "sep" in a separate regression ("sepA vs sepC", "sepB vs sepC", "sepD vs sepC and sepD vs sepA)
I mean I want to contrast levels A, B, D with C and D with A in a separate regression.

You may use the following vectors as an example:
sep <- c("A","B", "C", "C", "D", "A","A", "D", "B","B","D";C" )
g1 <- c(35,2,3,4,5,6,7, 10,12,41,76,45)
g2 <- c(20,2,7,2,8,5,5,3,7,2,12, 13)
g3 <- c(5,0,4,5,2,4,8,9,20,1,11, 24)
data <- data.frame(sep, g1, g2, g3)

Thank you!

nirgrahamuk · April 23, 2020, 11:21am

This forum is best for tacking specific technical questions, rather than educating on broad topics. I think you should research multinomial logistic regression, there are many tutorials online for all sorts of flavours.

amare · April 24, 2020, 11:30am

Hi Siddharthprabhu,
I have tried your suggestion as follow to remove the intercept but the I found nothing in the csv file !
Capture|451x154

write.csv(IMATData %>%
pivot_longer(-c("category", "gender","Tissue","age")) %>%
group_split(name) %>%
set_names(nm = map(., ~ first(.x$name))) %>%
map(~ tidy(lm(age~ value-1, data = .))) %>%
map(~ filter(., p.value < 0.05)) %>%
bind_rows(.id = "var"),file = "D:/Users/amare\df.csv")
May I know where is the probleam?
Thanks!

siddharthprabhu · April 24, 2020, 11:54am

Maybe the model's outputs don't have any independent variables with p-values < 0.05? The code works for me with the sample data.

library(tidyverse)
library(broom)

data <- data.frame(BMI = c(30, 23, 27, 35, 12, 10, 17, 24, 21, 15, 16),
                   g1 = c(35, 2, 3, 4, 5, 6, 7, 10, 12, 41, 76), 
                   g2 = c(20, 2, 7, 2, 8, 5, 5, 3, 7, 2, 12), 
                   g3 = c(5, 0, 4, 5, 2, 4, 8, 9, 20, 1, 11))

data %>% 
  pivot_longer(-BMI) %>% 
  group_split(name) %>% 
  set_names(nm = map(., ~ first(.x$name))) %>% 
  map(~ tidy(lm(BMI ~ value - 1, data = .))) %>% 
  map(~ filter(., p.value < 0.05)) %>% 
  bind_rows(.id = "var")
#> # A tibble: 2 x 6
#>   var   term  estimate std.error statistic p.value
#>   <chr> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
#> 1 g2    value     2.03     0.534      3.80 0.00347
#> 2 g3    value     1.94     0.586      3.31 0.00787

^{Created on 2020-04-24 by the reprex package (v0.3.0)}

amare · April 24, 2020, 12:06pm

I am not sure what is wrong here!
If you could see the following screenshot. One is showing the result of the code with the intercept and the other is without the intercept (value-1) and it print noting. Capture Capture1

siddharthprabhu · April 24, 2020, 12:09pm

Strange. Could you please post the code snippets used to generate each of those results?

amare · April 24, 2020, 12:17pm

Yes!
Here are they.