How to extract p-values of regression stored as list

Hey guys,
I've been trying for a while to regress between the first observation and the rest of the 199 observations
I used the lappy function and the regression result is stored as a list in the environment.
Here is the code I am using right now.
myre1 <- apply(2:ncol(muscle), function(x) lm(muscle[,1] ~ muscle[,x], data = muscle))
myre2 <- lapply(muscle[,-1], function(x) lm(muscle$GIR ~ x))

to extract the coefficient

myre3 <- lapply(2:ncol(muscle), function(x) coefficients(lm(muscle[,1] ~ muscle[,x], data = muscle)))
myre4 <- lapply(muscle[,-1], function(x) coefficients(lm(muscle$GIR ~ x)))

However, none of them would answer my question. My aim is to get only the list of p_values as a data frame and determine how many observations are less than 0.05.

any help would be highly appreciated!
Best,
Amaremyre1 myre2

im not sure what it means to regress from one observation to others.
If we were using iris dataset. what would it mean to 'regress' the first observation of Petal.Length on the other observations of it ? or is your data structured very differently ?
examples of what you are doing would be helpful. screenshots are not helpful.
examples using common builtin datasets to r (like iris, mtcars etc) are good
also you can put your own data inside dput() function and get a shareable representation.

Thanks for your replay!
Here I put for you as an example what I want to do.

BMI <- c(30, 23,27,35,12, 10,17,24,21,15,16)
g1 <- c(35,2,3,4,5,6,7, 10,12,41,76)  
g2 <- c(20,2,7,2,8,5,5,3,7,2,12)
g3 <- c(5,0,4,5,2,4,8,9,20,1,11)
data <- data.frame(BMI, g1, g2, g3)

# Here I did simple regression between BMI and the rest of the columns using lapply function()
# my gole is to know how many of the variables are less than 0.05 and take export them as data frame in csv
re <- lapply(2:ncol(data), function(x) lm(data[,1] ~ data[,x], data = data))

Thank you!

there may be more elegant ways, but i came up with this

BMI <- c(30, 23,27,35,12, 10,17,24,21,15,16)
g1 <- c(35,2,3,4,5,6,7, 10,12,41,76)  
g2 <- c(20,2,7,2,8,5,5,3,7,2,12)
g3 <- c(5,0,4,5,2,4,8,9,20,1,11)
data <- data.frame(BMI, g1, g2, g3)

library(purrr)
re <- map(2:ncol(data),
          ~ lm(data[,1] ~ data[,.x], data = data)) 

re2 <- map(re,
           ~summary(.)$coefficients[,4])

re3 <- map2_dfr(.x = names(data)[2:ncol(data)],
            .y = re2,
            .f = ~tibble(variablename=.x,
                         pval_intercept=.y[1],
                         pval_var=.y[2],
                         ))
1 Like

The broom package from the tidyverse is designed for exactly this task; where you need the text output of your model to be converted into a neat data frame.

library(tidyverse)
library(broom)

data <- data.frame(BMI = c(30, 23, 27, 35, 12, 10, 17, 24, 21, 15, 16),
                   g1 = c(35, 2, 3, 4, 5, 6, 7, 10, 12, 41, 76), 
                   g2 = c(20, 2, 7, 2, 8, 5, 5, 3, 7, 2, 12), 
                   g3 = c(5, 0, 4, 5, 2, 4, 8, 9, 20, 1, 11))

data %>% 
  pivot_longer(-BMI) %>% 
  group_split(name) %>% 
  set_names(nm = map(., ~ first(.x$name))) %>% 
  map(~ tidy(lm(BMI ~ value, data = .))) %>% 
  map(~ filter(., p.value < 0.05))
#> $g1
#> # A tibble: 1 x 5
#>   term        estimate std.error statistic   p.value
#>   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
#> 1 (Intercept)     22.0      3.15      6.98 0.0000647
#> 
#> $g2
#> # A tibble: 1 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)     19.8      4.00      4.94 0.000801
#> 
#> $g3
#> # A tibble: 1 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)     20.6      3.79      5.43 0.000416

Created on 2020-04-22 by the reprex package (v0.3.0)

Note: This is kind of a deep cut tidyverse solution so apologies if you were looking for a base R solution.

1 Like

Dear Nirgrahamuk,
Thank you so much! Indeed its workout.
However, how I should approach if I replace BMI by categorical data like sex

gender <- c("m","m", "m", "m", "m", "f","f", "f", "f","f","f" )
g1 <- c(35,2,3,4,5,6,7, 10,12,41,76)
g2 <- c(20,2,7,2,8,5,5,3,7,2,12)
g3 <- c(5,0,4,5,2,4,8,9,20,1,11)
data <- data.frame(gender, g1, g2, g3)

Thanks so much!

Thank you so much! It works.
Could you also show if we could change the continues data (BMI) to categorical (sex)?
Many thanks!

you would need gender as a numeric, not factor so

data$gender <- ifelse(data$gender=="f",1L,0L)
#or 
data <- mutate(data,
               gender=ifelse(data$gender=="f",1L,0L))
1 Like

Hi,
Your example worked out and but do you think I can save the result in CSV format?
Thanks!

We could use bind_rows() to collapse the list of data frames into one like this:

library(tidyverse)
library(broom)

data <- data.frame(BMI = c(30, 23, 27, 35, 12, 10, 17, 24, 21, 15, 16),
                   g1 = c(35, 2, 3, 4, 5, 6, 7, 10, 12, 41, 76), 
                   g2 = c(20, 2, 7, 2, 8, 5, 5, 3, 7, 2, 12), 
                   g3 = c(5, 0, 4, 5, 2, 4, 8, 9, 20, 1, 11))

output <- data %>% 
  pivot_longer(-BMI) %>% 
  group_split(name) %>% 
  set_names(nm = map(., ~ first(.x$name))) %>% 
  map(~ tidy(lm(BMI ~ value, data = .))) %>% 
  map(~ filter(., p.value < 0.05)) %>% 
  bind_rows(.id = "var")

output
#> # A tibble: 3 x 6
#>   var   term        estimate std.error statistic   p.value
#>   <chr> <chr>          <dbl>     <dbl>     <dbl>     <dbl>
#> 1 g1    (Intercept)     22.0      3.15      6.98 0.0000647
#> 2 g2    (Intercept)     19.8      4.00      4.94 0.000801 
#> 3 g3    (Intercept)     20.6      3.79      5.43 0.000416

Created on 2020-04-23 by the reprex package (v0.3.0)

Then you can use either base::write.csv() or readr::write_csv() to export the object output to a CSV file.

1 Like

Thanks so much again!.
But in my actual data, each row are repeating. You may see it in the picture.

Capture|436x199 Capture12

here is the code
write.csv(IMATData %>%
pivot_longer(-c("category", "gender","Tissue","age")) %>%
group_split(name) %>%
set_names(nm = map(., ~ first(.x$name))) %>%
map(~ tidy(lm(age~ value, data = .))) %>%
map(~ filter(., p.value < 0.05)) %>%
bind_rows(.id = "var"),file = "D:/Users/amare\df.csv")
Thanks!

Actually I have to learn this amazing package.

@amare I don't understand. The output of simple linear regression contains p-values for the intercept and the independent variable so you should have 2 values resulting from each model if their p-values are < 0.05. This is exactly what we see in the output.

If you're not interested in the intercept, you'll have to modify the formula for your linear model to lm(age ~ value - 1, data = .).

Thank you so much! I get it! it was help full talking to you and other staffs in the platform!

Hello Dear,
excuse for taking your time but I am wondering what If the categorical variable is holding more than two levels like ("A", "B"; "C"; "D").

How I should proceed If I want to see the prediction of g1, g2, and g3 on the response variable "sep" in a separate regression ("sepA vs sepC", "sepB vs sepC", "sepD vs sepC and sepD vs sepA)
I mean I want to contrast levels A, B, D with C and D with A in a separate regression.

You may use the following vectors as an example:
sep <- c("A","B", "C", "C", "D", "A","A", "D", "B","B","D";C" )
g1 <- c(35,2,3,4,5,6,7, 10,12,41,76,45)
g2 <- c(20,2,7,2,8,5,5,3,7,2,12, 13)
g3 <- c(5,0,4,5,2,4,8,9,20,1,11, 24)
data <- data.frame(sep, g1, g2, g3)

Thank you!

This forum is best for tacking specific technical questions, rather than educating on broad topics. I think you should research multinomial logistic regression, there are many tutorials online for all sorts of flavours.

1 Like

Hi Siddharthprabhu,
I have tried your suggestion as follow to remove the intercept but the I found nothing in the csv file !
Capture|451x154

write.csv(IMATData %>%
pivot_longer(-c("category", "gender","Tissue","age")) %>%
group_split(name) %>%
set_names(nm = map(., ~ first(.x$name))) %>%
map(~ tidy(lm(age~ value-1, data = .))) %>%
map(~ filter(., p.value < 0.05)) %>%
bind_rows(.id = "var"),file = "D:/Users/amare\df.csv")
May I know where is the probleam?
Thanks!

Maybe the model's outputs don't have any independent variables with p-values < 0.05? The code works for me with the sample data.

library(tidyverse)
library(broom)

data <- data.frame(BMI = c(30, 23, 27, 35, 12, 10, 17, 24, 21, 15, 16),
                   g1 = c(35, 2, 3, 4, 5, 6, 7, 10, 12, 41, 76), 
                   g2 = c(20, 2, 7, 2, 8, 5, 5, 3, 7, 2, 12), 
                   g3 = c(5, 0, 4, 5, 2, 4, 8, 9, 20, 1, 11))

data %>% 
  pivot_longer(-BMI) %>% 
  group_split(name) %>% 
  set_names(nm = map(., ~ first(.x$name))) %>% 
  map(~ tidy(lm(BMI ~ value - 1, data = .))) %>% 
  map(~ filter(., p.value < 0.05)) %>% 
  bind_rows(.id = "var")
#> # A tibble: 2 x 6
#>   var   term  estimate std.error statistic p.value
#>   <chr> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
#> 1 g2    value     2.03     0.534      3.80 0.00347
#> 2 g3    value     1.94     0.586      3.31 0.00787

Created on 2020-04-24 by the reprex package (v0.3.0)

I am not sure what is wrong here!
If you could see the following screenshot. One is showing the result of the code with the intercept and the other is without the intercept (value-1) and it print noting. Capture Capture1

Strange. Could you please post the code snippets used to generate each of those results?

Yes!
Here are they.