Exporting summary statistics for sub-groups from single data frame and running multiple t.tests

Anna1 · February 20, 2021, 2:15pm

Hi all, I have a two-part question that might be related to how my data is formatted (I am still new to R).
I have data for 10 different IDs, each with 15 measurements of Size and PB. The data is arranged in 3 columns:

Example of the dataframe:

df <- data.frame(
stringsAsFactors = FALSE,
ID = c("PS_orange","PS_orange",
"PS_orange","PS_orange","PS_orange","PS_orange",
"PS_orange","PS_orange","PS_orange","PS_orange","PS_orange",
"PS_orange","PS_orange","PS_orange","PS_orange","PET",
"PET","PET","PET","PET"),
Size = c(299L,116L,85L,228L,56L,
128L,113L,75L,118L,299L,71L,235L,133L,237L,50L,
261L,239L,179L,116L,156L),
PB = c(217.95,255,229.44,255,255,
255,255,222.97,255,255,255,235.3,255,205.9,206.8,
90.67,196.8,204,173.1,118))

I wanted to calculate descriptive statistics of Size and PB for every ID, so I used:

Li25 <- by(df, df$ID, summary)

This returns all the descriptive statistics for each ID separately, which is great.

However, when I tried to export the summary statistics by converting the result into a data frame

Li25df <- as.data.frame(by(df, df$ID, summary))

I get an error message of:

Error in as.data.frame.default(by(df, df$ID, summary)) :
cannot coerce class ‘"by"’ to a data.frame

I have tried other ways to export the summary statistics, without success.
Is there a way to do this?

The second part is about calculating confidence intervals with the t.test function.

To run t.test for each ID, I split the data frame

split1 <- split(df, df$ID)

and ran the t.test for each ID separately

t.test(split1[["PS_orange"]]$PB)
t.test(split1[["PET"]]$PB)

This works well but is a bit cumbersome. I am sure there is a way to tell R to run these in one go, so I tried

split(df, df$ID) %>%
lapply(., {
t.test(PB)
})

which returns

Error in t.test(PB) : object 'PB' not found

If you have suggestions, especially for avoiding manually transferring the summary statistics into Excel, they would be much appreciated. Thanks!

FJCC · February 20, 2021, 6:16pm

I don't think summary() is a good function to use for this. It returns character values, as shown in the code below. There may be a built-in function for summarizing a data frame, but I wrote one using dplyr and tidyr.
It is not clear what you want to do with the t-test results. Since you mentioned the confidence interval, I wrote some code to calculate it for each combination of ID and variable (Size or PB).

DF <- data.frame(
  stringsAsFactors = FALSE,
  ID = c("PS_orange","PS_orange",
         "PS_orange","PS_orange","PS_orange","PS_orange",
         "PS_orange","PS_orange","PS_orange","PS_orange","PS_orange",
         "PS_orange","PS_orange","PS_orange","PS_orange","PET",
         "PET","PET","PET","PET"),
  Size = c(299L,116L,85L,228L,56L,
           128L,113L,75L,118L,299L,71L,235L,133L,237L,50L,
           261L,239L,179L,116L,156L),
  PB = c(217.95,255,229.44,255,255,
         255,255,222.97,255,255,255,235.3,255,205.9,206.8,
         90.67,196.8,204,173.1,118))

#summary() returns character values
thing <- by(DF, DF$ID, summary,simplify = TRUE)
thing2 <- thing[[1]]
thing2[2,2]
#> [1] "1st Qu.:156.0  "

#Use dplyr and tidyr to summarize data
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
Stats <- DF %>% pivot_longer(Size:PB) %>% 
  group_by(ID, name) %>%
  summarize(MIN = min(value), Q25 = quantile(value, probs = 0.25),
            MEDIAN = median(value), MEAN = mean(value),
            Q75 = quantile(value, probs = 0.75),
            MAX = max(value))
#> `summarise()` regrouping output by 'ID' (override with `.groups` argument)
Stats  
#> # A tibble: 4 x 8
#> # Groups:   ID [2]
#>   ID        name    MIN   Q25 MEDIAN  MEAN   Q75   MAX
#>   <chr>     <chr> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
#> 1 PET       PB     90.7  118    173.  157.  197.   204
#> 2 PET       Size  116    156    179   190.  239    261
#> 3 PS_orange PB    206.   226.   255   241.  255    255
#> 4 PS_orange Size   50     80    118   150.  232.   299


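# Run a one-sample t-test on V and return the confidence interval
# of the mean (95% by default) as a vector: lower bound, upper bound
My_t_test <- function(V) { 
  tmp <- t.test(V)
  tmp$conf.int
}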
T_Test <- DF %>% pivot_longer(Size:PB) %>% 
  group_by(ID, name) %>% 
  summarize(CI_low = My_t_test(value)[1], CI_high = My_t_test(value)[2])
#> `summarise()` regrouping output by 'ID' (override with `.groups` argument)
T_Test
#> # A tibble: 4 x 4
#> # Groups:   ID [2]
#>   ID        name  CI_low CI_high
#>   <chr>     <chr>  <dbl>   <dbl>
#> 1 PET       PB      94.5    219.
#> 2 PET       Size   116.     264.
#> 3 PS_orange PB     230.     252.
#> 4 PS_orange Size   102.     197.

Created on 2021-02-20 by the reprex package (v0.3.0)
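
For reference, the `lapply()` attempt in the question fails because its second argument is a braced expression rather than a function: R evaluates `t.test(PB)` on the spot, outside any data frame, so it finds no object called `PB`. Below is a minimal base-R sketch of one way to run the test per ID. The names `tests` and `ci` are made up for illustration, and the data is the example from the question written compactly.

```r
# Example data from the question (ID and PB columns only)
df <- data.frame(
  ID = rep(c("PS_orange", "PET"), times = c(15, 5)),
  PB = c(217.95, 255, 229.44, 255, 255,
         255, 255, 222.97, 255, 255, 255, 235.3, 255, 205.9, 206.8,
         90.67, 196.8, 204, 173.1, 118)
)

# lapply() needs a function; inside it, the column is reached through
# the per-ID data frame `d`
tests <- lapply(split(df, df$ID), function(d) t.test(d$PB))

# Collect the mean and the confidence interval for each ID in one data frame
ci <- do.call(rbind, lapply(names(tests), function(id) {
  data.frame(
    ID      = id,
    MEAN    = unname(tests[[id]]$estimate),
    CI_low  = tests[[id]]$conf.int[1],
    CI_high = tests[[id]]$conf.int[2]
  )
}))
ci
```

Because `ci` is an ordinary data frame, it can be written to a file with `write.csv()` like any other.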

Anna1 · February 22, 2021, 11:54am

Dear FJCC,

Thank you!
This enabled me to export the summary stats to CSV.

And yes, I was after the CI of the mean values for PB.

Thank you again, your help was much appreciated
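
The export step mentioned here might look roughly like the sketch below. It uses base R only: the `aggregate()` call is a simplified stand-in for the dplyr summary in the reply above (means only), and the file name `summary_stats.csv` is hypothetical.

```r
# Example data from the question, written compactly
DF <- data.frame(
  ID   = rep(c("PS_orange", "PET"), times = c(15, 5)),
  Size = c(299, 116, 85, 228, 56,
           128, 113, 75, 118, 299, 71, 235, 133, 237, 50,
           261, 239, 179, 116, 156),
  PB   = c(217.95, 255, 229.44, 255, 255,
           255, 255, 222.97, 255, 255, 255, 235.3, 255, 205.9, 206.8,
           90.67, 196.8, 204, 173.1, 118)
)

# Mean of Size and PB per ID (stand-in for the fuller dplyr summary)
Stats <- aggregate(cbind(Size, PB) ~ ID, data = DF, FUN = mean)

# Hypothetical output location; replace with a real path as needed
out_file <- file.path(tempdir(), "summary_stats.csv")
write.csv(Stats, out_file, row.names = FALSE)

# Read the file back to check what was written
read.csv(out_file)
```

The resulting CSV opens directly in Excel, which avoids copying the numbers over by hand.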
