Descriptive statistics of a variable in R

Hi there,

I am new to R. I have been trying to compute summary statistics for a variable in a filtered data set, using the following code:

library(dplyr)
diameter7 <- filter(nigeria6, electricity_area == 1)
diameter7 %>%
  group_by(affected7) %>%
  summary(electricty_area)

However, I keep getting the error message that electricity_area is not found. But I have checked the data and electricity_area is very much present. I am confused and can't really figure out what I am doing wrong here. Please help.
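
A note on the likely cause, offered as a sketch and not a confirmed diagnosis: summary() is a base R function, so a bare column name passed to it is most likely looked up as an ordinary R object and not as a column of the piped data frame, which would produce a "not found" error. It also does not respect group_by(). In addition, the column name in the summary() call above is spelled electricty_area, not electricity_area. The example below uses a hypothetical stand-in for nigeria6; only the column names electricity_area and affected7 come from the post, and the value column is invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for nigeria6. Only electricity_area and affected7
# are taken from the post; `value` is an invented column to summarise.
nigeria6 <- data.frame(
  electricity_area = c(1, 1, 1, 0, 1, 0),
  affected7        = c(0, 1, 0, 1, 1, 0),
  value            = c(10, 20, 30, 40, 50, 60)
)

diameter7 <- filter(nigeria6, electricity_area == 1)

# summarise() evaluates column names inside the data and respects group_by()
result <- diameter7 %>%
  group_by(affected7) %>%
  summarise(
    n_obs        = n(),
    mean_value   = mean(value),
    median_value = median(value),
    min_value    = min(value),
    max_value    = max(value)
  )

print(result)

# Base R alternative for a single, ungrouped column: refer to it with `$`
print(summary(diameter7$value))
```

With real data, value would be replaced by whichever column needs summarising.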

What happens if you run

str(diameter7)

?

We probably need to see some sample data. A handy way to supply it is the dput() function; see ?dput. If you have a very large data set, then something like dput(head(myfile, 100)) will likely supply enough data for us to work with.
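
As a small illustration of the dput() suggestion, here is a sketch using the built-in iris data in place of the real file. The variable names txt and rebuilt are only for this example.

```r
# Print the first 10 rows as R code that others can paste into their session
dput(head(iris, 10))

# Round trip: capture the text that dput() prints and rebuild the object
txt     <- capture.output(dput(head(iris, 10)))
rebuilt <- eval(parse(text = txt))

# The rebuilt object has the same shape and column names as the original rows
print(dim(rebuilt))
print(names(rebuilt))
```

Pasting the dput() output into a post lets others recreate the data exactly, including column types such as factors.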

Here are two options.

library(tidyverse)

# for printing
iris %>% split(.$Species) %>% map(summary)
#> $setosa
#>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
#>  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100  
#>  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200  
#>  Median :5.000   Median :3.400   Median :1.500   Median :0.200  
#>  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246  
#>  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300  
#>  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600  
#>        Species  
#>  setosa    :50  
#>  versicolor: 0  
#>  virginica : 0  
#>                 
#>                 
#>                 
#> 
#> $versicolor
#>   Sepal.Length    Sepal.Width     Petal.Length   Petal.Width          Species  
#>  Min.   :4.900   Min.   :2.000   Min.   :3.00   Min.   :1.000   setosa    : 0  
#>  1st Qu.:5.600   1st Qu.:2.525   1st Qu.:4.00   1st Qu.:1.200   versicolor:50  
#>  Median :5.900   Median :2.800   Median :4.35   Median :1.300   virginica : 0  
#>  Mean   :5.936   Mean   :2.770   Mean   :4.26   Mean   :1.326                  
#>  3rd Qu.:6.300   3rd Qu.:3.000   3rd Qu.:4.60   3rd Qu.:1.500                  
#>  Max.   :7.000   Max.   :3.400   Max.   :5.10   Max.   :1.800                  
#> 
#> $virginica
#>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
#>  Min.   :4.900   Min.   :2.200   Min.   :4.500   Min.   :1.400  
#>  1st Qu.:6.225   1st Qu.:2.800   1st Qu.:5.100   1st Qu.:1.800  
#>  Median :6.500   Median :3.000   Median :5.550   Median :2.000  
#>  Mean   :6.588   Mean   :2.974   Mean   :5.552   Mean   :2.026  
#>  3rd Qu.:6.900   3rd Qu.:3.175   3rd Qu.:5.875   3rd Qu.:2.300  
#>  Max.   :7.900   Max.   :3.800   Max.   :6.900   Max.   :2.500  
#>        Species  
#>  setosa    : 0  
#>  versicolor: 0  
#>  virginica :50  
#>                 
#>                 
#> 

# a dataframe with summary stats
# :( I don't want to use skimr
iris %>% 
  group_by(Species) %>%
  summarize(summ = summary(Sepal.Length) %>% broom::tidy() %>% list()) %>%
  unnest(summ)
#> Warning: `tidy.summaryDefault()` is deprecated. Please use `skimr::skim()`
#> instead.

#> Warning: `tidy.summaryDefault()` is deprecated. Please use `skimr::skim()`
#> instead.

#> Warning: `tidy.summaryDefault()` is deprecated. Please use `skimr::skim()`
#> instead.
#> # A tibble: 3 x 7
#>   Species    minimum    q1 median  mean    q3 maximum
#>   <fct>        <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
#> 1 setosa         4.3  4.8     5    5.01   5.2     5.8
#> 2 versicolor     4.9  5.6     5.9  5.94   6.3     7  
#> 3 virginica      4.9  6.22    6.5  6.59   6.9     7.9

Created on 2022-01-18 by the reprex package (v2.0.1)


Maybe this is a better solution. No complaining from broom!

library(tidyverse)

iris %>% 
  as_tibble() %>% 
  group_by(Species) %>% 
  summarize_all(~summary(.) %>% as_tibble_row() %>% list()) %>% 
  unnest(-Species, names_sep = "_")
#> # A tibble: 3 x 25
#>   Species    Sepal.Length_Mi~ `Sepal.Length_1~ Sepal.Length_Me~ Sepal.Length_Me~
#>   <fct>      <table>          <table>          <table>          <table>         
#> 1 setosa     4.3              4.800            5.0              5.006           
#> 2 versicolor 4.9              5.600            5.9              5.936           
#> 3 virginica  4.9              6.225            6.5              6.588           
#> # ... with 20 more variables: Sepal.Length_3rd Qu. <table>,
#> #   Sepal.Length_Max. <table>, Sepal.Width_Min. <table>,
#> #   Sepal.Width_1st Qu. <table>, Sepal.Width_Median <table>,
#> #   Sepal.Width_Mean <table>, Sepal.Width_3rd Qu. <table>,
#> #   Sepal.Width_Max. <table>, Petal.Length_Min. <table>,
#> #   Petal.Length_1st Qu. <table>, Petal.Length_Median <table>,
#> #   Petal.Length_Mean <table>, Petal.Length_3rd Qu. <table>, ...

Created on 2022-01-18 by the reprex package (v2.0.1)
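
Another possible variant, sketched under the assumption that dplyr 1.0.0 or later is installed: across() with a named list of functions yields plain numeric columns instead of the <table> columns shown above. The choice of four statistics here is arbitrary and can be extended.

```r
library(dplyr)

# across() skips the grouping column, so everything() selects the
# four numeric measurement columns of iris
stats <- iris %>%
  group_by(Species) %>%
  summarise(across(
    everything(),
    list(min = min, median = median, mean = mean, max = max),
    .names = "{.col}_{.fn}"
  ))

print(stats)
```

Each output column is named after the source column and the statistic, for example Sepal.Length_mean.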


Thank you so much to all of you for your help. I finally solved my problem.
