Grouping and summarising data

I have data that is based on village name, incidence rates and maize yield (high, medium, low) for each village.
I'm trying to group this into maize yield and show the average incidence rates for the different yields (high, medium, low).
I'm very new to this and have been trying for a while but don't understand what I'm doing wrong, any tips or explanation on how to do this?

See the example below for how to do this with a built-in dataset in R.

I think you'd benefit from reading up about {dplyr}!

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# first lets look at some data
# note that there's three columns:
# * len (numeric)
# * dose (numeric, but distribution seems low)
# * supp (categorical)
summary(ToothGrowth)
#>       len        supp         dose      
#>  Min.   : 4.20   OJ:30   Min.   :0.500  
#>  1st Qu.:13.07   VC:30   1st Qu.:0.500  
#>  Median :19.25           Median :1.000  
#>  Mean   :18.81           Mean   :1.167  
#>  3rd Qu.:25.27           3rd Qu.:2.000  
#>  Max.   :33.90           Max.   :2.000

# how might we get the average "len" within the
# different categories?
ToothGrowth %>%
  group_by(supp) %>%
  summarise(len = mean(len, na.rm = T))
#> # A tibble: 2 × 2
#>   supp    len
#>   <fct> <dbl>
#> 1 OJ     20.7
#> 2 VC     17.0

ToothGrowth %>%
  group_by(dose) %>%
  summarise(len = mean(len, na.rm = T))
#> # A tibble: 3 × 2
#>    dose   len
#>   <dbl> <dbl>
#> 1   0.5  10.6
#> 2   1    19.7
#> 3   2    26.1

ToothGrowth %>%
  group_by(supp, dose) %>%
  summarise(len = mean(len, na.rm = T))
#> `summarise()` has grouped output by 'supp'. You can override using the
#> `.groups` argument.
#> # A tibble: 6 × 3
#> # Groups:   supp [2]
#>   supp   dose   len
#>   <fct> <dbl> <dbl>
#> 1 OJ      0.5 13.2 
#> 2 OJ      1   22.7 
#> 3 OJ      2   26.1 
#> 4 VC      0.5  7.98
#> 5 VC      1   16.8 
#> 6 VC      2   26.1

Created on 2022-08-20 by the reprex package (v2.0.1)

1 Like

look at this example :
lets say we have the below data frame :

df <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'),
position = c('G', 'G', 'F', 'G', 'F', 'F', 'F', 'G', 'G', 'F', 'F', 'F'),
points = c(4, 13, 7, 8, 15, 15, 17, 9, 21, 22, 25, 31))
df

image

we will use this command :
df %>% group_by(team) %>% summarise(total=sum(points))

will group the column ( Team ) to 3 values (A,B,C) then after it will summarize the total of points :

output :
team total
A 24
B 64
C 99

you can notice how , its grouped team and showed the total summary of the points .

I hope its clear .
if you need any further clarification please let me know .

Kind Regards

Using R benefits from applying a functional programming approach—f(x) - y, just like school algebra.

x is what is at hand, in this case a data frame with three variables, one of which is numeric and two of which are character. This organization is tidy because each row represents one unique observation.

y is the desired object, which presents a summary statistic (in this case mean) of the numeric variable stratified by yield.

f is the function to turn x into y. Just as x and y are composite objects, f may consist of the application of one function to one or more other functions. "Functions are first-class objects." Just like f(g(x)=y.

There's usually more than one function(s) available. @JackDavison has illustrated the use of dplyr to do this. dplyr is part of a suite of packages that are very popular on this site and there is a wealth of examples to find using the search tool. dplyr has a tendency to pose solutions in a procedural form—do this, then do that. That can be helpful but can result in a sense that each solution is unique to its problem. The focus may shift to how and the syntax required may multiply.

Here are three ways to create y

my_cars <- mtcars
# convert the grouping variable to a factor; if the cyl variable had
# been "four," "six" and "eight", it would work
# the same way
my_cars$cyl <- as.factor(my_cars$cyl)

# using {stats} (loaded by default)
aggregate(my_cars$hp, by = list(my_cars$cyl), FUN=mean)
#>   Group.1         x
#> 1       4  82.63636
#> 2       6 122.28571
#> 3       8 209.21429

# using data.table
library(data.table)
DT <- data.table(my_cars)
DT[,mean(hp), by = .(cyl)]
#>    cyl        V1
#> 1:   6 122.28571
#> 2:   4  82.63636
#> 3:   8 209.21429

# using gt (for when purpose of creating output is display)
library(gt)
output <- DT[,.(cyl,mpg)] |>
  gt(groupname_col = "cyl") |>
  summary_rows(
    groups = TRUE,
    columns = mpg,
    fns = list(
      average = "mean",
      total = "sum",
      SD = "sd")
  )

output
html { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif; }

#nqnhbqygif .gt_table {
display: table;
border-collapse: collapse;
margin-left: auto;
margin-right: auto;
color: #333333;
font-size: 16px;
font-weight: normal;
font-style: normal;
background-color: #FFFFFF;
width: auto;
border-top-style: solid;
border-top-width: 2px;
border-top-color: #A8A8A8;
border-right-style: none;
border-right-width: 2px;
border-right-color: #D3D3D3;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #A8A8A8;
border-left-style: none;
border-left-width: 2px;
border-left-color: #D3D3D3;
}

#nqnhbqygif .gt_heading {
background-color: #FFFFFF;
text-align: center;
border-bottom-color: #FFFFFF;
border-left-style: none;
border-left-width: 1px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 1px;
border-right-color: #D3D3D3;
}

#nqnhbqygif .gt_title {
color: #333333;
font-size: 125%;
font-weight: initial;
padding-top: 4px;
padding-bottom: 4px;
padding-left: 5px;
padding-right: 5px;
border-bottom-color: #FFFFFF;
border-bottom-width: 0;
}

#nqnhbqygif .gt_subtitle {
color: #333333;
font-size: 85%;
font-weight: initial;
padding-top: 0;
padding-bottom: 6px;
padding-left: 5px;
padding-right: 5px;
border-top-color: #FFFFFF;
border-top-width: 0;
}

#nqnhbqygif .gt_bottom_border {
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
}

#nqnhbqygif .gt_col_headings {
border-top-style: solid;
border-top-width: 2px;
border-top-color: #D3D3D3;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
border-left-style: none;
border-left-width: 1px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 1px;
border-right-color: #D3D3D3;
}

#nqnhbqygif .gt_col_heading {
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: normal;
text-transform: inherit;
border-left-style: none;
border-left-width: 1px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 1px;
border-right-color: #D3D3D3;
vertical-align: bottom;
padding-top: 5px;
padding-bottom: 6px;
padding-left: 5px;
padding-right: 5px;
overflow-x: hidden;
}

#nqnhbqygif .gt_column_spanner_outer {
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: normal;
text-transform: inherit;
padding-top: 0;
padding-bottom: 0;
padding-left: 4px;
padding-right: 4px;
}

#nqnhbqygif .gt_column_spanner_outer:first-child {
padding-left: 0;
}

#nqnhbqygif .gt_column_spanner_outer:last-child {
padding-right: 0;
}

#nqnhbqygif .gt_column_spanner {
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
vertical-align: bottom;
padding-top: 5px;
padding-bottom: 5px;
overflow-x: hidden;
display: inline-block;
width: 100%;
}

#nqnhbqygif .gt_group_heading {
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: initial;
text-transform: inherit;
border-top-style: solid;
border-top-width: 2px;
border-top-color: #D3D3D3;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
border-left-style: none;
border-left-width: 1px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 1px;
border-right-color: #D3D3D3;
vertical-align: middle;
}

#nqnhbqygif .gt_empty_group_heading {
padding: 0.5px;
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: initial;
border-top-style: solid;
border-top-width: 2px;
border-top-color: #D3D3D3;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
vertical-align: middle;
}

#nqnhbqygif .gt_from_md > :first-child {
margin-top: 0;
}

#nqnhbqygif .gt_from_md > :last-child {
margin-bottom: 0;
}

#nqnhbqygif .gt_row {
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
margin: 10px;
border-top-style: solid;
border-top-width: 1px;
border-top-color: #D3D3D3;
border-left-style: none;
border-left-width: 1px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 1px;
border-right-color: #D3D3D3;
vertical-align: middle;
overflow-x: hidden;
}

#nqnhbqygif .gt_stub {
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: initial;
text-transform: inherit;
border-right-style: solid;
border-right-width: 2px;
border-right-color: #D3D3D3;
padding-left: 5px;
padding-right: 5px;
}

#nqnhbqygif .gt_stub_row_group {
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: initial;
text-transform: inherit;
border-right-style: solid;
border-right-width: 2px;
border-right-color: #D3D3D3;
padding-left: 5px;
padding-right: 5px;
vertical-align: top;
}

#nqnhbqygif .gt_row_group_first td {
border-top-width: 2px;
}

#nqnhbqygif .gt_summary_row {
color: #333333;
background-color: #FFFFFF;
text-transform: inherit;
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
}

#nqnhbqygif .gt_first_summary_row {
border-top-style: solid;
border-top-color: #D3D3D3;
}

#nqnhbqygif .gt_first_summary_row.thick {
border-top-width: 2px;
}

#nqnhbqygif .gt_last_summary_row {
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
}

#nqnhbqygif .gt_grand_summary_row {
color: #333333;
background-color: #FFFFFF;
text-transform: inherit;
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
}

#nqnhbqygif .gt_first_grand_summary_row {
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
border-top-style: double;
border-top-width: 6px;
border-top-color: #D3D3D3;
}

#nqnhbqygif .gt_striped {
background-color: rgba(128, 128, 128, 0.05);
}

#nqnhbqygif .gt_table_body {
border-top-style: solid;
border-top-width: 2px;
border-top-color: #D3D3D3;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
}

#nqnhbqygif .gt_footnotes {
color: #333333;
background-color: #FFFFFF;
border-bottom-style: none;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
border-left-style: none;
border-left-width: 2px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 2px;
border-right-color: #D3D3D3;
}

#nqnhbqygif .gt_footnote {
margin: 0px;
font-size: 90%;
padding-left: 4px;
padding-right: 4px;
padding-left: 5px;
padding-right: 5px;
}

#nqnhbqygif .gt_sourcenotes {
color: #333333;
background-color: #FFFFFF;
border-bottom-style: none;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
border-left-style: none;
border-left-width: 2px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 2px;
border-right-color: #D3D3D3;
}

#nqnhbqygif .gt_sourcenote {
font-size: 90%;
padding-top: 4px;
padding-bottom: 4px;
padding-left: 5px;
padding-right: 5px;
}

#nqnhbqygif .gt_left {
text-align: left;
}

#nqnhbqygif .gt_center {
text-align: center;
}

#nqnhbqygif .gt_right {
text-align: right;
font-variant-numeric: tabular-nums;
}

#nqnhbqygif .gt_font_normal {
font-weight: normal;
}

#nqnhbqygif .gt_font_bold {
font-weight: bold;
}

#nqnhbqygif .gt_font_italic {
font-style: italic;
}

#nqnhbqygif .gt_super {
font-size: 65%;
}

#nqnhbqygif .gt_two_val_uncert {
display: inline-block;
line-height: 1em;
text-align: right;
font-size: 60%;
vertical-align: -0.25em;
margin-left: 0.1em;
}

#nqnhbqygif .gt_footnote_marks {
font-style: italic;
font-weight: normal;
font-size: 75%;
vertical-align: 0.4em;
}

#nqnhbqygif .gt_asterisk {
font-size: 100%;
vertical-align: 0;
}

#nqnhbqygif .gt_slash_mark {
font-size: 0.7em;
line-height: 0.7em;
vertical-align: 0.15em;
}

#nqnhbqygif .gt_fraction_numerator {
font-size: 0.6em;
line-height: 0.6em;
vertical-align: 0.45em;
}

#nqnhbqygif .gt_fraction_denominator {
font-size: 0.6em;
line-height: 0.6em;
vertical-align: -0.05em;
}

mpg
6
21.0
21.0
21.4
18.1
19.2
17.8
19.7
average 19.74
total 138.20
SD 1.45
4
22.8
24.4
22.8
32.4
30.4
33.9
21.5
27.3
26.0
30.4
21.4
average 26.66
total 293.30
SD 4.51
8
18.7
14.3
16.4
17.3
15.2
10.4
10.4
14.7
15.5
15.2
13.3
19.2
15.8
15.0
average 15.10
total 211.40
SD 2.56

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.