# How to average variables in R based on another variable

I am really new to R and am hoping to find the average rate of error for each individual id I have in my data set. I have created the data set error_rates that includes the columns "sorter_id_error" and "sorter_error".

My code:

```r
## Determine the average error_rate per volunteer id
error_data <- data.frame(sorter_id_error, sorter_error)
library(tidyverse)
library(dplyr)
error_data %>%
  group_by(sorter_id_error) %>%
  summarise(average = mean(sorter_error))
```

My error:

I keep getting warnings returned and my data ends up with a bunch of NAs instead of the averages.
```
   sorter_id_error average
 1               6      NA
 2               7      NA
 3              17      NA
 4              25      NA
 5              33      NA
 6              34      NA
 7              41      NA
 8              45      NA
 9              46      NA
10              47      NA
# ... with 118 more rows
```

There were 50 or more warnings (use warnings() to see the first 50)

What I would like:
I would like to average the `sorter_error` variable within each level of the `sorter_id_error` variable, so that I end up with one average per id.

What am I doing wrong, and how can I fix my code to get these values?

Thanks!

Perhaps you have NA values in your data. If so, these can be ignored by setting na.rm = TRUE in your call to mean().

``````r
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
d <- data.frame(id=c(1,2,3, 1, 2, 3, 1, 2, 3, 1), x=c(1,2,3,4,3,4,5,2,3,2) )
Stats <- d %>% group_by(id) %>% summarize(Avg = mean(x))
Stats
#> # A tibble: 3 x 2
#>      id   Avg
#>   <dbl> <dbl>
#> 1     1  3
#> 2     2  2.33
#> 3     3  3.33

#with NA
d <- data.frame(id=c(1,2,3, 1, 2, 3, 1, 2, 3, 1), x=c(1,NA,3,NA,3,NA,5,2,3,2) )
Stats <- d %>% group_by(id) %>% summarize(Avg = mean(x))
Stats
#> # A tibble: 3 x 2
#>      id   Avg
#>   <dbl> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3     3    NA

#With NA and na.rm = TRUE
d <- data.frame(id=c(1,2,3, 1, 2, 3, 1, 2, 3, 1), x=c(1,NA,3,NA,3,NA,5,2,3,2) )
Stats <- d %>% group_by(id) %>% summarize(Avg = mean(x, na.rm = TRUE))
Stats
#> # A tibble: 3 x 2
#>      id   Avg
#>   <dbl> <dbl>
#> 1     1  2.67
#> 2     2  2.5
#> 3     3  3
``````

Created on 2019-07-12 by the reprex package (v0.2.1)

Hmm, I added that argument to my code, but I am still getting this warning, accompanied by NA values:

In mean.default(sorter_error, na.rm = TRUE) :
argument is not numeric or logical: returning NA

I have also tried omitting NA values from my data using na.omit.

Here is my code with the specification:
```r
error_data %>%
  group_by(sorter_id_error) %>%
  summarise(average = mean(sorter_error, na.rm = TRUE))
```
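
One possible reading of that warning: `mean()` says "argument is not numeric or logical" when the column it receives is stored as text or as a factor, which `na.rm = TRUE` and `na.omit()` cannot fix. The thread does not show how `sorter_error` is actually stored, so the sketch below is only a guess at the cause. The small data frame is a made-up stand-in, and it assumes the values are numbers held as character strings.

```r
library(dplyr)

# Hypothetical stand-in for the real data: the error column is stored as text
error_data <- data.frame(
  sorter_id_error = c(6, 6, 7, 7, 17),
  sorter_error = c("0.10", "0.30", "0.20", NA, "0.50"),
  stringsAsFactors = FALSE
)

# Check how the column is stored; "character" or "factor" would explain the warning
class(error_data$sorter_error)

# Convert to numeric first, then average within each id.
# as.character() is a no-op for text and makes the conversion safe for factors.
averages <- error_data %>%
  mutate(sorter_error = as.numeric(as.character(sorter_error))) %>%
  group_by(sorter_id_error) %>%
  summarise(average = mean(sorter_error, na.rm = TRUE))

averages
```

If `class()` reports `"numeric"` on the real data, this is not the cause and something else is going on. If the conversion itself produces new NAs with a coercion warning, some entries are probably not plain numbers and would need cleaning first.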

Hi, welcome!

We don't really have enough info to help you out. Could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.
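
As a rough sketch of what could go into such a post, base R's `str()` and `dput()` show the column types and print code that recreates a few rows of the data. The data frame here is a hypothetical stand-in, not the poster's actual data.

```r
# Hypothetical stand-in for the poster's data
error_data <- data.frame(
  sorter_id_error = c(6, 6, 7),
  sorter_error = c("0.1", "0.3", "0.2"),
  stringsAsFactors = FALSE
)

# Show the type of each column, which is often where the problem hides
str(error_data)

# Print code that recreates the first rows, ready to paste into a post
dput(head(error_data, 10))
```

Pasting the `dput()` output into the question lets others rebuild the same data frame, including its column types, on their own machine.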

If you've never heard of a reprex before, you might want to start by reading this FAQ:
