Take average values only across duplicated rows?

I have a data frame with some rows duplicated by a certain column. I want to take the average in those cases, leaving a data frame with only unique values by that column.

mydf <- data.frame("color"=c("blue", "blue", "red", "orange", "green"), 
                   "value"=c(10,30,1,2,3))

This is just a minimal reprex - but my 'real' data is much bigger and more complex. How can I make a new data frame that only lists "blue" once, with a value of 20 (since it's the average) - and leaves everything else untouched?

This is a typical example of grouping a data frame by one (or more) columns and summarizing the data in another column.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
mydf <- data.frame("color"=c("blue", "blue", "red", "orange", "green"), 
                   "value"=c(10,30,1,2,3))
mydf_2 <- mydf %>% group_by(color) %>% summarize(value=mean(value))
#> `summarise()` ungrouping output (override with `.groups` argument)
mydf_2
#> # A tibble: 4 x 2
#>   color  value
#>   <chr>  <dbl>
#> 1 blue      20
#> 2 green      3
#> 3 orange     2
#> 4 red        1

Created on 2021-07-30 by the reprex package (v0.3.0)

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.