how to calculate the average value based on group id?

I need to create a new variable that is, for each consumer_id, the average value of daily_kwh for the year 2012.

# my data set:
data.frame(
    stringsAsFactors = FALSE,
           treatment = c(1L,1L,1L,1L,1L,1L,1L,
                         1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
                         1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
                         1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L),
         customer_id = c(1L,1L,1L,1L,1L,1L,1L,
                         1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
                         1L,1L,1L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,
                         2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L),
                post = c(0,0,0,0,0,0,0,0,0,
                         0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,
                         0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,
                         1,1),
           daily_kwh = c(49.969696969697,51,
                         49.2424242424242,24.3396226415094,34.75,48.1666666666667,
                         78.3103448275862,62.0606060606061,48.7142857142857,
                         37.3,52.03125,56.3870967741936,55.5806451612903,
                         51.1379310344828,44.1034482758621,31.8666666666667,
                         33.3333333333333,51.6896551724138,64.40625,54,
                         45.2758620689655,38.1818181818182,38.1785714285714,
                         53.0967741935484,65.1212121212121,70.2068965517241,
                         66.3225806451613,35.0666666666667,28.1428571428571,
                         27.5483870967742,34.84375,42.4666666666667,39.3103448275862,
                         35.875,50.9310344827586,55.9,70.78125,128.533333333333,
                         122.551724137931,84.9375,42.0333333333333,
                         29.1333333333333,32.5161290322581,26.258064516129,
                         24.4827586206897,27,49.7931034482759,68.8275862068966),
               month = as.factor(c("1",
                                   "2","3","4","5","6","7","8","9","10",
                                   "11","12","1","2","3","4","5","6","7",
                                   "8","9","10","11","12","1","2","3",
                                   "4","5","6","7","8","9","10","11","12",
                                   "1","2","3","4","5","6","7","8","9",
                                   "10","11","12")),
                year = as.factor(c("2012",
                                   "2012","2012","2012","2012","2012",
                                   "2012","2012","2012","2012","2012","2012",
                                   "2013","2013","2013","2013","2013","2013",
                                   "2013","2013","2013","2013","2013",
                                   "2013","2012","2012","2012","2012","2012",
                                   "2012","2012","2012","2012","2012",
                                   "2012","2012","2013","2013","2013","2013",
                                   "2013","2013","2013","2013","2013","2013",
                                   "2013","2013"))
)

For example, for consumer 1, the average value should be 49.97+ ...+ 56.39 (12 terms because 12 months in year 2012)

Using dplyr package
For processing 2012 only use filter()
For consumer id use group_by()
For New variable use mutate()
For average use mean()

To learn dplyr:

Section 5.1.3

1 Like

I did it. Thanks! Another problem is that when I write down year, customer_id, Do I need to add df$ before the variable? I'm always confused.

# create the average energy use variable 
df <- df %>% 
              filter(year == 2012) %>%
             group_by(customer_id) %>%
             mutate(avg_kwh = mean(daily_kwh))

well done :slight_smile:

generally if using dplyr() you would not be using $ type syntax to access variables.

$ is great for reaching into a dataframe and pulling out a variable as a vector. you often use it when working in base R.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.