Dear all, what is the relevant formula to calculate the yearly outcomes by treatment group? I have tried many ways, but I am unable to sort out the data as required.

@Hash What aggregation method would you like to use? You want to calculate the yearly **average** for each group? Or something else?

I have been trying to calculate the yearly mean outcomes of Y by treatment group.

Here is the answer to your question, now that you have provided me with the data and a bit more explanation in private messages:

```
# Load dplyr
library(dplyr)

# Read the full data
cost <- read.csv("diff_in_diff/cost_data.csv.csv")

# Compute the mean outcome by year and by treatment
cost %>%
  group_by(year, treatment) %>%
  summarize(avg_Y = mean(Y, na.rm = TRUE))

# Output:
# A tibble: 14 x 3
# Groups:   year [7]
#     year treatment avg_Y
#    <int>     <int> <dbl>
#  1  2005         0 5774.
#  2  2005         1 5294.
#  3  2006         0 5853.
#  4  2006         1 5400.
#  5  2007         0 5887.
#  6  2007         1 5436.
#  7  2008         0 6141.
#  8  2008         1 5367.
#  9  2009         0 6151.
# 10  2009         1 5389.
# 11  2010         0 6069.
# 12  2010         1 5321.
# 13  2011         0 5869.
# 14  2011         1 5195.
```
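If you want to sanity-check the grouped means without any extra packages, base R's `aggregate()` can compute the same kind of summary. This is only a sketch: the `toy` data frame below is made up for illustration (it just borrows the column names `year`, `treatment` and `Y` from the thread) and is not the real cost data.

```r
# Hypothetical toy data using the same column names as in the thread
toy <- data.frame(
  year      = c(2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006),
  treatment = c(0, 0, 1, 1, 0, 0, 1, 1),
  Y         = c(10, 20, 30, NA, 40, 60, 50, 70)
)

# Mean of Y for every year/treatment combination.
# The formula method drops rows with a missing Y by default (na.action = na.omit).
yearly_means <- aggregate(Y ~ year + treatment, data = toy, FUN = mean)

# Sort by year, then treatment, to mirror the layout of the dplyr output
yearly_means <- yearly_means[order(yearly_means$year, yearly_means$treatment), ]
print(yearly_means)
```

Each row of `yearly_means` is one year/treatment pair, so the layout matches the tibble above, except that the mean column keeps the name `Y`.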

This is because you did not load the `dplyr` package with `library(dplyr)`. If you don't have it installed, you will need to install it first with `install.packages("dplyr")`.
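If you are not sure which of the two cases applies, a quick check along these lines may help. It is a rough sketch, not an official diagnostic: the variable names `pkg`, `is_installed` and `is_attached` are my own, and the check only reports the state of your current R session without installing anything.

```r
pkg <- "dplyr"

# TRUE if the package is installed and its namespace can be loaded
is_installed <- requireNamespace(pkg, quietly = TRUE)

# TRUE if the package is attached, i.e. library(dplyr) has already been run
is_attached <- paste0("package:", pkg) %in% search()

if (!is_installed) {
  message(pkg, " is not installed: run install.packages(\"", pkg, "\") first.")
} else if (!is_attached) {
  message(pkg, " is installed but not loaded: run library(", pkg, ").")
} else {
  message(pkg, " is installed and loaded.")
}
```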

Hey, I have a similar problem, thanks. I was wondering how you draw a line graph using the aggregated data? The normal plot(x, y) doesn't seem to work.

Hi @gagoko0087, how about asking a new question where you provide more details on your issue? Feel free to tag me with @gueyenono. It would also help if you could share a sample of your data.