Aggregate data set by year (year is the column name and the years are in the rows of the table)

Hello, I am a basic beginner in R. I'm trying to perform a very simple aggregation of data. I have a table read into R with a column for Year and a column for Rainfall. I want to be able to total rainfall by year. Not sure how to do this. Can anyone help?

Hi Scott, welcome!

You could do something like this:

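library(dplyr)

# Replace your_dataframe with the name of your data frame
your_dataframe %>% 
    group_by(Year) %>%                    # one group per year
    summarise(Rainfall = sum(Rainfall))   # total rainfall within each year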
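If you'd rather not install any packages, base R's aggregate() can do the same grouped sum. This is a minimal sketch that assumes a data frame with columns named Year and Rainfall, as described in the question. The values below are made up for illustration.

```r
# Made-up sample data with the Year and Rainfall columns described in the question
your_dataframe <- data.frame(
  Year     = c(1991, 1991, 1992, 1992),
  Rainfall = c(123, 78, 200, 129)
)

# Sum Rainfall within each Year using aggregate()'s formula interface
totals <- aggregate(Rainfall ~ Year, data = your_dataframe, FUN = sum)
totals$Rainfall  # 201 329 (one total per year)
```

The formula Rainfall ~ Year reads as "Rainfall, grouped by Year", and FUN says how to combine the values in each group.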

If you need more specific help please provide a minimal REPRoducible EXample (reprex). A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

Thank you so much for your help! It partially worked, but for some reason it's giving me the overall total for each year. Here is the code I used and the result:
[screenshot of code and output]

Please ask your question with a reproducible example, as requested before. I can't test your code from a screenshot (screenshots aren't a good way to share code here), but you shouldn't be using GBR$ when using piped syntax.

Is this what you're looking for? My apologies, I'm not that technical yet, just getting started.

[screenshot of code and output]

If you remove GBR$ from your group_by() and sum() calls, that should fix this. Within a dplyr pipeline, you don't need to repeat the data frame's name; you can refer to the variables directly (i.e. X.Year and Rainfall..mm.).
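Here is why GBR$ produces the overall total for every year. This small sketch uses the thread's column names with illustrative values. GBR$Rainfall..mm. points at the whole original column, so sum() ignores the grouping. The bare column name is evaluated separately within each group.

```r
library(dplyr)

# Illustrative data using the column names from this thread
GBR <- data.frame(
  X.Year        = c(1991, 1991, 1992, 1992),
  Rainfall..mm. = c(123, 78, 200, 129)
)

# GBR$Rainfall..mm. is the entire column, so every group gets the grand total
wrong <- GBR %>%
  group_by(X.Year) %>%
  summarise(Rainfall = sum(GBR$Rainfall..mm.))

# The bare name Rainfall..mm. refers to the rows in the current group only
right <- GBR %>%
  group_by(X.Year) %>%
  summarise(Rainfall = sum(Rainfall..mm.))

wrong$Rainfall  # 530 530
right$Rainfall  # 201 329
```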

Hi @ScottW ,

What @andresrcs is asking for is that, rather than posting screenshots of your code/output, you provide the code you're running as text (you can paste it here between two sets of ``` to format it as code). Specifically, it would be helpful to create an example of your code that we can reproduce on our own computers. The post that was linked to (FAQ: How to do a minimal reproducible example ( reprex ) for beginners) will show you how to make a reproducible example.

In the meantime, note that you don't need to use GBR$... inside a dplyr pipeline; you can just use the variable names themselves. So your code would become:

GBR %>%
    group_by(X.Year) %>%
    summarise(Rainfall = sum(Rainfall..mm.))

Thanks everyone for all your help. I am now getting an error. Here is the code:
"GBR %>%
group_by(X.Year) %>%
summarise(Rainfall = sum(Rainfall..mm.))"

I am getting this error:
Error in GBR %>% group_by(X.Year) %>% summarise(Rainfall = sum(Rainfall..mm.)) :
could not find function "%>%"

Hi again,

To format your code as code, you can wrap it in 3 backticks (the ` symbol on your keyboard). You can read more about that in this help post: FAQ: How to make your code look nice? Markdown Formatting

The error you're encountering (about not finding the %>% function) is probably because you restarted R and didn't attach your packages again. Since you're using dplyr functions, and %>% is made available by dplyr (which re-exports it from the magrittr package), make sure you include library(dplyr) at the top of your code (i.e. before anything else) to attach the package and make its functions available to you.
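As an alternative to attaching the package, you can call dplyr's functions with the dplyr:: prefix. In that case %>% is not available unless dplyr (or magrittr) is attached, so this sketch nests the calls instead. The data are illustrative values with the thread's column names.

```r
# Illustrative data using the column names from this thread
GBR <- data.frame(
  X.Year        = c(1991, 1991, 1992, 1992),
  Rainfall..mm. = c(123, 78, 200, 129)
)

# No library(dplyr) call: the namespace-qualified functions still work,
# they just have to be nested because %>% isn't attached
totals <- dplyr::summarise(
  dplyr::group_by(GBR, X.Year),
  Rainfall = sum(Rainfall..mm.)
)
totals$Rainfall  # 201 329
```

For a beginner's script, putting library(dplyr) at the top is usually simpler; the prefix form is mainly useful when you only need one or two functions from a package.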

Thank you so much Jim, that worked!!! And just to be sure I'm following the right process in the future, is this what you meant about making the code clean?

GBR %>% 
  group_by(X.Year) %>%
  summarise(Rainfall = sum(Rainfall..mm.))

For future reference, this would be a proper reproducible example for your issue. You are supposed to ask questions by providing sample data, library calls, and the relevant code to reproduce your issue (all of this is explained in the link I gave you before; you just have to read it more carefully).

# Library calls for the packages you are using
library(dplyr)

# Sample data in a copy/paste-friendly format
GBR <- data.frame(stringsAsFactors = FALSE,
                  X.Year = c(1991, 1991, 1992, 1992),
                  Rainfall..mm. = c(123, 78, 200, 129))

# Relevant code that reproduces your issue
GBR %>% 
    group_by(X.Year) %>%
    summarise(Rainfall = sum(Rainfall..mm.))
#> # A tibble: 2 x 2
#>   X.Year Rainfall
#>    <dbl>    <dbl>
#> 1   1991      201
#> 2   1992      329

Created on 2019-07-24 by the reprex package (v0.3.0.9000)
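One way to produce copy/paste-friendly sample data from a data frame you already have is base R's dput(). It prints R code that recreates the object, which you can paste into your post. This is a minimal sketch on a small illustrative data frame. For a large data set you would typically pass a subset, such as head(GBR).

```r
# Small illustrative data frame with the thread's column names
GBR <- data.frame(
  X.Year        = c(1991, 1991, 1992, 1992),
  Rainfall..mm. = c(123, 78, 200, 129)
)

# Print code that recreates GBR; this is what you'd paste into a question
dput(GBR)

# Sanity check: evaluating the printed code rebuilds an equal data frame
code    <- paste(capture.output(dput(GBR)), collapse = "\n")
rebuilt <- eval(parse(text = code))
all.equal(rebuilt, GBR)  # TRUE
```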
