Hello, I am a basic beginner in R. I'm trying to perform a very simple aggregation of data. I have a table read into R with a column for Year and a column for Rainfall. I want to be able to total rainfall by year. Not sure how to do this. Can anyone help?
Hi Scott, welcome!
You could do something like this:

```r
library(dplyr)

your_dataframe %>%
  group_by(Year) %>%
  summarise(Rainfall = sum(Rainfall))
```
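If you would rather not load a package, base R's `aggregate()` can probably do the same job. This is only a minimal sketch: the data frame `rain` and its values are made up for illustration, and the column names `Year` and `Rainfall` are assumed from the question.

```r
# Hypothetical sample data standing in for the table read into R
rain <- data.frame(
  Year     = c(1991, 1991, 1992, 1992),
  Rainfall = c(123, 78, 200, 129)
)

# Sum Rainfall within each Year using the formula interface of aggregate()
yearly_total <- aggregate(Rainfall ~ Year, data = rain, FUN = sum)

print(yearly_total)
#>   Year Rainfall
#> 1 1991      201
#> 2 1992      329
```

The result is an ordinary data frame with one row per year, so it can be plotted or written out like any other table.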
If you need more specific help please provide a minimal REPRoducible EXample (reprex). A reprex makes it much easier for others to understand your issue and figure out how to help.
If you've never heard of a reprex before, you might want to start by reading this FAQ: How to do a minimal reproducible example (reprex) for beginners.
Thank you so much for your help! It partially worked, but for some reason it's giving me the overall total for each year. Here is the code I used and the result:
Please post your question with a reproducible example, as requested before. I can't test your code from a screenshot (not a good thing to do here), but you shouldn't be using `GBR$` when using piped syntax.
Is this what you're looking for? My apologies, I'm not that technical yet, just getting started.
If you remove the `GBR$` from your `sum` function calls, it should fix this. Within the `dplyr` pipeline, you do not need to use the `data.frame` name. You can just call the variable names.
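To see why the `GBR$` prefix produces the overall total for every year, here is a small sketch. The data values are hypothetical; only the column names `X.Year` and `Rainfall..mm.` come from this thread.

```r
library(dplyr)

# Hypothetical sample data using the column names from this thread
GBR <- data.frame(
  X.Year        = c(1991, 1991, 1992, 1992),
  Rainfall..mm. = c(123, 78, 200, 129)
)

# GBR$Rainfall..mm. pulls the whole column from the original data frame,
# ignoring the grouping, so every year gets the overall total
wrong <- GBR %>%
  group_by(X.Year) %>%
  summarise(Rainfall = sum(GBR$Rainfall..mm.))

# The bare column name is evaluated within each group
right <- GBR %>%
  group_by(X.Year) %>%
  summarise(Rainfall = sum(Rainfall..mm.))

wrong$Rainfall
#> [1] 530 530
right$Rainfall
#> [1] 201 329
```

The only difference between the two pipelines is the `GBR$` prefix inside `sum()`.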
Hi @ScottW,
What @andresrcs is asking for is that, rather than posting screenshots of your code/output, you provide the code you're running as text (you can copy it in here between two sets of ``` to format it as code). Specifically, it would be helpful to create an example of your code that we can reproduce on our own computers. The post that was linked to (FAQ: How to do a minimal reproducible example (reprex) for beginners) will show you how to make a reproducible example.
In the meantime, I can suggest that you don't need to use `GBR$...` inside a pipeline using `dplyr`; you can just use the variables themselves. So your code would become:

```r
GBR %>%
  group_by(X.Year) %>%
  summarise(Rainfall = sum(Rainfall..mm.))
```
Thanks everyone for all your help. I am now getting an error. Here is the code:
summarise(Rainfall = sum(Rainfall..mm.))"
I am getting this error:
Error in GBR %>% group_by(X.Year) %>% summarise(Rainfall = sum(Rainfall..mm.)) :
could not find function "%>%"
To format your code as code, you can wrap it in 3 backticks (the ` symbol on your keyboard). You can read more about that in this help post: FAQ: How to make your code look nice? Markdown Formatting
The error you're encountering (about not finding the `%>%` function) is probably due to restarting R and not attaching your packages again. Given that you're using `dplyr` functions, and the `%>%` function is made available by the `dplyr` package (which re-exports it from `magrittr`), make sure that you include `library(dplyr)` at the top of your code (i.e. before anything else) to attach the package and make its functions available to you.
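As a rough sketch of what that looks like in a fresh session, assuming `dplyr` is installed, you can attach the package first and then confirm that the pipe is visible before running the pipeline. The data values here are hypothetical.

```r
# Attach dplyr first; in a fresh session this is what makes %>% visible
library(dplyr)

# Both checks should come back TRUE once the package is attached
"package:dplyr" %in% search()
exists("%>%")

# Hypothetical sample data using the column names from this thread
GBR <- data.frame(
  X.Year        = c(1991, 1991, 1992, 1992),
  Rainfall..mm. = c(123, 78, 200, 129)
)

# With dplyr attached, the pipeline no longer fails on %>%
yearly <- GBR %>%
  group_by(X.Year) %>%
  summarise(Rainfall = sum(Rainfall..mm.))

yearly
```

If either check returns `FALSE`, the package was not attached in the current session, and the `could not find function "%>%"` error is the expected result.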
Thank you so much Jim, that worked!!! And just to be sure I'm following the right process in the future, is this what you meant about making the code clean?
```
GBR %>% group_by(X.Year) %>% summarise(Rainfall = sum(Rainfall..mm.))
```
For future reference, this would be a proper reproducible example for your issue. You are supposed to ask questions providing sample data, library calls, and the relevant code to reproduce your issue (all of this is explained in the link I gave you before; you just have to read it more carefully).
```r
# Library calls for the packages you are using
library(dplyr)

# Sample data on a copy/paste friendly format
GBR <- data.frame(stringsAsFactors = FALSE,
                  X.Year = c(1991, 1991, 1992, 1992),
                  Rainfall..mm. = c(123, 78, 200, 129))

# Relevant code that reproduces your issue
GBR %>%
  group_by(X.Year) %>%
  summarise(Rainfall = sum(Rainfall..mm.))
#> # A tibble: 2 x 2
#>   X.Year Rainfall
#>    <dbl>    <dbl>
#> 1   1991      201
#> 2   1992      329
```
Created on 2019-07-24 by the reprex package (v0.3.0.9000)
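One way to get sample data into a copy/paste-friendly form like the one above is base R's `dput()`, which prints R code that rebuilds an object. This is only a sketch: the data frame below is a small hypothetical stand-in, and with a real table you would usually pass just a few rows, e.g. `dput(head(GBR, 10))`.

```r
# Hypothetical stand-in for the data frame you actually have loaded
GBR <- data.frame(
  X.Year        = c(1991, 1991, 1992, 1992),
  Rainfall..mm. = c(123, 78, 200, 129)
)

# Print R code that recreates the data frame; paste this into your post
dput(GBR)

# Sanity check: evaluating the printed code should rebuild the same data
dput_text <- paste(capture.output(dput(GBR)), collapse = "\n")
rebuilt   <- eval(parse(text = dput_text))
isTRUE(all.equal(rebuilt, GBR))
```

Anyone reading the post can then assign the pasted `structure(...)` call to a name and run your code against exactly the same data.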