Average of data and creation of a dataset

#1

Hi everyone, I'm new to R and English is not my first language, but hopefully
this will not end up as bad as I fear.

For my PhD thesis, I have to average some data inside an excel file and I'm
trying to do things a bit faster by using R. The excel file looks something
like this

Year | Sodium | Ammonium
2009 | value | value
2008.6 | value | value
2008.3 | value | value
2008 | value | value
2007.5 |value | value
2007 | value | value

With various concentration of different ions for each years. I now need the
average of each ions' concentration for each year (so like, averaging each
values for Sodium between 2009 and 2008, between 2008 and 2007 and so on...
it goes all the way down to 1200 so you can see why I really don't want to
do it by hand)... but now I'm completely lost.
I initially thought to create a loop, but I can't see how I could write it
down.
Any suggestion?

0 Likes

#2

Welcome to R and to the RStudio Community Raffaello. It would be helpful if you provide your data and the code you've tried in reproducible form. For now, here's a made-up example that I hope will address your problem. I've assumed that you've already loaded your data into R, but if you need help with that, please let us know.

library(tidyverse)

# Fake data
set.seed(2)
d = data.frame(Year=runif(50, 2010, 2015),
               Sodium=runif(50),
               Ammonium=runif(50))

ion.means = d %>% 
  # Reshape data to "long" format
  gather(ion, value, -Year) %>% 
  # Round to integer year and group by year and ion
  group_by(Year=round(Year), ion) %>% 
  # Calculate mean concentration by year and ion
  summarise(mean = mean(value, na.rm=TRUE)) %>% 
  # Reshape back to "wide" format
  spread(ion, mean)
ion.means 

          Year Ammonium Sodium
1         2010    0.388  0.208
2         2011    0.566  0.470
3         2012    0.473  0.488
4         2013    0.352  0.543
5         2014    0.559  0.414
6         2015    0.534  0.535
0 Likes

#3

To add upon Joels' solution, maybe you are facing issues with your Year format so, this would be a possible solution to that.

library(tidyverse)
set.seed(2)
df <- data.frame(stringsAsFactors = FALSE,
                 Year = c("2009", "2008.6", "2008.3", "2008", "2007.5", "2007"),
                 Sodium = runif(6),
                 Ammonium = runif(6))
df %>%
    # Extract just the year part before the dot
    mutate(Year = as.numeric(str_remove(Year, "\\.\\d\\d?"))) %>% 
    gather(ion, value, -Year) %>% 
    group_by(Year=round(Year), ion) %>% 
    summarise(mean = mean(value, na.rm=TRUE)) %>% 
    spread(ion, mean)
#> # A tibble: 3 x 3
#> # Groups:   Year [3]
#>    Year Ammonium Sodium
#>   <dbl>    <dbl>  <dbl>
#> 1  2007    0.396  0.944
#> 2  2008    0.617  0.481
#> 3  2009    0.129  0.185
0 Likes

closed #4

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

0 Likes