aggregate by two columns

Hello,

I have a large tabel with columns; "date" "location" "value" "file"
I would like to aggregate all the values within the same file on the same date but with different locations.
The table looks like below

 datetime            location value fn                
   <dttm>              <chr>    <dbl> <chr>             
 1 1996-01-01 00:00:00 inl_1     4200 2010-GF
 2 1996-01-02 00:00:00 inl_1     4400 2010-GF
 3 1996-01-03 00:00:00 inl_1     4350 2010-GF
 4 1996-01-04 00:00:00 inl_1     4000 2010-GF
 5 1996-01-01 00:00:00 inl_2     3510 2010-GF
 6 1996-01-02 00:00:00 inl_2     3560 2010-GF
 7 1996-01-03 00:00:00 inl_2     3500 2010-GF
 8 1996-01-04 00:00:00 inl_1     4000 2013-GF

What I would like to get is something like:

 datetime            location value fn                
   <dttm>              <chr>    <dbl> <chr>             
 1 1996-01-01 00:00:00 inl_1     7710 2010-GF
 2 1996-01-02 00:00:00 inl_1     7960 2010-GF
 3 1996-01-03 00:00:00 inl_1     7850 2010-GF
 4 1996-01-04 00:00:00 inl_1     4000 2010-GF
 8 1996-01-04 00:00:00 inl_1     4000 2013-GF

I have tried this but it doesnt work:

result <- aggregate(. ~datetime+fn, data = df, sum, na.rm=TRUE)

The error I get is:

Error in Summary.factor(2:1, na.rm = TRUE) : 
  ‘sum’ not meaningful for factors

Does anyone know?

You're getting an error because your LHS includes the variable location which cannot be summed. It'll work fine if you only supply value.

library(dplyr, warn.conflicts = FALSE)

df <- tribble(~ datetime, ~ location, ~ value, ~ fn,
              "1996-01-01 00:00:00", "inl_1", 4200, "2010-GF",
              "1996-01-02 00:00:00", "inl_1", 4400, "2010-GF",
              "1996-01-03 00:00:00", "inl_1", 4350, "2010-GF",
              "1996-01-04 00:00:00", "inl_1", 4000, "2010-GF",
              "1996-01-01 00:00:00", "inl_2", 3510, "2010-GF",
              "1996-01-02 00:00:00", "inl_2", 3560, "2010-GF",
              "1996-01-03 00:00:00", "inl_2", 3500, "2010-GF",
              "1996-01-04 00:00:00", "inl_1", 4000, "2013-GF")

df <- mutate(df, datetime = as.POSIXct(datetime))

aggregate(value ~ datetime + fn, data = df, sum)
#>     datetime      fn value
#> 1 1996-01-01 2010-GF  7710
#> 2 1996-01-02 2010-GF  7960
#> 3 1996-01-03 2010-GF  7850
#> 4 1996-01-04 2010-GF  4000
#> 5 1996-01-04 2013-GF  4000

Or you could use the tidyverse equivalent which is a lot nicer in my opinion.

df %>% 
  group_by(datetime, fn) %>% 
  summarise(total_value = sum(value))
#> `summarise()` regrouping output by 'datetime' (override with `.groups` argument)
#> # A tibble: 5 x 3
#> # Groups:   datetime [4]
#>   datetime            fn      total_value
#>   <dttm>              <chr>         <dbl>
#> 1 1996-01-01 00:00:00 2010-GF        7710
#> 2 1996-01-02 00:00:00 2010-GF        7960
#> 3 1996-01-03 00:00:00 2010-GF        7850
#> 4 1996-01-04 00:00:00 2010-GF        4000
#> 5 1996-01-04 00:00:00 2013-GF        4000

Created on 2020-07-15 by the reprex package (v0.3.0)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.