Hi!
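The problem with your code is that it never tells `aggregate()` to group by the station codes. `aggregate()` only groups by the variables you give it, so it has no way of taking the stations into account on its own. If I understand your question correctly, you want one total per station per year, so both the station codes and the years have to appear on the right-hand side of the formula. This might help: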
``` r
set.seed(seed = 48250)

# Fake data: 100 ozone readings from 10 stations at random times in 2000-2005.
# Note: seq(by = "sec") materialises every second of those six years
# (about 189 million values), so this step needs a lot of memory.
fake_dataset <- data.frame(
  ozone_concentrations = rexp(n = 100),
  station_codes = sample.int(n = 10, size = 100, replace = TRUE),
  date_times = sample(x = seq(from = as.POSIXct(x = "2000-01-01 00:00:00"),
                              to = as.POSIXct(x = "2005-12-31 23:59:59"),
                              by = "sec"),
                      size = 100,
                      replace = TRUE)
)

# Sum the concentrations for every station/year combination.
# The formula is passed unnamed because the formula method's first argument
# was renamed from `formula` to `x` in R 4.2.0.
aggregate(ozone_concentrations ~ station_codes + years,
          data = transform(fake_dataset,
                           years = format(date_times, format = "%Y")),
          FUN = sum)
#>    station_codes years ozone_concentrations
#> 1              2  2000          0.064241865
#> 2              3  2000          3.391680050
#> 3              5  2000          1.000316242
#> 4              6  2000          0.561307493
#> 5              7  2000          3.528994244
#> 6              8  2000          2.756995374
#> 7              9  2000          5.609195160
#> 8             10  2000          1.264838466
#> 9              1  2001          1.779377927
#> 10             2  2001          2.289763227
#> 11             3  2001          2.161737537
#> 12             4  2001          1.117474744
#> 13             5  2001          1.599183055
#> 14             6  2001          0.150858606
#> 15             9  2001          3.168106966
#> 16             2  2002          1.011305285
#> 17             3  2002          7.791112683
#> 18             4  2002          0.005795882
#> 19             5  2002          3.390604591
#> 20             6  2002          1.898864461
#> 21             7  2002          0.155775124
#> 22             8  2002          3.111518933
#> 23             9  2002          1.935719344
#> 24             1  2003          3.530518554
#> 25             3  2003          3.136105068
#> 26             4  2003         13.354189556
#> 27             5  2003          0.291728424
#> 28             6  2003          0.503696389
#> 29             7  2003          0.740147266
#> 30             8  2003          1.442953513
#> 31            10  2003          3.393911725
#> 32             1  2004          3.108035615
#> 33             2  2004          0.719082672
#> 34             3  2004          0.135565576
#> 35             4  2004          0.349281420
#> 36             5  2004          0.629870183
#> 37             6  2004          3.963184938
#> 38             7  2004          1.852908889
#> 39             8  2004          3.048067579
#> 40             9  2004          2.390117817
#> 41            10  2004          1.088342140
#> 42             1  2005          0.238723134
#> 43             3  2005          0.398160109
#> 44             4  2005          1.044262535
#> 45             6  2005          3.298614198
#> 46             7  2005          2.816752607
#> 47             8  2005          5.153532699
#> 48             9  2005          1.522409497
#> 49            10  2005          1.115334160
```
Created on 2019-12-30 by the reprex package (v0.3.0)
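To make the difference concrete, here is a small sketch on a hypothetical toy data set (the values and the `toy` name are made up for illustration, not taken from your data). With only `years` on the right-hand side, `aggregate()` pools all stations into one total per year; adding `station_codes` gives one total per station and year.

``` r
# Hypothetical toy data: two stations, two years, six readings
toy <- data.frame(
  ozone_concentrations = c(1, 2, 4, 8, 16, 32),
  station_codes        = c(1, 1, 2, 2, 1, 2),
  date_times           = as.POSIXct(c("2000-03-01 10:00:00", "2000-07-15 12:00:00",
                                      "2000-05-20 08:00:00", "2001-01-02 09:00:00",
                                      "2001-11-30 18:00:00", "2001-06-06 06:00:00"),
                                    tz = "UTC")
)

# Extract the year as a character column ("2000", "2001")
toy$years <- format(toy$date_times, format = "%Y")

# Only years on the right-hand side: the stations are summed together
by_year <- aggregate(ozone_concentrations ~ years, data = toy, FUN = sum)

# Station and year on the right-hand side: one row per station/year pair
by_station_year <- aggregate(ozone_concentrations ~ station_codes + years,
                             data = toy, FUN = sum)

print(by_year)
print(by_station_year)
```

`by_year` has two rows (7 for 2000, 56 for 2001), while `by_station_year` has four rows, ordered by station within year, with totals 3, 4, 16 and 40.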