aggregate with conditions

Hi All,
I am trying to add Temp in such a way that for a given event it will sum till Duration=1.
If Duration>1 for a given Event, it will start from the next row to add up.
I want to get as provided in the expected outcome.

rm(list = ls())
time=c("9/3/1976 23:00","9/3/1976 23:15",	"9/4/1976 2:15","9/4/1976 15:30",	"9/4/1976 15:45",	"9/4/1976 16:15",
      "9/6/1976 13:45"	,"9/6/1976 14:00",	"9/6/1976 14:15",	"9/6/1976 14:30",	"9/6/1976 15:00",	"9/6/1976 19:30",
      "9/6/1976 19:45")
Temp=as.numeric(c(2.54,5.08,2.54,5.18,2.54,2.54,2.54,12.7,
                  2.54,2.54,2.54,2.54,2.54))
Duration=as.numeric(rep(0.25,13))
Event=as.numeric(c(1,1,1,2,2,2,3,3,3,3,3,3,3))
df=cbind(time,Temp,Duration,Event)
df=as.data.frame(df)
df$time=as.POSIXct(df$time,format="%m/%d/%Y %H:%M") 
df$month=format(as.Date(df$time, format="%m/%d/%Y %H:%M"),"%m")
head(df)

df$timescale=as.numeric(df$timescale)
head(df)

#Expected Outcome
Temp	Duration	Event	month
10.16	 0.75	     1	   9
10.26	 0.75	     2	   9
20.32	  1	         3	   9
7.62	 0.75	     3	   9

Thanks for your time.

Hi meitei, welcome to RStudio Community.

First of all, thank you for sharing a properly-formatted reproducible example (or reprex). Makes it really easy for others to help you. However, a couple of suggestions:

  1. Please do not include calls like rm(list = ls()) when sharing code. Nuking other people's environments is not polite.

  2. Although you took steps to format your vectors with the correct data types, this was lost when you used cbind(). As a result, variables such as Temp and Duration in df are character vectors. Instead, I'd recommend using data.frame() as shown below.

My solution to your puzzle isn't super elegant but it does give the result you're looking for.

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  time = as.POSIXct(c(
    "9/3/1976 23:00", "9/3/1976 23:15", "9/4/1976 2:15", "9/4/1976 15:30", "9/4/1976 15:45", "9/4/1976 16:15",
    "9/6/1976 13:45", "9/6/1976 14:00", "9/6/1976 14:15", "9/6/1976 14:30", "9/6/1976 15:00", "9/6/1976 19:30",
    "9/6/1976 19:45"), format = "%m/%d/%Y %H:%M"),
  Temp = c(2.54, 5.08, 2.54, 5.18, 2.54, 2.54, 2.54, 12.7, 2.54, 2.54, 2.54, 2.54, 2.54),
  Duration = rep(0.25, 13),
  Event = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3)
)

df$month = format(df$time, "%m")

head(df)
#>                  time Temp Duration Event month
#> 1 1976-09-03 23:00:00 2.54     0.25     1    09
#> 2 1976-09-03 23:15:00 5.08     0.25     1    09
#> 3 1976-09-04 02:15:00 2.54     0.25     1    09
#> 4 1976-09-04 15:30:00 5.18     0.25     2    09
#> 5 1976-09-04 15:45:00 2.54     0.25     2    09
#> 6 1976-09-04 16:15:00 2.54     0.25     2    09

df %>% 
  group_by(Event) %>% 
  mutate(Event_New = if_else(cumsum(Duration) > 1, Event + 0.1, Event)) %>% 
  group_by(Event_New, .add = TRUE) %>% 
  summarise(Temp = sum(Temp), Duration = sum(Duration), month = first(month), .groups = "drop") %>% 
  select(-Event_New)
#> # A tibble: 4 x 4
#>   Event  Temp Duration month
#>   <dbl> <dbl>    <dbl> <chr>
#> 1     1 10.2      0.75 09   
#> 2     2 10.3      0.75 09   
#> 3     3 20.3      1    09   
#> 4     3  7.62     0.75 09

Created on 2020-07-24 by the reprex package (v0.3.0)

1 Like

Thanks, @siddharthprabhu. Here, I am able to get it what I was expecting.
However, instead of summarising. How can I export as a data frame or a CSV file?

Tibbles are very similar to data frames. Just assign the last block of code to a variable and then you can export it using write.csv() or other methods.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.