Help with prediction model imputing missing values

I am trying to create a prediction model that predicts daily electric usage based on daily temperature highs per customer. My hope is that with forecasted weather data I can predict future usage. Once I have those predicted values I can then find the daily predicted billing for those days and aggregate those values to predict monthly billing. I need help with combining the predicted values with my already existing table and allowing for predicted values to be replaced when data is refreshed. The data will have this structure:
Date: Usage: TempHigh:
01-june 27 94
02-june 32 101
... ... ....

I am not sure what your work flow is but maybe the following example will help you update your data. The data start with actual values on the 1st and 2nd and predicted values on the 3rd, 4th and 5th. New actual values come in for the 3rd and 4th, so those values get updated but the predicted value for the 5this preserved.

library(dplyr)

#data for June 3, 4 and 5 are predicted
ActualAndPredicted <- data.frame(Date = as.Date(c("2022-06-01","2022-06-02",
                                           "2022-06-03", "2022-06-04",
                                           "2022-06-05")),
                                 Usage = c(25, 27,24,33, 29),
                                 TempHigh = c(90, 92, 88, 102, 99))
ActualAndPredicted
#>         Date Usage TempHigh
#> 1 2022-06-01    25       90
#> 2 2022-06-02    27       92
#> 3 2022-06-03    24       88
#> 4 2022-06-04    33      102
#> 5 2022-06-05    29       99

#Includes new data for June 3 and 4
ActualTempAndUsage <- data.frame(Date = as.Date(c("2022-06-01","2022-06-02",
                                                  "2022-06-03", "2022-06-04")),
                                 Usage = c(25,27, 26, 29),
                                 TempHigh = c(90, 92, 90, 100))
ActualTempAndUsage
#>         Date Usage TempHigh
#> 1 2022-06-01    25       90
#> 2 2022-06-02    27       92
#> 3 2022-06-03    26       90
#> 4 2022-06-04    29      100

#left join the updated actual data
ActualAndPredicted <- left_join(ActualAndPredicted,ActualTempAndUsage,
                               by = "Date", suffix = c("", "_Act"))
ActualAndPredicted
#>         Date Usage TempHigh Usage_Act TempHigh_Act
#> 1 2022-06-01    25       90        25           90
#> 2 2022-06-02    27       92        27           92
#> 3 2022-06-03    24       88        26           90
#> 4 2022-06-04    33      102        29          100
#> 5 2022-06-05    29       99        NA           NA

#Update predicted values with actual values
ActualAndPredicted <- ActualAndPredicted |> 
  mutate(Usage = ifelse(!is.na(Usage_Act), Usage_Act, Usage),
         TempHigh = ifelse(!is.na(TempHigh_Act), TempHigh_Act, TempHigh)) |> 
  select(Date, Usage, TempHigh)

ActualAndPredicted
#>         Date Usage TempHigh
#> 1 2022-06-01    25       90
#> 2 2022-06-02    27       92
#> 3 2022-06-03    26       90
#> 4 2022-06-04    29      100
#> 5 2022-06-05    29       99

Created on 2022-08-26 by the reprex package (v2.0.1)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.