Which regression algorithm could be applied for correcting sensor values?

Please consider the sample dataset below.

In simple terms: the sensor is defective and has therefore been measuring incorrect values since 2000. We have 10 years of data containing both the measured and the actual values.

P.S. We don't have data for every combination of application and sensor type on a monthly basis, though.

Now we want the algorithm to give us the actual values from the measured ones.

We tried XGBoost and CatBoost: we created another column, diff = measured - actual, and fed it to the algorithm to identify the pattern. However, we are not sure which algorithm is appropriate. We suspect a neural network or a time series model (ARIMA) could work, but we are unsure because we only have 10 years of data at a monthly level.

library(tidyverse)

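# Sample training data; the "." entries stand for the omitted months in between
# (note that they turn the measured and actual columns into character vectors)
train_data <- data.frame(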
  time = c(rep("01.2000",10),rep("02.2000",10),rep(".",3),rep("11.2010",10),rep("12.2010",10)),
  application = c(rep("factory",4),rep("residential",3),rep("research",3),
                  rep("factory",2),rep("residential",5),rep("research",3),
                  rep(".",3),
                  rep("factory",2),rep("residential",2),rep("research",6),
                  rep("factory",7),rep("residential",1),rep("research",2)),
  sensor = c(LETTERS[1:10],LETTERS[10:1],rep(".",3),LETTERS[c(5:1,10:6)],LETTERS[c(3:9,2,1,10)]), 
  measured = c(26.4,2000,1001,23.9,100000,0,1234,12098,34567,0,
               123,676,12,0,100,0,0,98,1,190,
               rep(".",3),
               3454,0,101,9,1,0,14,1298,677,0,
               264,20220,1851,3.9,1044,0,1764,0,34,0),
  actual =  c(26.4,2010,1001,23.9,100100,237,1234,12098,34567,19583,
              123,706,1112,156,100,650,109,98,10,190,
              rep(".",3),
              3454,10,101,19,10,40,44,1298,760,50,
              264,20220,1851,39,1048,870,1765,40,35,1110)
)

# test data: the actual values for 01.2011 are what we want to forecast
test_data <- data.frame(
  time = rep("01.2011",10),
  application = c(rep("factory",7),rep("residential",1),rep("research",2)),
  sensor = LETTERS[c(1,4,5,9,3,2,8,6,7,10)], 
  measured = c(26.4,100000,0,0,
               123,12,
               3454,0,20220,1851)
)

How can we predict/forecast the actual values for the 01.2011 data (test_data)?
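For reference, the diff approach described above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: it uses a plain linear model (`lm`) as a stand-in for XGBoost/CatBoost, it uses only the 01.2000 and 02.2000 rows of the sample (the "." placeholder rows are left out), and the names `train`, `test`, `diff_hat` and `actual_hat` are made up for illustration.

```r
# Rows copied from the sample data above (01.2000 and 02.2000 only)
train <- data.frame(
  application = c(rep("factory", 4), rep("residential", 3), rep("research", 3),
                  rep("factory", 2), rep("residential", 5), rep("research", 3)),
  sensor   = c(LETTERS[1:10], LETTERS[10:1]),
  measured = c(26.4, 2000, 1001, 23.9, 100000, 0, 1234, 12098, 34567, 0,
               123, 676, 12, 0, 100, 0, 0, 98, 1, 190),
  actual   = c(26.4, 2010, 1001, 23.9, 100100, 237, 1234, 12098, 34567, 19583,
               123, 706, 1112, 156, 100, 650, 109, 98, 10, 190)
)

# Target column as described in the question
train$diff <- train$measured - train$actual

# Assumption: a linear model stands in for the boosted models here
fit <- lm(diff ~ measured + application, data = train)

# The 01.2011 rows from test_data (application and measured only)
test <- data.frame(
  application = c(rep("factory", 7), "residential", rep("research", 2)),
  measured    = c(26.4, 100000, 0, 0, 123, 12, 3454, 0, 20220, 1851)
)

# Predict the error, then recover the estimated actual value:
# diff = measured - actual  =>  actual = measured - diff
test$diff_hat   <- predict(fit, newdata = test)
test$actual_hat <- test$measured - test$diff_hat

print(test)
```

With so few rows the fitted numbers mean very little; the point is only the structure: model the error, then subtract the predicted error from the measured value. Sensor type and a month index could be added as further predictors in the same way.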
