Regression - How to deal with multiple measurements of the "same" variable


I have a fairly general question rather than a concrete problem with code.

I am currently trying to create a regression model. For instance, I have 10 different sensors that all capture temperature data about a machine. There are other variables as well, such as the year of establishment, the location of the machine, et cetera. I want to create a model that predicts the "error rate" of the machine. I would know how to approach this problem if these were several distinct variables. But right now I have 10 different sensors that all capture the temperature of the same machine on the same day. What would be a wise approach to deal with this? Maybe take the average? Or treat them separately?


As a start, you might want to include the measurements separately. Perhaps the sensors have somewhat different characteristics. You might then fit a second model that uses the average temperature instead and see which of the two predicts better.
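
To make the two options concrete, here is a minimal sketch of the two feature sets, written in Python purely for illustration. The column names (temp_1 to temp_10, year_established) and the sample values are hypothetical; they are not taken from the actual data set.

```python
from statistics import mean

# Hypothetical column names for the 10 temperature sensors.
SENSOR_KEYS = [f"temp_{i}" for i in range(1, 11)]


def separate_features(row):
    """Option 1: every sensor is its own predictor (10 columns + other variables)."""
    return [row[k] for k in SENSOR_KEYS] + [row["year_established"]]


def averaged_features(row):
    """Option 2: the 10 sensors are collapsed into one mean temperature."""
    return [mean(row[k] for k in SENSOR_KEYS), row["year_established"]]


# One made-up machine-day: sensors read 61.0, 62.0, ..., 70.0.
row = {f"temp_{i}": 60.0 + i for i in range(1, 11)}
row["year_established"] = 2015

print(separate_features(row))
print(averaged_features(row))
```

Fitting the same regression on both feature sets and comparing the held-out prediction error would then show whether the individual sensors carry information that the average throws away.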

I can recommend partial least squares (PLS) as a well-established way to include highly correlated variables in a regression. It collapses multiple correlated predictor variables into a smaller number of latent variables. In this case, among your ~15 variables, it may be that only 3 latent variables capture the majority of the information in the variables.

You do have to choose the number of latent variables to include. There are some heuristics for choosing the number that may give you a simple solution, but the most rigorous way is to use cross-validation to determine the best number. I can recommend caret's train function with method = "pls" as a way to do this in R.
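
For readers who want to see the mechanics, below is a rough sketch of single-response PLS (PLS1, NIPALS-style) with k-fold cross-validation over the number of latent variables. It is plain Python for illustration, not what caret does internally, and the data are synthetic: 10 simulated sensors that all track one underlying temperature. The function names pls1_fit, pls1_predict and cv_rmse are made up for this example; for real work a tested library implementation is the better choice.

```python
import math
import random


def pls1_fit(X, y, n_components):
    """Fit PLS1: extract latent variables one at a time, deflating X and y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(p)] for r in X]  # centered predictors
    f = [v - y_mean for v in y]                            # centered response
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the residual response.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove what this latent variable already explains.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one observation by replaying the deflation steps on it."""
    x_mean, y_mean, comps = model
    e = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat


def cv_rmse(X, y, n_components, k=5):
    """k-fold cross-validated RMSE; folds are interleaved by row index."""
    n = len(X)
    sq_err = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_components)
        sq_err += sum((pls1_predict(model, X[i]) - y[i]) ** 2 for i in test)
    return math.sqrt(sq_err / n)


# Synthetic data (an assumption for the demo): one true temperature per day,
# 10 sensors with different offsets and independent noise, and an error rate
# that depends linearly on the true temperature.
random.seed(0)
temps = [random.gauss(60.0, 5.0) for _ in range(60)]
X = [[t + 0.3 * s + random.gauss(0.0, 0.5) for s in range(10)] for t in temps]
y = [0.02 * t + random.gauss(0.0, 0.02) for t in temps]

scores = {a: cv_rmse(X, y, a) for a in range(1, 6)}
best = min(scores, key=scores.get)
print(scores)
print("number of latent variables with lowest CV error:", best)
```

Because all 10 simulated sensors measure the same quantity, a single latent variable should already capture most of the signal here; real sensor data may need more.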

If your data is high-frequency, then each observation may be autocorrelated with recent observations. This may lead you to choose a cross-validation scheme that separates the folds in time, either with method = "timeslice" in trainControl or simply by specifying the fold indices yourself with the index argument to trainControl.
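
As a rough illustration of what time-separated folds look like, here is a small Python sketch that builds rolling-origin train/test index lists, so that every test window lies strictly after its training window. The function name rolling_origin_splits and its parameters are hypothetical; they are only loosely modelled on the initial-window and horizon idea and are not caret's API.

```python
def rolling_origin_splits(n_obs, initial_window, horizon, fixed_window=True, step=1):
    """Return (train_indices, test_indices) pairs ordered in time.

    Assumes observations are sorted by time and indexed 0 .. n_obs - 1.
    fixed_window=True slides a training window of constant length;
    fixed_window=False lets the training window grow from index 0.
    """
    splits = []
    end = initial_window  # first index that is NOT in the training window
    while end + horizon <= n_obs:
        train_start = end - initial_window if fixed_window else 0
        train = list(range(train_start, end))
        test = list(range(end, end + horizon))
        splits.append((train, test))
        end += step
    return splits


# Example: 10 time-ordered observations, train on 5, test on the next 2.
for train, test in rolling_origin_splits(10, initial_window=5, horizon=2):
    print("train", train, "test", test)
```

Index lists like these are the kind of thing one would pass as custom folds, so the model is never evaluated on observations that precede its training data.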

Hello startz,

That sounds reasonable! That might be a start :slight_smile:

Hello Arthur.t,

That sounds very detailed. I will have a look at the components you mentioned in your reply. Thank you very much for your thoughts, I really do appreciate them! It seems you have a lot of experience with this topic. Would you be interested in having a look at my data set to help me break it down? I'm quite overwhelmed right now :smiley: