Logistic Regression with panel data

Hello,
I have a dataset covering 16 machines, identified by serial number. For each one it was recorded every month, starting in April 2016, whether a maintenance event occurred. For each serial number I also have the place, the process, the average number of cycles per day, the months since installation, and the months since the last event.
I want to predict the probability that a given machine will need maintenance next month. I thought about using a logistic regression model (0 = no maintenance, 1 = maintenance). First, is that a reasonable approach? Second, which R function should I use, given that this is panel data?

Thanks to anyone who can help.
PS: If necessary, I can upload a sample dataset.

Hi,

Welcome to the RStudio community!

It would indeed be much easier if we had some data and code to work with, as ML input and performance really depend on the underlying structure of the data.

I suggest you create a reprex and post it here. A reprex consists of the minimal code and data needed to recreate the issue or question you're having. You can find instructions on how to build and share one here:

Good luck,
PJ
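
As a rough illustration of the data half of a reprex, the sketch below uses base R's dput() to print a small slice of a data frame as R code that anyone can paste and run. The built-in mtcars table is only a stand-in here (an assumption for illustration); with the real data the same call would be applied to a few rows of the actual table.

```r
# Stand-in data: the built-in mtcars table replaces the real dataset here
sample_data <- head(mtcars[, c("mpg", "cyl", "am")], 10)

# dput() prints R code that rebuilds the object; paste that output into the post
dput(sample_data)
```

The reprex package offers a reprex() function that renders code together with its output for posting, if installing a package is an option.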

summary(alimentazione)
       t                       Matricola          Luogo              Processo           N. cicli medi al giorno
 Min.   :2016-04-01 00:00:00   Length:1024        Length:1024        Length:1024        Min.   : 7.00
 1st Qu.:2017-07-24 06:00:00   Class :character   Class :character   Class :character   1st Qu.:19.25
 Median :2018-11-16 00:00:00   Mode  :character   Mode  :character   Mode  :character   Median :31.50
 Mean   :2018-11-15 19:30:00                                                            Mean   :34.56
 3rd Qu.:2020-03-08 18:00:00                                                            3rd Qu.:44.25
 Max.   :2021-07-01 00:00:00                                                            Max.   :96.00
 N. eventi occorsi (da aprile 2016 a oggi)   Mesi trascorsi dal collaudo   Mesi dall'evento precedente
 Min.   : 0.00                               Min.   :  1                   Min.   :-25.00
 1st Qu.: 0.00                               1st Qu.: 46                   1st Qu.:  3.00
 Median : 2.00                               Median : 66                   Median :  9.50
 Mean   : 2.87                               Mean   : 71                   Mean   : 24.55
 3rd Qu.: 4.00                               3rd Qu.: 89                   3rd Qu.: 29.00
 Max.   :20.00                               Max.   :173                   Max.   :173.00
     Yclass
 Min.   :0.00000
 1st Qu.:0.00000
 Median :0.00000
 Mean   :0.08203
 3rd Qu.:0.00000
 Max.   :1.00000

str(alimentazione)
tibble [1,024 x 9] (S3: tbl_df/tbl/data.frame)
 $ t                                        : POSIXct[1:1024], format: "2016-04-01" "2016-05-01" "2016-06-01" "2016-07-01" ...
 $ Matricola                                : chr [1:1024] "C32006082" "C32006082" "C32006082" "C32006082" ...
 $ Luogo                                    : chr [1:1024] "Iran" "Iran" "Iran" "Iran" ...
 $ Processo                                 : chr [1:1024] "Minerario estrattivo" "Minerario estrattivo" "Minerario estrattivo" "Minerario estrattivo" ...
 $ N. cicli medi al giorno                  : num [1:1024] 96 96 96 96 96 96 96 96 96 96 ...
 $ N. eventi occorsi (da aprile 2016 a oggi): num [1:1024] 0 0 0 0 0 0 0 0 0 0 ...
 $ Mesi trascorsi dal collaudo              : num [1:1024] 105 106 107 108 109 110 111 112 113 114 ...
 $ Mesi dall'evento precedente              : num [1:1024] 34 35 36 37 38 39 40 41 42 43 ...
 $ Yclass                                   : num [1:1024] 0 0 0 0 0 0 0 0 0 0 ...

This is the type of data I have to build my logistic model. I don't have any model code yet.
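
Since no model code exists yet, here is a minimal sketch of one possible starting point, not a definitive answer. It simulates a toy panel with the same shape as the data above (16 machines, 64 months), builds a next-month target by shifting the outcome within each machine, and fits a pooled logistic regression with base R's glm(). All column names (serial, month_index, cycles_per_day, months_since_install, event, event_next) and all values are made up for illustration. The optional last step assumes the lme4 package is installed and adds a random intercept per machine, which is one common way to account for repeated observations of the same unit.

```r
# Hypothetical toy panel: 16 machines x 64 months (simulated values, not the real data)
set.seed(1)
n_machines <- 16
n_months <- 64
panel <- data.frame(
  serial = factor(rep(seq_len(n_machines), each = n_months)),
  month_index = rep(seq_len(n_months), times = n_machines),
  cycles_per_day = rep(round(runif(n_machines, 7, 96)), each = n_months)
)
# Each machine starts with a different (made-up) age in months
panel$months_since_install <- panel$month_index +
  rep(sample(0:100, n_machines), each = n_months)

# Simulated outcome: event probability rises with daily cycles
p <- plogis(-4 + 0.03 * panel$cycles_per_day)
panel$event <- rbinom(nrow(panel), size = 1, prob = p)

# The target is next month's event, so shift the outcome by one month within each machine
panel$event_next <- ave(panel$event, panel$serial,
                        FUN = function(y) c(y[-1], NA))
# The last month of each machine has no "next month" and is dropped
train <- panel[!is.na(panel$event_next), ]

# Pooled logistic regression (treats rows from the same machine as independent)
fit <- glm(event_next ~ cycles_per_day + months_since_install,
           family = binomial, data = train)

# Predicted probability of a maintenance event in the following month
train$p_hat <- predict(fit, type = "response")

# Optional: random intercept per machine, only if lme4 is installed
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_re <- lme4::glmer(
    event_next ~ scale(cycles_per_day) + scale(months_since_install) + (1 | serial),
    family = binomial, data = train
  )
}
```

The pooled glm() ignores the fact that each machine contributes many rows. Whether a random intercept is worthwhile with only 16 machines is a judgment call that would need checking on the real data.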
