Hello. I have a question about modelling best practices when it comes to using a predictor that only applies to a given subpopulation of my dataset.
To be more specific, I am trying to predict result_A of a patient based on result_B. Here I also use other predictors such as age, gender and so on that are also available for all rows (patients) in the table.
However, I also want to incorporate previous_result_A to take clinical history into account, which is obviously super important. The thing is, not all rows will have previous_result_A, because not all patients will have done the test before. I did a bit of snooping around here and am under the impression that all I need to do is create another column with 0 for rows that don't have previous_result_A and 1 otherwise, and then fill in the blanks in previous_result_A with 0 or some constant.
As a beginner, I'd just like some confirmation that this is indeed the way to proceed, and whether the approach is applicable to all models. Thanks.