Hello. I have a question about modelling best practices when using a predictor that is only available for a subpopulation of my dataset.
To be more specific, I am trying to predict result_A of a patient based on result_B. I also use other predictors, such as gender, that are available for all rows (patients) in the table.
However, I also want to incorporate previous_result_A to take clinical history into account, which is obviously very important. The thing is, not all rows will have previous_result_A, because not all patients will have taken the test before. I did a bit of searching around here and am under the impression that all I need to do is create another column containing 0 for rows that don't have previous_result_A and 1 otherwise, and then fill in the blanks in previous_result_A with 0 or some other constant.
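To make the idea concrete, here is a minimal sketch of what I understand the approach to be. It uses plain Python with made-up toy rows; the helper name add_missing_indicator and the flag column has_previous_result_A are my own hypothetical names, not something from a library.

```python
# Hypothetical toy rows; the values are made up purely for illustration.
patients = [
    {"result_B": 4.2, "gender": "F", "previous_result_A": 1.3},
    {"result_B": 3.7, "gender": "M", "previous_result_A": None},  # never tested before
    {"result_B": 5.1, "gender": "F", "previous_result_A": 0.0},   # a genuine result of 0
]


def add_missing_indicator(rows, column, fill_value=0.0):
    """Return copies of the rows with a 0/1 flag column added and gaps filled.

    The flag is 1 when `column` has a value and 0 when it is missing;
    missing values are then replaced by `fill_value` (an assumed constant).
    """
    encoded_rows = []
    for row in rows:
        new_row = dict(row)  # copy, so the original rows are left untouched
        has_value = row.get(column) is not None
        new_row[f"has_{column}"] = 1 if has_value else 0
        if not has_value:
            new_row[column] = fill_value
        encoded_rows.append(new_row)
    return encoded_rows


encoded = add_missing_indicator(patients, "previous_result_A")
for row in encoded:
    print(row)
```

Note the third toy row: a genuine previous result of 0 and a filled-in 0 only stay distinguishable because of the flag column.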
As a beginner, I'd just like confirmation that this is indeed the right way to proceed, and also whether the approach is applicable to all models. Thanks.