Hi everyone!
I have a question (maybe more theoretical than practical, but I really need a practical solution).
I have a dataset with a bunch of "nested"/"hierarchical" NAs, meaning variables where respondents who answered "No" to the previous question skip the following one (generating an NA in the dataset). For these NAs, I decided to encode the missingness as a new variable, since it is itself a piece of information.
The only problem is that some of my features are dummies derived from the same variable, for example:
- Question K is a dummy where 100 respondents answered "No", so they skip question A
- Question A is divided into A1, A2, A3, all three are dummies, and each has the same 100 NAs.
Thus, they share exactly the same NAs, and those missing values carry the same information. Here is my question: how can I handle this?
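To make the redundancy concrete, here is a minimal sketch in plain Python (standard library only, not the recipes API). The toy rows, the column name "A_skipped", and the choice to zero-fill the skipped dummies are all assumptions made for illustration. It checks that the NA indicators of A1, A2, A3 are identical to "K == No", and then builds one shared skip flag instead of three.

```python
# Toy data (hypothetical): None stands for NA caused by the skip on question K.
rows = [
    {"K": "Yes", "A1": 1, "A2": 0, "A3": 0},
    {"K": "Yes", "A1": 0, "A2": 1, "A3": 0},
    {"K": "No", "A1": None, "A2": None, "A3": None},
    {"K": "Yes", "A1": 0, "A2": 0, "A3": 1},
    {"K": "No", "A1": None, "A2": None, "A3": None},
]
a_cols = ("A1", "A2", "A3")


def na_indicator(data, col):
    """Return a 0/1 list marking the rows where `col` is missing."""
    return [1 if r[col] is None else 0 for r in data]


# One NA indicator per dummy, plus the flag implied by question K.
indicators = {c: na_indicator(rows, c) for c in a_cols}
k_is_no = [1 if r["K"] == "No" else 0 for r in rows]

# True when every indicator repeats the information already in K.
all_identical = all(ind == k_is_no for ind in indicators.values())

# Alternative encoding (an assumption, not the only option):
# a single shared skip flag, with the skipped dummies filled with 0.
encoded = [
    {
        "A_skipped": int(r["K"] == "No"),
        **{c: 0 if r[c] is None else r[c] for c in a_cols},
    }
    for r in rows
]

print(all_identical)
print(encoded[2])
```

In this toy case the shared flag is itself a copy of K, so whether to keep it next to K at all is a separate modelling decision.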
The approach which I have tried so far is the following:
The other approach I thought of (to avoid custom functions) is to use "step_unknown", but it would create a bunch of identical columns, and I don't know whether that would add too much noise. Maybe applying "step_corr" afterwards would make the problem disappear, but I am not sure it is the right approach.
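The concern about identical columns can be sketched in plain Python (standard library only). This is not the recipes API: the helper names `one_hot_with_unknown` and `drop_duplicate_columns` and the toy answers are hypothetical, and the exact-duplicate check is only a stand-in for a correlation filter. How "step_corr" actually behaves in a given recipe, and which column it keeps, would need to be verified separately.

```python
def one_hot_with_unknown(values, name):
    """Dummy-encode a Yes/No answer, mapping None to an 'unknown' level.

    "No" is treated as the reference level, so only the "Yes" and
    "unknown" columns are produced.
    """
    labelled = ["unknown" if v is None else v for v in values]
    return {
        f"{name}_{level}": [int(v == level) for v in labelled]
        for level in ("Yes", "unknown")
    }


def drop_duplicate_columns(columns):
    """Keep the first of any group of columns with identical values."""
    seen = set()
    kept = {}
    for col_name, col_values in columns.items():
        key = tuple(col_values)
        if key not in seen:
            seen.add(key)
            kept[col_name] = col_values
    return kept


# Toy answers (hypothetical): rows 3 and 5 skipped question A entirely.
answers = {
    "A1": ["Yes", "No", None, "No", None],
    "A2": ["No", "Yes", None, "No", None],
    "A3": ["No", "No", None, "Yes", None],
}

all_columns = {}
for var_name, var_values in answers.items():
    all_columns.update(one_hot_with_unknown(var_values, var_name))

kept = drop_duplicate_columns(all_columns)
print(list(all_columns))
print(list(kept))
```

In this toy example the three "unknown" columns are exact copies of each other, so only the first one survives and the "Yes" dummies are untouched.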
Can someone help me? Thank you very much in advance!