Hierarchical and Nested NAs in Tidymodels

Hi everyone!

I have a question that may be more theoretical than practical, but I really need a practical solution.

I have a dataset with a lot of "nested" (or "hierarchical") NAs: variables that respondents skip because they answered "No" to the previous question, which produces an NA in the dataset. I decided to model these NAs as a new variable, since the fact that a question was skipped is itself a piece of information.

The problem is that some of my features are dummies derived from the same variable, for example:

  • Question K is a dummy: 100 respondents answered "No", so they skip question A.
  • Question A is split into A1, A2 and A3, all three of which are dummies, and each of them has the same 100 NAs.
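To make the structure concrete, here is a small toy dataset in R (dplyr/tibble) that reproduces it. The sample size, the 100/200 split and the random answers are made up for illustration; only the skip rule (K == "No" means A1, A2 and A3 are NA) comes from the description above.

```r
library(dplyr)
library(tibble)

set.seed(1)
n <- 300

# Hypothetical toy survey: K is the filter question, A1-A3 are the follow-up dummies
survey <- tibble(
  K  = factor(rep(c("No", "Yes"), times = c(100, n - 100))),
  A1 = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  A2 = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  A3 = factor(sample(c("No", "Yes"), n, replace = TRUE))
) %>%
  # Skip logic: respondents who answered "No" to K never see question A
  mutate(across(c(A1, A2, A3), ~ replace(.x, K == "No", NA)))

# NA counts per column: K = 0, A1 = 100, A2 = 100, A3 = 100
colSums(is.na(survey))

# Every A column is missing in exactly the rows where K == "No"
all(sapply(survey[c("A1", "A2", "A3")],
           function(col) identical(is.na(col), survey$K == "No")))
```

In this setup the three NA patterns are not just the same size, they are the same rows, and all of them are fully determined by K.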

Thus, A1, A2 and A3 have exactly the same NAs (the same 100 respondents), and those missing values carry the same information. So my question is: how should I handle this?

The approach which I have tried so far is the following:

The other approach I thought of (to avoid writing custom functions) is to use "step_unknown", but it would create a bunch of identical columns, and I don't know whether they would add too much noise. Maybe following it with a "step_corr" would make the problem disappear, but I'm not sure that is the right approach.
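A minimal sketch of the "step_unknown" + "step_corr" idea with recipes, on hypothetical toy data with the same skip structure (the outcome y, the sample size and the answers are made up). step_unknown() turns each NA into an explicit "unknown" level, step_dummy() then produces A1_unknown, A2_unknown and A3_unknown, which are identical columns, and step_corr() with its default threshold of 0.9 is left to filter them.

```r
library(recipes)
library(tibble)

set.seed(1)
n <- 300

# Hypothetical toy data: 100 "No" on K, so A1-A3 are NA for those rows
k <- rep(c("No", "Yes"), times = c(100, n - 100))
follow_up <- function() {
  factor(ifelse(k == "No", NA, sample(c("No", "Yes"), n, replace = TRUE)))
}
survey <- tibble(y = rnorm(n), K = factor(k),
                 A1 = follow_up(), A2 = follow_up(), A3 = follow_up())

rec <- recipe(y ~ ., data = survey) %>%
  step_unknown(A1, A2, A3) %>%                          # NA -> explicit "unknown" level
  step_dummy(all_nominal_predictors()) %>%              # K_Yes, A1_Yes, A1_unknown, ...
  step_corr(all_numeric_predictors(), threshold = 0.9)  # drop (near-)duplicate columns

baked <- bake(prep(rec), new_data = NULL)
names(baked)
```

In this toy setup each A*_unknown column is exactly 1 minus K_Yes, so K_Yes, A1_unknown, A2_unknown and A3_unknown are perfectly (anti)correlated and step_corr() should keep only one of them; which one survives depends on its internal tie-breaking, so it is worth checking names(baked) on real data. step_lincomb(), which targets exact linear dependencies, could be compared as an alternative.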

Can someone help me? Thank you very much in advance!
