I am confused about the advantages of using recipe steps for data transformations as opposed to modifying the data itself.
For example, if I have a process like:
- Get data
- Simple cleaning
- Split
- Explore training data
Suppose this process leads me to believe that I should log-transform my dependent variable. What is the advantage of adding step_log(y) to a recipe, compared with adding mutate(y = log(y)) to the Simple cleaning stage above and then rerunning Split?
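To make the comparison concrete, here is a minimal sketch of the two alternatives as I understand them. The data frame dat, its columns x and y, and the seed are made up for illustration; the functions are from dplyr, rsample, and recipes.

```r
library(dplyr)
library(rsample)
library(recipes)

# Hypothetical stand-in data: one predictor x and a strictly positive outcome y
set.seed(123)
dat <- tibble(x = runif(100), y = exp(rnorm(100)))

# Option A: transform during Simple cleaning, then rerun Split
cleaned_a <- dat %>% mutate(y = log(y))
set.seed(42)
split_a <- initial_split(cleaned_a)
train_a <- training(split_a)

# Option B: split the untouched data and declare the transform as a recipe step
set.seed(42)  # same seed so both options select the same training rows
split_b <- initial_split(dat)
train_b <- training(split_b)

rec <- recipe(y ~ x, data = train_b) %>%
  step_log(y)

# prep() estimates the recipe on the training data;
# bake(new_data = NULL) returns the processed training set
baked_b <- rec %>% prep() %>% bake(new_data = NULL)

# Compare the transformed outcome from the two routes
all.equal(train_a$y, baked_b$y)
```

In this sketch both routes should give the same transformed training outcome, which is why I am unsure what the recipe version adds.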
I think it is easier to verify that things are going as intended when you modify the actual data. I do see that there are some very handy recipe steps, which is one advantage. Are there others? One disadvantage is that it seems harder to evaluate choices within a recipe (e.g. picking parameters for step_other).
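As far as I can tell, checking what a given step_other setting does means prepping and baking the recipe and then inspecting the result, roughly as in the sketch below. The data, the column names g and y, and the threshold value are hypothetical.

```r
library(dplyr)
library(recipes)

# Hypothetical data: a factor with two common levels and two rare ones
dat <- tibble(
  g = factor(c(rep("a", 55), rep("b", 40), rep("c", 3), rep("d", 2))),
  y = seq_len(100)
)

# Pool levels whose training-set proportion falls below the threshold
rec <- recipe(y ~ g, data = dat) %>%
  step_other(g, threshold = 0.05)

# The effect is only visible after the recipe is prepped and baked
baked <- rec %>% prep() %>% bake(new_data = NULL)

# Inspect which levels survived and how many rows each has
level_counts <- count(baked, g)
print(level_counts)
```

Trying a different threshold means editing the recipe and repeating the prep and bake, which is the extra friction I mean.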
Thanks for your help,
David