Should I still do an initial split for spatial data when modeling?

Hello, I came across the {spatialsample} package, which provides functions analogous to those in {rsample}. I noticed it offers a cross-validation resampling method for spatial data, but no spatial equivalent of initial_split(). I wasn't sure whether this is by design or simply hasn't been implemented yet because the package is relatively new. Either way, for now, would it be better to:

  • Use a generic initial_split() to make training and test sets even though it won't account for spatial correlation
  • Not perform an initial_split(), use the entire data set in downstream training and resampling steps, and sacrifice the ability to do final performance assessment with a separate test set
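
For concreteness, here is a minimal sketch of how I picture the two options. The data frame `pts` and its columns `x`, `y`, and `outcome` are made up for illustration, and I'm assuming spatial_clustering_cv() accepts a `coords` argument for plain data frames in the installed version of {spatialsample}:

```r
library(rsample)
library(spatialsample)

# Hypothetical point data: coordinates plus an outcome (invented for illustration)
set.seed(123)
pts <- data.frame(x = runif(200), y = runif(200))
pts$outcome <- pts$x + pts$y + rnorm(200, sd = 0.1)

# Option 1: generic random split (ignores spatial correlation),
# then spatial cross-validation on the training set only
split <- initial_split(pts, prop = 0.8)
train <- training(split)
test  <- testing(split)
folds_opt1 <- spatial_clustering_cv(train, coords = c(x, y), v = 5)

# Option 2: no initial split; spatial cross-validation on all rows,
# so no separate test set remains for a final assessment
folds_opt2 <- spatial_clustering_cv(pts, coords = c(x, y), v = 5)
```

In option 1, `test` is held back until the very end; in option 2, the resampling estimates from `folds_opt2` are the only performance measure available.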
