Hi, that helps.
Let's step through the data, conceptually.
- Each observation is a row, containing variables.
- Three of the variables lie in longitude, latitude, elevation 3-space, or "location"
- One or more of the variables represent a classification based on your criteria for identifying an event. I'll call these "attributes."
- The same location may have multiple observations at different elevations and attributes. I'll refer to these as the "event."
- The goal is to classify the observations into an undetermined number of categories that each represents the same regional occurrence of some source process.
The data structure I've described is tidy in the sense that observations are rows and variables are columns.
Generically, this is a classification problem of high(ish) dimensional data. I'm going to assume that the number of observations n is greater than the dimensionality p.
I would begin with location to identify non-intersecting surfaces in 3-space. For this, the sf package will be helpful and the spatial tools it provides. Depending on the underlying process (such as an ashfall), these could be relatively flat, like sedimentary strata. If they are pro-glacial features, they could be relatively convoluted.
If your underlying process model supports it, in 4-space, adding relative time, the surfaces should be stratigraphic.
Next, you want to place each attribute in that 4-space. Your null hypothesis, H_0 is that the attributes are randomly distributed. Your alternative hypothesis, H_1 is that they are non-randomly distributed under a distribution to be determined.
In addition to K nearest neighbor and K means clustering, there's also principal component analysis, a method analogous to linear regression but in higher dimensions, seeking to find an orthogonal plane that minimizes variance.
I'm sorry to have to address your question at such a high level of generality. Representative data might help for me and others to make more specific suggestions.