# average distance between all combinations of xy coordinates in data set

I have a big data set with over 600 xy coordinates. I want to know the mean of the distances between all combinations of points, so I want to calculate one number, which is the mean. There are a lot of combinations, and I can't use dist() and then calculate the average because I reached the maximum number of points in the matrix.

What is the maximum number of points in a matrix?

I don't know, but it says I exceeded it. Can you help me?

I suppose there's something about how you described your challenge that I'm not picking up on. Perhaps you can say more about it.

```r
set.seed(42)
d1 <- data.frame(x = rnorm(700),
                 y = rnorm(700))

(dist_of_d1 <- dist(d1))

mean(dist_of_d1)
```

This example involves over 600 points (700 to be precise).
The dist_of_d1 object is about 2 MB.
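As a rough check on that size, here is a minimal sketch that counts how many distances `dist` stores for 700 points, assuming each distance is held as an 8-byte double. The variable names `n_pairs` and `approx_mb` are only illustrative.

```r
# Number of points, matching the example above
n <- 700

# dist() keeps one value per unordered pair of distinct points
n_pairs <- n * (n - 1) / 2
n_pairs
#> [1] 244650

# Approximate size in MB, assuming 8 bytes per stored distance
approx_mb <- n_pairs * 8 / 1e6
approx_mb
#> [1] 1.9572

# Confirm against an actual dist object
set.seed(42)
d1 <- data.frame(x = rnorm(n), y = rnorm(n))
length(dist(d1)) == n_pairs
#> [1] TRUE
```

The count grows quadratically with the number of points, so the same calculation can be used to estimate whether a larger data set would still fit in memory.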

I am doubtful that the problem is the sheer number of combinations, since 600 points have only 360,000 ordered pairs (600 × 600), which would fit in memory on most machines. Rather, I think your issue may be the dimensions of the matrix that `dist` gives, since one row isn't necessarily equal to one column in terms of memory space.
Therefore, I think a better bet might be to generate all of the pairs first, so that you are in control of the dimensions of the resulting matrix. Here are a couple of different ideas for how you might approach getting to the mean:

### 1. Generate all possible combinations of points, calculate the distance between them, and take the average of all of those distances:

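```r
library(dplyr)
library(tidyr)

# Simulate 600 points, stored as a list of one-row data frames
# (no seed is set, so the value at the end varies from run to run)
coords <- lapply(
  1:600,
  function(x) {
    data.frame(x = rnorm(1), y = rnorm(1))
  }
)

# Every ordered pair of points (600 x 600 = 360,000 rows). This includes
# each point paired with itself and each pair in both orders.
combos <- expand.grid(
  coords,
  coords
) %>%
  unnest(everything(), names_repair = 'unique') %>%
  rename(
    'x1' = 1,
    'y1' = 2,
    'x2' = 3,
    'y2' = 4
  )
#> New names:
#> * x -> x...1
#> * y -> y...2
#> * x -> x...3
#> * y -> y...4

# Euclidean distance for each pair, then the mean over all pairs
mean_dist <- combos %>%
  mutate(
    distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)
  ) %>%
  pull(distance) %>%
  mean

mean_dist
#> [1] 1.833072
```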

Created on 2022-05-05 by the reprex package (v1.0.0)
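For comparison, the same all-pairs idea can be sketched in base R with `outer()`, which builds the full n × n matrix of distances directly. This is only an illustration under some assumptions: `pts` is a hypothetical stand-in data frame with columns `x` and `y`, and the seed is arbitrary. It also shows that averaging over all ordered pairs includes the zero distance from each point to itself, so that mean is slightly lower than `mean(dist(...))`, which counts each distinct pair once.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
pts <- data.frame(x = rnorm(600), y = rnorm(600))  # hypothetical stand-in data
n <- nrow(pts)

# n x n matrices of coordinate differences for every ordered pair
dx <- outer(pts$x, pts$x, "-")
dy <- outer(pts$y, pts$y, "-")
d  <- sqrt(dx^2 + dy^2)

# Mean over all ordered pairs, self-pairs (distance 0) included
mean_all_ordered <- mean(d)

# Mean over distinct pairs only, each counted once
mean_distinct <- mean(d[upper.tri(d)])

# The distinct-pair mean agrees with dist()
all.equal(mean_distinct, mean(dist(pts)))
#> [1] TRUE

# The two means differ by the factor (n - 1) / n
all.equal(mean_all_ordered, mean_distinct * (n - 1) / n)
#> [1] TRUE
```

With 600 points the difference between the two means is small (a factor of 599/600), but it is worth knowing which one you are reporting.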

### 2. Create the numerator and denominator elementwise without storing the combinations

One other option would be to break the problem down into two pieces: how to generate the combinations, and how to calculate the distance. Once you solve those two problems, finding the mean is really trivial. Working backwards, we realize that a mean is just $\dfrac{\text{sum}}{n}$, where `sum` is the sum of the distances calculated and `n` is the number of distances calculated. So if we are really memory constrained in solving the problem, we just won't store all of the combinations in memory and will instead only increment `sum` and `n` for each combination. Here is an example of that, reusing the `coords` list from the first approach:

```r
numerator <- 0    # running sum of distances
denominator <- 0  # running count of pairs

# Visit every ordered pair of points without storing the pairs
for (coord1 in coords) {
  for (coord2 in coords) {
    distance <- sqrt((coord2$x - coord1$x)^2 + (coord2$y - coord1$y)^2)
    numerator <- numerator + distance
    denominator <- denominator + 1
  }
}
numerator / denominator
#> [1]  1.833072
```
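A possible variation on the running-sum idea is to loop over each point once and compute its distances to all later points in a single vectorized step. The sketch below assumes the points are held in a hypothetical data frame `pts` with columns `x` and `y` (rather than the list `coords`), and it counts each distinct pair once, so its result should match `mean(dist(pts))` while only ever holding one vector of distances in memory.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
pts <- data.frame(x = rnorm(600), y = rnorm(600))  # hypothetical stand-in data
n <- nrow(pts)

total <- 0  # running sum of distances
count <- 0  # running count of distinct pairs

for (i in seq_len(n - 1)) {
  j <- (i + 1):n  # only points after i, so each pair is visited once
  d <- sqrt((pts$x[j] - pts$x[i])^2 + (pts$y[j] - pts$y[i])^2)
  total <- total + sum(d)
  count <- count + length(d)
}

count
#> [1] 179700

mean_streamed <- total / count

# Agrees with the dist() based mean
all.equal(mean_streamed, mean(dist(pts)))
#> [1] TRUE
```

Because the inner work is vectorized, this makes 599 passes instead of 360,000 scalar iterations, and peak extra memory is proportional to the number of points rather than the number of pairs.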
