I have a big data set with over 600 xy coordinates. Now I want to know the mean of all the distances between all combinations of points, so I want to calculate a single number: the mean. I have a lot of combinations, and I can't use dist() and then calculate the average because I reached the maximum number of points in the matrix.
I am doubtful that the problem is the sheer number of combinations: 600 points give only 600 × 599 / 2 = 179,700 unique pairs (360,000 even if you count every ordered pair, including each point with itself), which would fit in memory on most machines. Rather, I suspect your issue is with the dimensions of the object that dist() gives you, not with the count itself.
Therefore, I think a better bet might be to generate all of the pairs first; then you are in control of the dimensions of the resulting matrix. Here are a couple of different ideas for how you might approach getting to the mean:
1. Generate all possible combinations of points, calculate the distance between them, and take the average of all of those distances.
2. Create the numerator and denominator elementwise, without storing the combinations.
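The first idea can be sketched roughly as follows. This is only a minimal illustration, assuming the points sit in a two-column matrix; the name `pts` and the random data are made up for the example and are not from the question.

```r
# Hypothetical example data: 600 random points in the unit square.
set.seed(42)
pts <- cbind(x = runif(600), y = runif(600))

# combn() returns a 2 x choose(n, 2) matrix of row-index pairs with i < j,
# so each unordered pair of points appears exactly once.
idx <- combn(nrow(pts), 2)

# Euclidean distance for every pair, computed in one vectorised step.
d <- sqrt((pts[idx[1, ], "x"] - pts[idx[2, ], "x"])^2 +
          (pts[idx[1, ], "y"] - pts[idx[2, ], "y"])^2)

# The single number asked for: the mean of all pairwise distances.
mean_dist <- mean(d)
```

Here `d` is a plain numeric vector with one entry per pair, so you decide its shape yourself instead of relying on what dist() returns.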
One other option would be to break the problem down into two pieces: how to generate the combinations, and how to calculate the distance. Once you solve those two problems, finding the mean is trivial. Working backwards, we realize that a mean is just $\dfrac{\text{sum}}{n}$, where sum is the sum of the distances calculated and n is the number of distances calculated. So if we are really memory constrained, we don't have to store all of the combinations in memory; we only need to increment sum and n for each combination. Here is an example of that:
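A minimal sketch of that running-total idea, assuming the coordinates come as two numeric vectors of equal length. The function name `mean_pairwise_distance` is invented for this example. For speed it handles one point against all later points at a time, so it holds at most n - 1 distances in memory at once and never the full set of pairs.

```r
# Mean of all pairwise Euclidean distances, accumulated incrementally.
mean_pairwise_distance <- function(x, y) {
  n_points <- length(x)
  stopifnot(length(y) == n_points, n_points >= 2)

  total <- 0  # running sum of distances (the numerator)
  n <- 0      # running count of pairs (the denominator)

  for (i in seq_len(n_points - 1)) {
    j <- (i + 1):n_points                        # only pairs with j > i
    d <- sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2) # distances from point i
    total <- total + sum(d)
    n <- n + length(d)
  }

  total / n
}

# Three points forming a 3-4-5 right triangle: distances 5, 4 and 3.
mean_pairwise_distance(c(0, 3, 0), c(0, 4, 4))  # 4
```

If memory were even tighter, the inner step could be replaced by a second loop over single values of j, at the cost of being much slower in R.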