Constrained/capacitated cluster analysis with two variables (demand and coordinates)

frederikkeoe · May 4, 2022, 8:17am

Hi.
What I am trying to achieve
I have 260 demand points (postal codes/GPS coordinates) with an associated daily demand in kg. I have trucks with a capacity of 12000 kg each. I want to cluster the data, so that one truck is able to satisfy the demand of each cluster (in order to achieve high capacity utilization), and so that the distance between the demand points is as small as possible. So basically I am trying to cluster the demand points so that the points near each other are clustered, but with a constraint of 12000 kg (see example of the data below)

I have tried searching all over the internet, but I am not finding anything specifically on how to achieve this. I previously made a succesful cluster analysis only based on the coordinates, but I really need some advice on how to incorporate this volume constraint. Thank you in advance!

|Postal code|Avg. Daily volume|
|1095|23.328675|
|1100|16.08681944|
|1113|13.77077778|
|1119|347.8668778|
|1171|430.9313806|
|1264|578.0968778|
|1300|255.2149139|
|1304|281.2537806|
|1353|423.977775|
|1358|524.1165556|
|1360|88.839|
|1362|41.09620833|
|1366|121.088375|
|1400|10.13843889|
|1420|282.7263889|
|1427|7.380477778|
|1436|566.0231|

nirgrahamuk · May 4, 2022, 9:00am

perhap you can share that approach here, if you are hoping to augment it rather than throw it away and start from scratch, this would seem to save a forum user that might try to help you from reinventing the wheel and doing more work to support you than might be strictly necessary.

I would think that the GPS coordinates would be critical data, as the postal codes alone wont serve the task, so I would assume that the example data you provided is insufficient for me to begin working with from scratch even if I wished to, could you review that also ?

frederikkeoe · May 4, 2022, 9:08am

As far as I've understood from my research online, I won't be able to reuse what I already made, since I used k-means clustering and that doesn't allow for constraints. Also, unfortunately, due to being new in Rstudio I have not been able to figure out how to save the code I'm writing, so I only have the results.

I am not looking for someone to do the task for me since that would be too much to ask, I am only looking for advice on which direction to go in or example code that I can use.

Here is an example with GPS coordinates, and daily volume. The volume numbers are randomly generated, since I'm not allowed to share that detailed information with GPS coordinates and demand.

|Volume|X|Y|
|97|12.51967|55.68966|
|63|11.76248632|55.22817757|
|48|12.597756|55.658126|
|51|12.5775499|55.714389|
|51|11.96025237|55.32906261|
|73|11.56769|55.462895|
|98|11.69224481|55.35309093|
|62|12.1439072|55.4160844|
|85|11.70916898|55.75662893|
|40|12.053885|55.63474|
|96|11.7606181|55.23246115|
|100|11.2906625|55.5039675|
|100|11.6377104|54.79746005|
|37|12.09508|55.567875|
|75|12.14944027|55.73692911|
|64|12.60486333|55.6565721|
|39|12.586355|55.6852|
|62|12.526795|55.64821|
|30|12.07475342|56.05974|
|66|12.5574588|55.668512|
|54|12.3786367|55.8142673|
|65|11.60594433|55.93335956|
|85|11.22897852|55.53814945|
|51|11.67932788|55.84426566|
|58|12.107238|55.750439|
|86|12.28674318|55.73087612|
|70|12.22552561|55.57142777|
|62|12.5378231|55.674093|
|99|12.17486523|55.47872497|
|42|12.55751|55.668815|
|96|12.51410104|55.69227432|
|45|11.75333045|55.21850789|

nirgrahamuk · May 4, 2022, 9:27am

How about doing your traditional position-only cluster analysis.
Then summing up the weight of your final clusters.
Any final cluster that is over the max weight, you then look for how to split ?
It would then be a question of playing with the hyperparameters of your initial clustering, and your second split phase, to find an optimal balance. I'd maybe just use optim() for that, meta process

frederikkeoe · May 4, 2022, 9:38am

I think that can be a solution if I don't find a more structured approach. However, my previous k-means clustering analysis which was based only on location resulted in only four clusters.

The total daily volume is around 600,000 kg, which equals to about 50 clusters, if each cluster has a volume constraint of 12,000 kg. So it will probably be really complex to divide them manually, since the volume can also vary from demand point to demand point. I'm really unfamiliar with R and pressured on time, so this process would be me exporting the clusters to excel and trying to divide them in there, probably not what you were suggesting.

nirgrahamuk · May 4, 2022, 11:47am

Capacitated Vehicle Routing Problem

Perhaps you can adapt the work here.
RPubs - Product Distribution with Nearest Neighbour and Genetic Algorithm.

Also try searching vrp and cvrp

frederikkeoe · May 4, 2022, 1:00pm

Thank you! I actually think that this is what I need and I've been searching for the completely wrong things the entire day.

frederikkeoe · May 5, 2022, 11:09am

Just wanted to let you know that I've now had the chance to look through what you sent, and you really saved the day this was exactly what I was looking for, and with both raw data and code examples, thank you so much.

system · May 12, 2022, 11:09am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.