How To Find Unique Centroids in R

I am working with the R programming language.

I am trying to calculate the geographic centroids of different polygons within Canada.

I downloaded the following shapefile and tried to calculate and visualize the centroids of each polygon:

library(dplyr)
library(sf)
library(data.table)
library(rvest)
library(leaflet)
library(ggplot2)
library(urltools)
library(leaflet.extras)
library(stringr)
library(magrittr)


# Download zip files
url_1 <- "https://www12.statcan.gc.ca/census-recensement/alternative_alternatif.cfm?l=eng&dispext=zip&teng=lada000b21a_e.zip&k=%20%20%20151162&loc=//www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/files-fichiers/lada000b21a_e.zip"


download.file(url_1, destfile = "lada000b21a_e.zip")

# Extract zip files
unzip("lada000b21a_e.zip")

# Read shapefiles
ada <- st_read("lada000b21a_e.shp")

shapefile_1 = ada %>% st_transform(32617)
#sf_cent <- st_centroid(shapefile_1)

sf_cent <- st_point_on_surface(shapefile_1)

# Transform the centroids to the WGS84 CRS
sf_cent_geo <- st_transform(sf_cent, crs = 4326)


# Extract the longitude and latitude coordinates of the centroids
lon <- st_coordinates(sf_cent_geo)[,1]
lat <- st_coordinates(sf_cent_geo)[,2]

ADAUID <- sf_cent_geo$ADAUID
lon <- st_coordinates(sf_cent_geo)[,1]
lat <- st_coordinates(sf_cent_geo)[,2]

shapefile_1 = ada %>% st_transform(32617)
sf_cent <- st_centroid(ada)

ggplot() + 
    geom_sf(data = shapefile_1, fill = 'white') +
    geom_sf(data = sf_cent, color = 'red') 

However, when I examine the results:

Problem: When I examine the results, I see that at times there are multiple centroids within each polygon.

I tried to do some research and consult other references on this topic, but so far I am unable to figure out how to resolve this problem.

For logical purposes, I am trying to only have one centroid in each polygon.

Can someone please show me how to fix this?

I looked at the documentation and theres an st_centroid you could try here instead

1 Like

I think the issue comes down to a matter of scale of what can be seen in a plot. The spatial data in the URL you provided consist of 5433 multipolygons keyed by the ADAUID field, and so you would expect 5433 centroids as well.

Simple feature collection with 5433 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 3689321 ymin: 659305 xmax: 9015751 ymax: 5242009
Projected CRS: NAD83 / Statistics Canada Lambert

If you look closely at the Yukon, there are at least 35 multipolygons identified by their ADAUID, which matches what you've shown.

If you wanted to aggregate up to a larger spatial scale, with using the first 5 digits of the ASAUID, for example:

ada_aggregate <- ada %>%
  mutate(group_id = substr(ADAUID, 1, 5)) %>%
  group_by(group_id) %>%
  summarise(area = sum(LANDAREA),
    geometry = st_union(geometry))

ada_centroid <- st_centroid(ada_aggregate)
ada_centroid

g <- ggplot() +
  geom_sf(data = ada_aggregate) +
  geom_sf(data = ada_centroid, color = "red", size = 0.1) +
  theme_void()

1 Like

@ jrmuirhead: thank you for your answer! Just to clarify - the code I had originally provided is correct? It has in fact identified one centroid per polygon, you just have to zoom in to see them? thank you so much!

Hi @omario
I probably would have gone with st_centroid() instead of `st_point_on_surface(), but otherwise it does return 1 centroid per multipolygon.

1 Like

@ jrmuirhead: thank you so much for your reply! So if I were to use st_centroid(), this would return 1 centroid per polygon?

Yes, it would return 1 centroid per polygon. st_point_on_surface would also return 1 point per polygon except it would be constrained so that the point lies within the boundaries of one of the polygons.

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.