I started working with a shapefile in R. In this shapefile, each "boundary" is uniquely identified by a value in "col1" (e.g. ABC111, ABC112, ABC113, etc.):
library(sf)
library(igraph)
library(spdep)
sf <- sf::st_read("C:/Users/me/OneDrive/Documents/shape5/myshp.shp", options = "ENCODING=WINDOWS-1252")
head(sf)
Simple feature collection with 6 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 7201955 ymin: 927899.4 xmax: 7484015 ymax: 1191414
Projected CRS: PCS_Lambert_Conformal_Conic
col1 col2 col3 geometry
620 ABC111 99 Region1 MULTIPOLYGON (((7473971 119...
621 ABC112 99 Region1 MULTIPOLYGON (((7480277 118...
622 ABC113 99 Region1 MULTIPOLYGON (((7477124 118...
627 ABC114 99 Region1 MULTIPOLYGON (((7471697 118...
638 ABC115 99 Region1 MULTIPOLYGON (((7209908 928...
639 ABC116 99 Region1 MULTIPOLYGON (((7206683 937...
> dim(sf)
[1] 500 4
I converted this shapefile into an adjacency matrix and an edge list:
mat <- nb2mat(poly2nb(sf), style = "B")
g1 <- graph_from_adjacency_matrix(mat)
g2 <- as_edgelist(g1)
> g1
IGRAPH 43082db D--- 513 2880 --
+ attr: color (v/c)
+ edges from 43082db:
[1] 1-> 3 1-> 4 1-> 37 1-> 38 1-> 40 1-> 43 1-> 62 1->126 1->197 2-> 3 2-> 24 2-> 37 2-> 38 2->125 3-> 1 3-> 2 3-> 38 3->125 3->197 3->198 3->241 3->265
[23] 4-> 1 4-> 43 4-> 44 4-> 62 4->126 5-> 7 5->408 5->409 5->410 5->478 6-> 7 6->150 6->153 6->291 6->411 6->476 7-> 5 7-> 6 7->168 7->169 7->170 7->291
[45] 7->410 7->476 7->477 7->478 8-> 11 8-> 21 8->213 8->214 8->454 8->489 8->490 9-> 11 9-> 12 9-> 14 9-> 15 9-> 49 9->159 9->161 9->164 9->211 9->212 9->213
[67] 9->223 9->324 9->325 9->326 9->336 9->337 9->343 9->379 9->380 9->383 9->384 9->385 9->386 9->387 9->390 9->395 9->396 9->397 9->413 9->461 9->464 9->465
[89] 9->470 9->471 9->511 9->512 10-> 12 10-> 13 10-> 49 10-> 50 10->210 10->211 10->342 11-> 8 11-> 9 11-> 14 11->213 11->343 11->380 11->454 11->461 11->490 11->491 11->502
[111] 12-> 9 12-> 10 12-> 13 12-> 15 12-> 17 12-> 49 12->354 12->395 12->402 12->513 13-> 10 13-> 12 13-> 50 13->193 13->208 13->342 13->430 13->439 13->503 14-> 9 14-> 11 14-> 16
[133] 14-> 17 14->324 14->343 14->344 14->380 14->396 14->414 14->479 14->491 14->502 15-> 9 15-> 12 15-> 17 15->324 15->395 15->396 15->413 16-> 14 16-> 17 16-> 18 16-> 19 16->303
[155] 16->310 16->311 16->348 16->349 16->350 16->400 16->401 16->403 16->414 16->479 16->480 16->481 16->482 16->491 17-> 12 17-> 14 17-> 15 17-> 16 17-> 18 17->303 17->310 17->346
+ ... omitted several edges
> head(g2)
[,1] [,2]
[1,] 1 3
[2,] 1 4
[3,] 1 37
[4,] 1 38
[5,] 1 40
[6,] 1 43
Now, I also have a "Reference Table" that contains a variable for each value of "col1". For example, the number of days it rained for each value of "col1":
#simulate data
var1 = rep("ABC",500)
var2 = seq(111, 610, by=1)  # 500 consecutive IDs, so each row gets a unique col1
rainfall = rnorm(500, 60, 2)
reference = data.frame(col1 = paste0(var1,var2), rainfall)
> head(reference)
col1 rainfall
1 ABC111 57.09933
2 ABC112 59.41411
3 ABC113 60.71370
4 ABC114 62.04429
5 ABC115 58.30965
6 ABC116 60.35608
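Before merging anything, it may be worth checking that the key column really has one row per "col1" value and that it covers the shapefile. The following is only a minimal sketch on made-up data: `shape_ids` is a hypothetical stand-in for `sf$col1`, and the small `reference` table here is invented for illustration.

```r
# Hypothetical stand-ins for sf$col1 and the reference table (made-up values).
shape_ids <- c("ABC111", "ABC112", "ABC113")
reference <- data.frame(col1 = c("ABC113", "ABC111", "ABC112"),
                        rainfall = c(60.7, 57.1, 59.4))

# TRUE if no col1 value appears more than once in the reference table.
no_duplicates <- anyDuplicated(reference$col1) == 0

# TRUE if every shapefile ID has a matching row in the reference table.
all_matched <- all(shape_ids %in% reference$col1)

# TRUE only if both objects list the IDs in exactly the same order.
same_order <- identical(shape_ids, reference$col1)
```

If `same_order` is `FALSE`, as it is in this toy case, a positional ID such as `1:nrow(reference)` would pair rows with the wrong regions.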
I would like to merge the "reference" data frame with the "edge list" or with the "adjacency matrix". My naïve attempt looks something like this:
# create an ID key:
reference$id = 1:nrow(reference)
g2 = data.frame(g2)
g2$id = g2$X1
merged = merge(x = g2 , y = reference, by = "id", all.x = TRUE)
The problem is, I don't know if I have correctly created the ID variable. I simply assumed that the vertex numbers in "g2" follow the row order of "col1" in the shapefile, and that "reference" is sorted the same way. But I am not sure if this assumption is correct.
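One way to avoid relying on row order at all would be to carry the "col1" labels through the adjacency matrix and merge on the labels instead of on positions. Below is a minimal base-R sketch of that idea on toy data: the 4x4 matrix, the `ids` vector and the rainfall values are all made up, and `ids` stands in for `sf$col1`. With the real data, I believe `poly2nb()` accepts a `row.names` argument that could supply these labels, and that igraph takes vertex names from the column names of the matrix by default, but I have not verified either against my shapefile.

```r
# Toy stand-in for the real objects (all values here are made up).
ids <- c("ABC111", "ABC112", "ABC113", "ABC114")   # plays the role of sf$col1
mat <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE)

# Label rows and columns with the identifier instead of trusting positions.
dimnames(mat) <- list(ids, ids)

# Build the edge list from the matrix, converting indices to col1 labels.
idx <- which(mat == 1, arr.ind = TRUE)
edges <- data.frame(from = rownames(mat)[idx[, "row"]],
                    to   = colnames(mat)[idx[, "col"]],
                    stringsAsFactors = FALSE)

# Hypothetical reference table, deliberately in a different row order.
reference <- data.frame(col1 = rev(ids),
                        rainfall = c(61.2, 58.7, 60.1, 59.4))

# Merge on the identifier itself, so row order no longer matters.
merged <- merge(edges, reference, by.x = "from", by.y = "col1", all.x = TRUE)
```

Because the join key is the "col1" label rather than a running number, the result does not depend on how either table happens to be sorted.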
- Can someone please let me know if I have done this correctly?
Thank you!