How to change census tract/neighbourhood boundaries or link 2 shapefiles

Tamim · August 29, 2020, 9:09am

I have 2 shapefiles/datasets (census tracts) for the same city (Cleveland, OH). The first shapefile, from the 1930s (190 tracts), includes the following columns:
Tract_grade, LON, LAT (there is no tract ID/number)

shapefile_data_1930:

Tract_grade         LON        LAT
C                -81.93458   41.49005
A                -81.82381   41.49122
B                -81.80392   41.49449
A                -81.78479   41.49376
.                     .         .
.                     .         .

The second shapefile, from 2010 (177 tracts), includes the following columns:
Tract_ID, LON, LAT

shapefile_data_2010:

Tract_ID         LON        LAT
1011.01       -81.74805   41.48173
1011.02       -81.75908   41.48629
1015.01       -81.75645   41.47409
1017.00       -81.74670  41.47562
.                     .         .
.                     .         .

However, the census tract boundaries, and of course the coordinates, have changed since the 1930s, and I need to link the Tract_grade (from the 1930s shapefile) with the Tract_ID (from the 2010 shapefile). Is there a way to relocate the old tracts so that they sit within the new tract boundaries/coordinates? I need both Tract_grade (from the 1930s) and Tract_ID (from 2010).

jlacko · August 29, 2020, 9:57am

What kind of metric are you analyzing? Is it feasible to assume uniform density within a census polygon (your metric per tract area)?
If the uniform density assumption is reasonable, you could translate the values via the intersections of the new and old polygons. I wrote a post on this technique a while back: https://www.jla-data.net/eng/spatial-aggregation/
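
A minimal sketch of that idea, assuming a numeric per-tract metric and toy square polygons. The `square()` helper, the `value` column and the coordinates are made up for illustration and are not taken from the Cleveland data. `sf::st_interpolate_aw()` does the area weighting: `extensive = TRUE` treats the value as a count that gets split between polygons, `extensive = FALSE` treats it as a density that gets averaged.

```r
library(sf)

# Hypothetical helper: axis-aligned rectangle as an sf polygon
square <- function(xmin, xmax, ymin, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Two "old" unit squares carrying a made-up numeric metric
old_polys <- st_sf(
  value    = c(10, 20),
  geometry = st_sfc(square(0, 1, 0, 1), square(1, 2, 0, 1))
)

# One "new" polygon covering half of the first old square and all of the second
new_polys <- st_sf(
  id       = "n1",
  geometry = st_sfc(square(0.5, 2, 0, 1))
)

# Count-like metric: each old value is split in proportion to the overlapping area
# (half of 10 plus all of 20)
as_count <- suppressWarnings(
  st_interpolate_aw(old_polys["value"], new_polys, extensive = TRUE)
)

# Density-like metric: area-weighted mean of the old values inside the new polygon
as_density <- suppressWarnings(
  st_interpolate_aw(old_polys["value"], new_polys, extensive = FALSE)
)

as_count$value
as_density$value
```

This only makes sense for numeric values. A categorical attribute such as Tract_grade cannot be averaged, so it would need a different rule (for example, taking the grade of the largest overlap).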

thymaro · August 29, 2020, 10:31am

As an aside, if you're trying to merge the files and the merge results in a file size greater than 2 GB, then you won't technically have a shapefile anymore. I am not sure about the implications of this, but as far as I know it is not possible to work reliably with a shapefile larger than 2 GB, as that is an illegal file size for the format, even if your program doesn't stop you from creating a file of that size.

Ajackson · August 29, 2020, 7:35pm

Not sure what you are trying to do, but what might work would be to intersect the two polygon files, and look for the maximum area in the intersections to determine how to link up 2010 with 1930. A similar process could migrate properties from one set of polygons to the other. I recently migrated census block attributes to zipcode polygons that way. I can share that if it helps.
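
A rough sketch of the maximum-area idea, using toy polygons rather than the real tract files. The `square()` helper, the coordinates and the two-tract layers are invented for illustration; only the column names Tract_grade and Tract_ID come from the question. It assumes both layers are polygons in the same CRS.

```r
library(sf)
library(dplyr)

# Hypothetical helper: axis-aligned rectangle as an sf polygon
square <- function(xmin, xmax, ymin, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Toy stand-in for the 1930s layer: grades, no tract id
old_tracts <- st_sf(
  Tract_grade = c("A", "C"),
  geometry    = st_sfc(square(0, 2, 0, 2), square(2, 4, 0, 2))
)

# Toy stand-in for the 2010 layer: tract ids, no grade
new_tracts <- st_sf(
  Tract_ID = c("1011.01", "1011.02"),
  geometry = st_sfc(square(0, 3, 0, 2), square(3, 4, 0, 2))
)

# Every piece where a new tract overlaps an old tract, with both sets of attributes
pieces <- suppressWarnings(st_intersection(new_tracts, old_tracts))
pieces$area <- as.numeric(st_area(pieces))

# For each new tract keep the old grade with the largest overlapping area
linked <- pieces %>%
  st_drop_geometry() %>%
  group_by(Tract_ID) %>%
  slice_max(area, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Tract_ID, Tract_grade, area)

linked
```

In this toy layout tract 1011.01 overlaps grade A more than grade C, so it is linked to A, while 1011.02 lies entirely inside the C polygon. `st_join(new_tracts, old_tracts, largest = TRUE)` in sf should give the same link in a single call, if you prefer not to handle the pieces yourself.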

Tamim · August 29, 2020, 8:00pm

Thanks, Ajackson, for your reply. Yes, I'm trying to migrate census tract attributes from the old shapefile (1930s) to the new one (2010). I think intersecting the two polygon files and looking for the maximum area in the intersections will solve the problem. I would appreciate it if you could share how to intersect two polygon files, or the work you've done. Thanks again.

Ajackson · August 31, 2020, 3:27am

This is a similar issue: here I took presidential race results by precinct and put them into zip codes.

library(tidyverse)
library(sf)

knitr::opts_chunk$set(echo = TRUE)

Read in presidential results by precinct

Read in and pare down to Harris county


path <- "/home/ajackson/Dropbox/Rprojects/"

load("../Datasets/electionprecincts/presidential_precincts_2016.rda")

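# df1 is assumed to be the data frame loaded from the .rda file above.
# Keep the Clinton and Trump votes for Harris County, one row per precinct.
df2 <- df1 %>% 
  filter(state=="Texas") %>% 
  filter(county_name=="Harris County") %>% 
  select(precinct, candidate_normalized, votes) %>% 
  filter((candidate_normalized=="clinton")|
         (candidate_normalized=="trump")) %>% 
  pivot_wider(id_cols=precinct, names_from=candidate_normalized, values_from=votes)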

saveRDS(df2, "/home/ajackson/Dropbox/Rprojects/Datasets/HarrisPrecinct2016.rds")

#   Read in shape file

precincts <- read_sf('/home/ajackson/Dropbox/Rprojects/Datasets/electionprecincts/COH_VOTING_PRECINCTS_HARRIS_COUNTY-shp/COH_VOTING_PRECINCTS_HARRIS_COUNTY.shp')

Zip_poly <- readRDS(paste0(path, "Datasets/ZipCodes_sf.rds"))

Intersect zip polys and precinct polys

Now I need to intersect the two datasets and determine which fraction of
which precincts lie within each zipcode.


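# Cut every zip code polygon by every precinct polygon
# (named "pieces" to avoid masking the intersect() function)
pieces <- st_intersection(Zip_poly, precincts)

# Total overlap area for each zip code / precinct pair
answer <- pieces %>% 
  mutate(area=st_area(.) %>% as.numeric()) %>% 
  as_tibble() %>% 
  group_by(Zip_Code, PRECINCT) %>% 
  summarize(area = sum(area), .groups = "drop")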

# Fraction of each precinct's intersected area that falls in each zip code
answer <- answer %>% 
  group_by(PRECINCT) %>% 
  mutate(total_area=sum(area)) %>% 
  mutate(fraction=area/total_area) %>% 
  ungroup()

# Drop the leading "201" from the precinct id to build the PRECINCT join key
df2 <- df2 %>% 
  mutate(PRECINCT=str_remove(precinct, "^201"))

# Share out each precinct's votes by area fraction and total them per zip code
foo <- left_join(answer, df2, by="PRECINCT") %>% 
  group_by(Zip_Code) %>% 
  summarize(clinton=sum(fraction*clinton, na.rm=TRUE), 
            trump=sum(fraction*trump, na.rm=TRUE)) %>% 
  mutate(blueness=clinton/(clinton+trump)) %>%
  mutate(total_vote=clinton+trump) %>% 
  select(ZCTA=Zip_Code, blueness, total_vote)
  
saveRDS(foo, "/home/ajackson/Dropbox/Rprojects/Datasets/HarrisBlueness.rds")
