Mapping differences with R

KarmicDreamwork · March 18, 2019, 4:30pm

Hey everyone,

I'm working on a project to map differences in Air Quality from Wildfires.
Currently, I'm trying to map impacts from the Eagle Creek Fire (2017) with the differences from the overall average (1980-2016) in air quality to the 2017 average on a county map.

***To summarize, I want to map the "difference" column in my dataframe for each county on a color-coded map for the entire United States.

With the help of Andrescs, I have been able to make the below dataframe that have each county name by state and the AQI differences.

Now, I'm having trouble with the mapping process. I believe I can do this in R and I found a website that describes the process with the urbnmapr packages (https://medium.com/@urban_institute/how-to-create-state-and-county-maps-easily-in-r-577d29300bb2) but I'm unsure how to do this step by step.

I see they are joining their dataset with the counties but I need help with how to do this (and the rest of it). I have my counties with State.Name and county.Name but I don't have lat/long (see code below). I think the package already has each county in it by state but not sure how to join it properly.

Next, I need to find how to properly code my datasets into the map. I tried it (not shown) but was met with a ton of errors as I think I was doing it all wrong. I'm new to R so please help!

install.packages("devtools")
devtools::install_github("UrbanInstitute/urbnmapr")
library(tidyverse)
library(urbnmapr)
library(ggplot2)

data.frame(stringsAsFactors = FALSE, State.Name = c("South Carolina", 
    "Idaho", "Oklahoma", "Colorado", "Illinois", "Mississippi", 
    "Ohio", "Pennsylvania", "Washington", "South Carolina"), 
    county.Name = c("Abbeville", "Ada", "Adair", "Adams", "Adams", 
        "Adams", "Adams", "Adams", "Adams", "Aiken"), TotalAvgAQI = c(40.8139158576052, 
        43.1923436041083, 44.8194029850746, 49.606386153387, 
        38.7736612702366, 43.1960167714885, 32.4834321590513, 
        40.3827089337176, 26.8103574033552, 37.3312078019505), 
    TotalAQI1717 = c(NA, 46.5274725274725, 40.4725274725275, 
        49, 36.9672131147541, NA, 22.8681318681319, 38.1648351648352, 
        31.1136363636364, 36.7241379310345), difference = c(NA, 
        3.33512892336422, -4.34687551254715, -0.606386153387049, 
        -1.80644815548252, NA, -9.61530029091941, -2.21787376888241, 
        4.30327896028115, -0.607069870916007))
#>        State.Name county.Name TotalAvgAQI TotalAQI1717 difference
#> 1  South Carolina   Abbeville    40.81392           NA         NA
#> 2           Idaho         Ada    43.19234     46.52747  3.3351289
#> 3        Oklahoma       Adair    44.81940     40.47253 -4.3468755
#> 4        Colorado       Adams    49.60639     49.00000 -0.6063862
#> 5        Illinois       Adams    38.77366     36.96721 -1.8064482
#> 6     Mississippi       Adams    43.19602           NA         NA
#> 7            Ohio       Adams    32.48343     22.86813 -9.6153003
#> 8    Pennsylvania       Adams    40.38271     38.16484 -2.2178738
#> 9      Washington       Adams    26.81036     31.11364  4.3032790
#> 10 South Carolina       Aiken    37.33121     36.72414 -0.6070699

mara · March 18, 2019, 5:13pm

If that framework requires lat long and you don't have it, you might want to look at something like tidycensus, which has geo files for cities and counties, etc:

For joins, see:
https://dplyr.tidyverse.org/reference/join.html

and also

KarmicDreamwork · March 22, 2019, 3:49am

Thank you.

I'm not sure how to combine my data with the lat/long info. Or how to upload the shapefiles to do it. It kind of skips over that part.

My concern is I have multiple states so theres duplicate county names. The examples I've found only use one state at a time.

mara · March 22, 2019, 11:27am

It's in the Relational data chapter I linked to above as well, but you can join using multiple columns (e.g. join by state and county):

You might want to review some general resources on working with geospatial data in R which will cover the other issues you've listed above (where shape files come from, coordinate reference systems, etc)

chris.prener · March 22, 2019, 12:30pm

Hey @KarmicDreamwork - @mara gave you some excellent general advice, and I'm happy to add some GIS specific guidance as well. I wouldn't use urbnmapr for this. I would steer you towards a more general solution and one that is less reliant on something built for a specific application.

I think the tigris package will get you exactly what you need in terms of geometric data. The first thing you need to do is install the sf package and the tigris package... do you have those already installed, by chance?

KarmicDreamwork · March 23, 2019, 9:54pm

Hi Chris,

I believe I already have the tigris installed. How would I go about that route?

KarmicDreamwork · March 24, 2019, 2:46pm

I can't get the join to work. It just gives me NAs!
I originally tried to join for counties AND states but that didn't work so this is just trying to join washington counties to lat/long so i can map my differences with ggplot


Shington <- data.frame(stringsAsFactors=FALSE,
                         state = c("Washington", "Washington", "Washington", "Washington",
                                   "Washington"),
                         county = c("Adams", "Asotin", "Benton", "Chelan", "Clallam"),
                         TotalSeptAvg = c(24.3663594470046, 37.8342696629214, 32.9859335038363,
                                          27.1462365591398, 50.5682023486902),
                         TotalSept1717 = c(58.6896551724138, 76.1851851851852, 62.2,
                                           81.5333333333333, 37.4666666666667),
                         difference = c(34.3232957254092, 38.3509155222638, 29.2140664961637,
                                        54.3870967741936, -13.1015356820235)
  )
  wa_county <- data.frame(stringsAsFactors=FALSE,
                          long = c(-119.885696411133, -119.874229431152, -119.879959106445,
                                   -119.862770080566, -119.868492126465),
                          lat = c(46.6043853759766, 46.5700073242188, 46.2205047607422,
                                  46.2090454101562, 46.0428886413574),
                          group = c(2932, 2932, 2932, 2932, 2932),
                          order = c(86732L, 86733L, 86734L, 86735L, 86736L),
                          state = c("washington", "washington", "washington", "washington",
                                    "washington"),
                          county = c("yakima", "yakima", "yakima", "yakima", "yakima")
  )
  
  #followed this tutorial https://eriqande.github.io/rep-res-eeb-2017/map-making-in-R.html (8.2.6.5)
  #following tutorial to try to make county map showing differences among counties (difference column) but unable to get them to join to have lat/long per county
  
  #These are what I had loaded, not sure if I used all of them or not
  library(tidyverse)
  library(mapdata)
  library(maps)
  library(stringr)
  library(viridis)
  
  install.packages(c("maps", "mapdata"))
  devtools::install_github("dkahle/ggmap")
  install.packages("viridis")
  
  
  #Changed column names to make sure they were the same 
  #*already in reprex dataframe
  colnames(wa_county)[colnames(wa_county)=="subregion"] <- "county"
  colnames(wa_county)[colnames(wa_county)=="region"] <- "state"
  head(wa_county)
  
  #replace "county.Name" with subregion to allow for easeir join to add lat/long
  colnames(Shington)[colnames(Shington)=="county.Name"] <- "county"
  colnames(Shington)[colnames(Shington)=="State.Name"] <- "state"
  head(Shington)
  
  
  #attempt to join data frames by county
  #** Anyone know how to do this by county AND state for a dataframe with multiple states??
  
  Meowmeow <- left_join(wa_county, Shington, by = "county", ignore.case=TRUE)
  head(Meowmeow)
  
  #comes out with NA NA NA for difference data and doens't seem to link specific county with lat/long))

I also tried to do it the other way. These are what the results look like. If I do it with wa_county first, I have lat/long but all the differences and avgs are na na na

>Meowmeow <- left_join(Shington, wa_county, by = "county", ignore.case=TRUE)
> head(Meowmeow)
# A tibble: 6 x 10
# Groups:   long [1]
  region.x   subregion TotalSeptAvg TotalSept1717 difference  long   lat group order region.y
  <chr>      <chr>            <dbl>         <dbl>      <dbl> <dbl> <dbl> <dbl> <int> <chr>   
1 Washington Adams             24.4          58.7       34.3    NA    NA    NA    NA NA      
2 Washington Asotin            37.8          76.2       38.4    NA    NA    NA    NA NA      
3 Washington Benton            33.0          62.2       29.2    NA    NA    NA    NA NA      
4 Washington Chelan            27.1          81.5       54.4    NA    NA    NA    NA NA      
5 Washington Clallam           50.6          37.5      -13.1    NA    NA    NA    NA NA      
6 Washington Clark             33.6          52.8       19.2    NA    NA    NA    NA NA

chris.prener · March 24, 2019, 3:48pm

Hey @KarmicDreamwork - I’ve got a reprex for you that I’ll post this afternoon!

chris.prener · March 24, 2019, 6:17pm

OK @KarmicDreamwork, here you go - the reprex is down at the bottom on my reply. Let me walk you through some of this, first, though.

Problem Space

You've got state and county names, and want to get them on a map. You've gone through a number of spatial packages, and it seems like there are two fundamental issues. The first is that you're not sure exactly what geometry you need and where to get the data, and the second is that you are not sure how to join the geometric data you have been able to get your hands on with your sample data.

Understanding Geometric Data

There are two fundamental ways to represent spatial data (what we often call "shapefiles") in R - sp objects and sf objects. sf objects are my preferred approach because they look and act like data frames that we are used to working with in other contexts.

You have made a couple of references to mapping the longitude and latitude of each county. I suspect the data you've come across represents the centroid of each county i.e. its geometric midpoint. I describe this as the place where you could balance a flat piece of cardboard cut out in the shape of the county on your finger. This is represented as a point on a map.

However, in my experience, most people want the actual outline of the county itself. In this case you, you want polygon data that represent the cutout of the county itself. Polygons are a series of points connected by lines. I'm going to assume this is the type of data you want, but feel free to let me know if you'd rather work with centroids.

Accessing Spatial Data

You've tried out a number of packages that contain some pre-formatted spatial data. I generally suggest getting the actual spatial data rather than some that is pre-processed. Packages like urbnmapr, maps and mapdata contain a limited set of spatial data that is sometimes useful. However, it is better to learn how to access the raw spatial data from the source.

The easiest way to get U.S. spatial data is the tigris package, which allows you to download a variety of common geometric data sets at either a national extent, for a particular state, or for a particular county. These data are not stored within the package like the other tools you've identified. Rather, you download it directly from the U.S. Census Bureau's API. This gives you the widest latitude (pun intended) to use whatever packages you want for mapping, and gives you to the experience to access a variety of spatial data sets and not just want comes packaged within a specific tool like urbnmapr.

Issues With Your Specific Data

You have two sets of spatial references - state names and county names. You've already correctly worked out a major flaw with storing spatial data that way - we repeat county names between states frequently (there are Washington Counties all over the U.S.) and sometimes even within states (Baltimore County and City, for example, are technically both counties, as are St. Louis City and County).

If you download the U.S.-wide county data set from tigris, it will contain county names but not state names. One part of the reprex below therefore involves converting your state names automatically to state FIPS codes (a two digit unique identifier) so that you don't have to do it by hand. We'll use a package called cdlTools to do this.

When we download the county data from tigris, we also have to make one edit - to convert the state FIPS codes to numeric from character so that they match the output we get from the fips(). I also cut down the number of columns in our county data to make it easier to work with, but this of course is optional!

Joining Data

Once we have both data sets prepped and ready to join, we can use dplyr to complete the join. There are two data sets - an x data set and a y one. In the reproducible example below, your sample data are the x data and the county geometric data are the y data. Thus we also list the sample data (and its key variables) first:

sample_joined <- left_join(sample, counties, by = 
    c("state.fips" = "STATEFP", "county.name" = "NAME")) %>%
  st_as_sf()

By using two key relationships ("state.fips" = "STATEFP" and "county.name" = "NAME") we can correctly match observations by both county name and state, so that Adams County, Washington is correctly matched with the geometry for that place and Adams County, Colorado is correctly matched with its own geometry.

This will give you a data set with the name number of observations as sample, but each will have the geometric data from counties appended to it. Since our x data are a tibble, we get a nibble out of this. Thus we need to include st_as_sf() after the join to covert our resulting data into an sf object.

If we reversed this, and placed counties as the x data and sample as the y data, we could get resulting object with all of the observations from counties. When a match with sample exists, the sample data will be appended. Otherwise, NA values will be returned. Since counties is already an sf object, you would not need st_as_sf() in this case after the join since the join will return an sf object.

Which data set you list as x and y is a matter of what you want to do with the data next.

Future Data Thoughts

If you continue working with these data, I would suggest using a GEOID that matches the GEOID data present in Census data releases. You may even already have this. It makes the joins easier because GEOID values are always unique and you won't have to worry about Adams County in Washington vs Colorado etc.

Mapping

Since you're using sf objects, you have a bunch of options for mapping. You can use ggplot2, ggmap, tmap, and leaflet just to name a few tools. I like mapview for previewing data and leaflet for web mapping. I teach and use tmap and ggplot2 for static mapping.

Reproducible Example

# dependencies
library(cdlTools)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(leaflet)
library(tigris)
#> To enable 
#> caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
#> 
#> Attaching package: 'tigris'
#> The following object is masked from 'package:graphics':
#> 
#>     plot
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3

# sample data
sample <- tibble(
  state.name = c("South Carolina", "Idaho", "Oklahoma", "Colorado", "Illinois", 
                 "Mississippi", "Ohio", "Pennsylvania", "Washington", "South Carolina"), 
  county.name = c("Abbeville", "Ada", "Adair", "Adams", "Adams", "Adams", "Adams", 
                  "Adams", "Adams", "Aiken"), 
  TotalAvgAQI = c(40.8139158576052, 43.1923436041083, 44.8194029850746, 49.606386153387, 
                  38.7736612702366, 43.1960167714885, 32.4834321590513, 40.3827089337176, 
                  26.8103574033552, 37.3312078019505), 
  TotalAQI1717 = c(NA, 46.5274725274725, 40.4725274725275, 49, 36.9672131147541, NA, 
                   22.8681318681319, 38.1648351648352, 31.1136363636364, 36.7241379310345), 
  difference = c(NA, 3.33512892336422, -4.34687551254715, -0.606386153387049, -1.80644815548252, 
                 NA, -9.61530029091941, -2.21787376888241, 4.30327896028115, -0.607069870916007))

# convert state names to FIPS codes
sample <- sample %>% 
  mutate(state.fips = fips(state.name)) %>%
  select(state.name, state.fips, everything())

# download counties
counties <- counties(class = "sf")

# subset counties, make fips numeric
counties <- counties %>%
  select(STATEFP, NAME) %>%
  mutate(STATEFP = as.numeric(STATEFP))

# join sample data with counties
sample_joined <- left_join(sample, counties, by = 
    c("state.fips" = "STATEFP", "county.name" = "NAME")) %>%
  st_as_sf()
#> Warning: Column `state.fips`/`STATEFP` has different attributes on LHS and
#> RHS of join

# leaflet map
sample_joined %>%
  leaflet() %>%
  addTiles() %>%
  addPolygons()
#> Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
#> Need '+proj=longlat +datum=WGS84'

^{Created on 2019-03-24 by the reprex package (v0.2.1)}

KarmicDreamwork · March 24, 2019, 8:46pm

This is fantastic, thank you!!! I'm so close now!!

Can you use the geometry column like the lat/long for ggplot?
I'm not familiar with how to use leaflet. It looks like there's alot you can do with it!
How could I take that and have it show the differences for each county in the U.S.? Something like this but the entire country:

(Case Studies in Reproducible Research: a spring seminar at UCSC)

*also when i try to save the map, it terminates R

I also tried to join the county fips but was met with NAs (the state one you had worked!)

#all the things...
library(tidyverse)
library(mapdata)
library(maps)
library(stringr)
library(viridis)
library(tigris)
library(cdlTools)
library(ggplot2)
library(sf)
library(leaflet)

#Try 2 to set fips for per county
#WAOR17 <- dplyr::filter(AQI2017, State.Name == 'Oregon' | State.Name == 'Washington')


mew <- SeptWAORDifference
WMew<- dplyr::filter(mew, State.Name == 'Washington')

#combines state name with state FIPS
#try just for WA and OR
# AKA convert state names to FIPS codes
sample2 <- WMew %>% 
  mutate(county.fips = fips(county.Name)) %>%
  select(county.Name, county.fips, everything())

head(sample2)

#download county data
counties <- counties(class = "sf")

head(counties)


# subset counties, make fips numeric
counties <- counties %>%
  select(COUNTYFP, NAME) %>%
  mutate(COUNTYFP = as.numeric(COUNTYFP))

head(counties)
# join sample data with counties
sample_joined2 <- left_join(sample2, counties, by = 
                              c("county.fips" = "COUNTYFP", "county.Name" = "NAME")) %>%
  st_as_sf()

head(sample_joined2)

#results
  county.Name county.fips State.Name SeptAvgAQI SeptAvgAQI17 difference                  geometry
  <chr>             <dbl> <chr>           <dbl>        <dbl>      <dbl> <GEOMETRYCOLLECTION [Â°]>
1 Adams                NA Washington       24.4         58.7       34.3  GEOMETRYCOLLECTION EMPTY
2 Asotin               NA Washington       37.8         76.2       38.4  GEOMETRYCOLLECTION EMPTY
3 Benton               NA Washington       33.0         62.2       29.2  GEOMETRYCOLLECTION EMPTY
4 Chelan               NA Washington       27.1         81.5       54.4  GEOMETRYCOLLECTION EMPTY
5 Clallam              NA Washington       50.6         37.5      -13.1  GEOMETRYCOLLECTION EMPTY
6 Clark                NA Washington       33.6         52.8       19.2  GEOMETRYCOLLECTION EMPTY

chris.prener · March 25, 2019, 12:30am

Nice work! Glad we could help.

Sort of, but I think it is easier now that geom_sf exists. The following code, tacked on to the reprex above, will do the job minimally:

# additional dependencies
library(ggplot2)
library(viridis)

# minimal map
ggplot() +
  geom_sf(data = sample_joined, mapping = aes(fill = difference)) +
  scale_fill_viridis()

If you wanted to add the state boundaries in black and place county boundaries in white:

library(ggplot2)
library(viridis)

# download state data, remove territories plus Alaska and Hawaii
states <- states(class = "sf") %>%
  select(NAME, STATEFP) %>%
  filter(as.numeric(STATEFP) <= 56) %>%
  filter(STATEFP != "02" & STATEFP != "15")

# slightly more complex plot
ggplot() +
  geom_sf(data = sample_joined, 
    mapping = aes(fill = difference), color = "#ffffff") +
  geom_sf(data = states, fill = NA, color = "#000000")
  scale_fill_viridis()

The albersusa package, which is only available from GitHub has some great tools if you want to include Alaska and Hawaii in your map as well.

Can you post the output of str(sample2)?

What syntax were you using? Version of R and RStudio? Version of ggplot2?

KarmicDreamwork · March 25, 2019, 4:52am

Thank you!!!

For the sample 2 results:

> head(sample2)
# A tibble: 6 x 6
# Groups:   county.Name [6]
  county.Name county.fips State.Name SeptAvgAQI SeptAvgAQI17 difference
  <chr>       <lgl>       <chr>           <dbl>        <dbl>      <dbl>
1 Adams       NA          Washington       24.4         58.7       34.3
2 Asotin      NA          Washington       37.8         76.2       38.4
3 Benton      NA          Washington       33.0         62.2       29.2
4 Chelan      NA          Washington       27.1         81.5       54.4
5 Clallam     NA          Washington       50.6         37.5      -13.1
6 Clark       NA          Washington       33.6         52.8       19.2

I set what you said for GGPlot with my code and it looks good.
I updated R and all the packages and it worked!

Does the code look correct to you? I think this is what I need. You are amazing!!!!!!!!!!!!!!!!!

library(tidyverse)
library(mapdata)
library(maps)
library(stringr)
library(viridis)
library(tigris)
library(cdlTools)
library(ggplot2)
library(sf)
library(leaflet)
library(dplyr)

SeptWAORDifference <- data.frame(stringsAsFactors=FALSE,
     State.Name = c("Washington", "Washington", "Oregon", "Oregon",
                    "Washington"),
    county.Name = c("Adams", "Asotin", "Baker", "Benton", "Benton"),
     SeptAvgAQI = c(24.3663594470046, 37.8342696629214, 26.8333333333333,
                    18.1133501259446, 32.9859335038363),
   SeptAvgAQI17 = c(58.6896551724138, 76.1851851851852, 49.4, 31.7857142857143,
                    62.2),
     difference = c(34.3232957254092, 38.3509155222638, 22.5666666666667,
                    13.6723641597697, 29.2140664961637)
)

#September WA, OR differences to not break data (SeptWAORDifference)
#new attempt :D from Chrises ideas on rstudio community

Cow <- SeptWAORDifference
head(Cow)

#combines state name with state FIPS
# AKA convert state names to FIPS codes
Cow1 <- Cow %>% 
  mutate(state.fips = fips(State.Name)) %>%
  select(State.Name, state.fips, everything())

head(Cow1)

#download county data
counties <- counties(class = "sf")

# subset counties, make fips numeric
counties <- counties %>%
  select(STATEFP, NAME) %>%
  mutate(STATEFP = as.numeric(STATEFP))

# join sample data with counties
Cow2 <- left_join(Cow1, counties, by = 
                             c("state.fips" = "STATEFP", "county.Name" = "NAME")) %>%
  st_as_sf()

head(Cow2)
#dataset now has differences and geometric data (geom_sf)

#get state data to add to map
states <- states(class = "sf") %>%
  select(NAME, STATEFP) %>%
  filter(as.numeric(STATEFP) <= 56) %>%
  filter(STATEFP != "02" & STATEFP != "15")

#should make map with county differences and state borders
ggplot() +
  geom_sf(data = Cow2, 
          mapping = aes(fill = difference), color = "#ffffff") +
  geom_sf(data = states, fill = NA, color = "#000000")
scale_fill_viridis()

(*yes, some are named after animal sounds. I couldn't think of any new names...)

When I did it for total county data, it looks like this. I'm guessing I'm just missing alot of county info

KarmicDreamwork · March 25, 2019, 2:54pm

I tried the joins but I'm getting NA NA NA for my differences. I don't believe its joining the lat/long to the county names correctly.

This is what I tried. I originally tried to do a join for county name by state but it didn't work so I'm trying to do a single state for county name but it isn't working either. Please help! I don't know what I'm doing wrong. I have checked multiple forums with no solutions. I found one that converts the columns to as.character which fixes the difference numbers in the join but then turns county names into NAs.

Any ideas?

Shington <- data.frame(stringsAsFactors=FALSE,
                         state = c("Washington", "Washington", "Washington", "Washington",
                                   "Washington"),
                         county = c("Adams", "Asotin", "Benton", "Chelan", "Clallam"),
                         TotalSeptAvg = c(24.3663594470046, 37.8342696629214, 32.9859335038363,
                                          27.1462365591398, 50.5682023486902),
                         TotalSept1717 = c(58.6896551724138, 76.1851851851852, 62.2,
                                           81.5333333333333, 37.4666666666667),
                         difference = c(34.3232957254092, 38.3509155222638, 29.2140664961637,
                                        54.3870967741936, -13.1015356820235)
  )
  wa_county <- data.frame(stringsAsFactors=FALSE,
                          long = c(-119.885696411133, -119.874229431152, -119.879959106445,
                                   -119.862770080566, -119.868492126465),
                          lat = c(46.6043853759766, 46.5700073242188, 46.2205047607422,
                                  46.2090454101562, 46.0428886413574),
                          group = c(2932, 2932, 2932, 2932, 2932),
                          order = c(86732L, 86733L, 86734L, 86735L, 86736L),
                          state = c("washington", "washington", "washington", "washington",
                                    "washington"),
                          county = c("yakima", "yakima", "yakima", "yakima", "yakima")
  )
  
  #followed this tutorial https://eriqande.github.io/rep-res-eeb-2017/map-making-in-R.html (8.2.6.5)
  #following tutorial to try to make county map showing differences among counties (difference column) but unable to get them to join to have lat/long per county
  
  #These are what I had loaded, not sure if I used all of them or not
  library(tidyverse)
  library(mapdata)
  library(maps)
  library(stringr)
  library(viridis)
  
  install.packages(c("maps", "mapdata"))
  devtools::install_github("dkahle/ggmap")
  install.packages("viridis")
  
  
  #Changed column names to make sure they were the same 
  #*already in reprex dataframe
  colnames(wa_county)[colnames(wa_county)=="subregion"] <- "county"
  colnames(wa_county)[colnames(wa_county)=="region"] <- "state"
  head(wa_county)
  
  #replace "county.Name" with subregion to allow for easeir join to add lat/long
  colnames(Shington)[colnames(Shington)=="county.Name"] <- "county"
  colnames(Shington)[colnames(Shington)=="State.Name"] <- "state"
  head(Shington)
  
  
  #attempt to join data frames by county
  #** Anyone know how to do this by county AND state for a dataframe with multiple states??
  
  Meowmeow <- left_join(wa_county, Shington, by = "county", ignore.case=TRUE)
  head(Meowmeow)
  
  #comes out with NA NA NA for difference data and doens't seem to link specific county with lat/long))

chris.prener · March 25, 2019, 3:15pm

The join is failing because your county.fips column is a logical one and (I suspect) has no valid data (i.e. all NA values).

Great!

Things look good. I would strongly encourage you to move away from the arbitrary animal sounds. Name your objects purposefully - airQuality, airQuality_sf, etc. Think about what the object represents and name it so that when you look in your environment tab, it is immediately clear what you've got stored in there. This is better for you in the long run, but also important for reproducibility.

I would consider taking out Hawaii from your analysis or using the albersusa package I mentioned earlier.

If you do take out Hawaii, I would switch your project to a projected coordinate system appropriate for mapping the continental United States. Two options are the USA Contiguous Lambert Conformal Conic and the USA Contiguous Albers Equal Area Conic. You can use the sf::st_transform() function to do this.

In your reproducible example, there are no entries for "yakima" in the Shington object. Also beware that you're trying to join data with multiple observations for the same county to data with a single observation per county. That is going to need a modified workflow.

andresrcs · March 25, 2019, 3:18pm

Also, left_join has no ignore.case argument, look this example where I manually change names to lower case.

Shington <- data.frame(stringsAsFactors=FALSE,
                       state = c("Washington", "Washington", "Washington", "Washington",
                                 "Washington"),
                       county = c("Adams", "Asotin", "Benton", "Chelan", "Yakima"),
                       TotalSeptAvg = c(24.3663594470046, 37.8342696629214, 32.9859335038363,
                                        27.1462365591398, 50.5682023486902),
                       TotalSept1717 = c(58.6896551724138, 76.1851851851852, 62.2,
                                         81.5333333333333, 37.4666666666667),
                       difference = c(34.3232957254092, 38.3509155222638, 29.2140664961637,
                                      54.3870967741936, -13.1015356820235)
)
wa_county <- data.frame(stringsAsFactors=FALSE,
                        long = c(-119.885696411133, -119.874229431152, -119.879959106445,
                                 -119.862770080566, -119.868492126465),
                        lat = c(46.6043853759766, 46.5700073242188, 46.2205047607422,
                                46.2090454101562, 46.0428886413574),
                        group = c(2932, 2932, 2932, 2932, 2932),
                        order = c(86732L, 86733L, 86734L, 86735L, 86736L),
                        state = c("washington", "washington", "washington", "washington",
                                  "washington"),
                        county = c("yakima", "yakima", "yakima", "asotin", "yakima")
)

library(dplyr)

left_join(wa_county, Shington %>% mutate_if(is.character, ~tolower(.)), by = c("state","county"))
#>        long      lat group order      state county TotalSeptAvg
#> 1 -119.8857 46.60439  2932 86732 washington yakima     50.56820
#> 2 -119.8742 46.57001  2932 86733 washington yakima     50.56820
#> 3 -119.8800 46.22050  2932 86734 washington yakima     50.56820
#> 4 -119.8628 46.20905  2932 86735 washington asotin     37.83427
#> 5 -119.8685 46.04289  2932 86736 washington yakima     50.56820
#>   TotalSept1717 difference
#> 1      37.46667  -13.10154
#> 2      37.46667  -13.10154
#> 3      37.46667  -13.10154
#> 4      76.18519   38.35092
#> 5      37.46667  -13.10154

KarmicDreamwork · March 26, 2019, 6:41am

Thank you. I wanted to ask if my code looks correct for my project. I'm starting to doubt myself at the moment. I'm trying to do the overall average AQI for EACH month, such as the average AQI for September from 1996-2016 then compare that to the average AQI for September 2017. Then map those differences per county. Is this the way you'd do it? I'm tight on time right now, so I'm hoping looking at it without a reprex will be enough. I'm more wondering if the

NewaverageS %>% 
  mutate(Date = ymd(Date)) %>%
  filter(month(Date) %in% c(9)) %>%
  filter(year(Date) %in% 1996:2016) %>%
  as_tsibble(key = id(State.Name, county.Name), index = Date) %>%
  group_by(State.Name, county.Name) %>%
  index_by(Year = year(Date)) %>%
  summarise(mean_AQI = mean(AQI),
            days = n())

is in the correct location. I think I have to do them for each month to get the mean for that specific month, correct?

#new average for 1996-2016

install.packages("tsibble")
library(lubridate)
library(dplyr)
library(tsibble)
library(tidyverse)

#AQI averages for county-state for Washington and Oregon for 1996-2016 and 2017

Newaverage <- AQIData
head(Newaverage)

view(Newaverage)

#create new columns for year, month, day
Newaverage <- Newaverage %>%
  dplyr::mutate(year = lubridate::year(Date), 
                month = lubridate::month(Date), 
                day = lubridate::day(Date))
head(Newaverage)

Newaverage1 <- Newaverage %>% filter((year) %in% c(1996:2016))
view(Newaverage1)

view(Newaverage1)
#Warning message:
#In year == 1996:2016 :
# longer object length is not a multiple of shorter object length

#selecting columns that we are interested in
#Newaverage2 <- dplyr::select(Newaverage1, State.Name, county.Name, AQI, Category, year, month, day)
#head(Newaverage2)

#filtering for the month of September (9)
NewaverageS <- Newaverage1
NewaverageS %>% 
  mutate(Date = ymd(Date)) %>%
  filter(month(Date) %in% c(9)) %>%
  filter(year(Date) %in% 1996:2016) %>%
  as_tsibble(key = id(State.Name, county.Name), index = Date) %>%
  group_by(State.Name, county.Name) %>%
  index_by(Year = year(Date)) %>%
  summarise(mean_AQI = mean(AQI),
            days = n())

Newaverage29 <- NewaverageS %>% filter((month) %in% c(9))


#each county-state count for each year


#need to group each county by state for those with over 100 pieces of data and get average AQI and # of days per county

#1. each county per state
#group_by(county.Name, State.Name) %>%
#2. only those with over 300 days calculated
#1980-2016 = 36 years, average month (30)*36 = 1060 days. 300 is about 30% of that goal number. 
#filter(n() > 300)
#3. Average AQI for each county
#summarise (AQIMean = mean (AQI, na.rm = TRUE))

#county.name per state.name with data with over 100 days counted and average AQI
#attempted column for # of days for each county but didnt work #mutate (days = n()) %>% 

#September
Newaverage9 <- Newaverage29 %>% 
  group_by(county.Name, State.Name) %>%
  filter(n() > 300) %>% 
  summarise (AQIMean = mean (AQI, na.rm = TRUE))

head(Newaverage9)

#Find Differences

#New average for 1996-2016
#September (Newaverage9) -> difference (NewDifSept)

#New 2017 Sept (NewSeptAvgAQI17)


head(SeptAvgAQI17)
head(NewSeptAvgAQI17)

#September
# 1996-2016 avg "Newaverage9" vs 2017 avg "NewSeptAvgAQI17"
NewDifSept <- Newaverage9 %>% 
  full_join(SeptAvgAQI17, by = c("State.Name", "county.Name")) %>% 
  select(State.Name, county.Name, Newaverage9 = AQIMean.x, SeptAvgAQI17 = AQIMean.y) %>% 
  mutate(difference = SeptAvgAQI17 - Newaverage9)

#Warning messages:
#1: Column `State.Name` joining factors with different levels, coercing to character vector 
#2: Column `county.Name` joining factors with different levels, coercing to character vector 

#New average for 1996-2016
#September (Newaverage9) -> difference (NewDifSept) -> map -> plot(New96SeptDif3)

New96SeptDif <- NewDifSept
head(New96SeptDif)

#combines state name with state FIPS
# AKA convert state names to FIPS codes
New96SeptDif1 <- New96SeptDif %>% 
  mutate(state.fips = fips(State.Name)) %>%
  select(State.Name, state.fips, everything())

head(New96SeptDif1)

#download county data
counties <- counties(class = "sf")

# subset counties, make fips numeric
counties <- counties %>%
  select(STATEFP, NAME) %>%
  mutate(STATEFP = as.numeric(STATEFP))

# join sample data with counties
New96SeptDif2 <- left_join(New96SeptDif1, counties, by = 
                             c("state.fips" = "STATEFP", "county.Name" = "NAME")) %>%
  st_as_sf()

head(New96SeptDif2)
#dataset now has differences and geometric data (geom_sf)

#get state data to add to map
states <- states(class = "sf") %>%
  select(NAME, STATEFP) %>%
  filter(as.numeric(STATEFP) <= 56) %>%
  filter(STATEFP != "02" & STATEFP != "15")

#map with county differences and state borders
New96SeptDif3 <- ggplot() +
  geom_sf(data = New96SeptDif2, 
          mapping = aes(fill = difference), color = "#ffffff") +
  xlim(lons)+ylim(lats)+
  xlab("Longitude") + ylab("Latitude") +
  ggtitle("September Air Quality Index (AQI) Differences During the Eagle Creek Fire", subtitle ="Comparing 1996-2016 averages to 2017")+
  geom_sf(data = states, fill = NA, color = "#000000")+
  scale_fill_gradientn("AQI", colours =c("blue4","blue","white","red","red4"),limits = c(-100,100))

plot(New96SeptDif3)
#it works!

system · April 16, 2019, 6:41am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.