Wondering if there's an equivalent in r to the SAS put statement for the purpose of creating a new variable by matching an existing variable to a list.
Let's say I wanted to use a list of zip codes and census MSAs...there's a list of all zips and all MSAs
zip censusmsa
94114 "San Francisco-Oakland, CA"
(and so on, all zips in the US)
I want to match a zip code from an address file and create the MSA field. In SAS I'd run:
hsmsaarea=(put(hszip_use,$zipmsa.));
Which says "create hsmsaarea by taking zip from working file and matching to a list called '$zipmsa' stored in a library". If zip in address file matchines something in the library file, put the MSA value in the working file.
Is there a similar r function that's not a join or merge? Something I can run as a stand-alone line or ideally in a dplyr chain?
Can you add a small example of your two datasets to your question? That will allow folks who want to help something to work with. See some ideas for how to do this here.
A good search term for this problem that may help you find additional approaches is "lookup table". I combined that with "r" and got quite a few hits. While many resources still talk about joins as a primary option, I'm pretty sure I've seen other options using, e.g., match(). (I'm not offering a solution since I do this sort of work with joins, which jives with the way my brain thinks about this problem. )
From your code, it appears that zipmsa is a SAS format defined using PROC FORMAT.
I suggest two possible solutions. Either create a data frame comprising the zip and censusmsa pairs which can be used as a lookup table, or store this mapping as a named character vector (the name is the zip code and the value is the censusmsa).
The example code below demonstrates both the possibilities.
# Dataset containing zip and corresponding name
zip_map_d <- data.frame(
zip = c(1, 2, 3),
name = c("A", "B", "C"),
stringsAsFactors = FALSE
)
# Character vector where the name of each element is the zip code
zip_map_v <- c("1" = "A", "2" = "B", "3" = "C")
# Dataset containing zip codes and some random value
df <- data.frame(
zip = sample(c(1, 2, 3), 10, replace = TRUE),
x = rnorm(10)
)
# Left join using the zip code
library(dplyr)
df2 <- df %>%
left_join(zip_map_d, by = c("zip" = "zip"))
# Create a new variable by looking up the named character vector
df3 <- df %>%
mutate(name = zip_map_v[zip])
Thanks for the lookup table search tip, forgot about that or vlookup equivalent. But then I hardly ever did vlookup in Excel, being so SAS oriented since grad-school.
Thanks for this...the second option is pretty much exactly what I had in mind. Per my other reply I just didn't initially search using the term "lookup" but it makes perfect sense now.