r equivalent of sas put statement

Hello...

Wondering if there's an equivalent in r to the SAS put statement for the purpose of creating a new variable by matching an existing variable to a list.

Let's say I wanted to use a list of zip codes and census MSAs...there's a list of all zips and all MSAs
zip censusmsa
94114 "San Francisco-Oakland, CA"
(and so on, all zips in the US)

I want to match a zip code from an address file and create the MSA field. In SAS I'd run:
hsmsaarea=(put(hszip_use,$zipmsa.));

Which says "create hsmsaarea by taking zip from working file and matching to a list called '$zipmsa' stored in a library". If zip in address file matchines something in the library file, put the MSA value in the working file.

Is there a similar r function that's not a join or merge? Something I can run as a stand-alone line or ideally in a dplyr chain?

Can you add a small example of your two datasets to your question? That will allow folks who want to help something to work with. See some ideas for how to do this here.

A good search term for this problem that may help you find additional approaches is "lookup table". I combined that with "r" and got quite a few hits. While many resources still talk about joins as a primary option, I'm pretty sure I've seen other options using, e.g., match(). (I'm not offering a solution since I do this sort of work with joins, which jives with the way my brain thinks about this problem. :slightly_smiling_face:)

1 Like

From your code, it appears that zipmsa is a SAS format defined using PROC FORMAT.

I suggest two possible solutions. Either create a data frame comprising the zip and censusmsa pairs which can be used as a lookup table, or store this mapping as a named character vector (the name is the zip code and the value is the censusmsa).

The example code below demonstrates both the possibilities.

# Dataset containing zip and corresponding name

zip_map_d <- data.frame(
  zip = c(1, 2, 3),
  name = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Character vector where the name of each element is the zip code

zip_map_v <- c("1" = "A", "2" = "B", "3" = "C")

# Dataset containing zip codes and some random value

df <- data.frame(
  zip = sample(c(1, 2, 3), 10, replace = TRUE),
  x = rnorm(10)
)

# Left join using the zip code

library(dplyr)
df2 <- df %>%
  left_join(zip_map_d, by = c("zip" = "zip"))

# Create a new variable by looking up the named character vector

df3 <- df %>%
  mutate(name = zip_map_v[zip])
3 Likes

Thanks for the lookup table search tip, forgot about that or vlookup equivalent. But then I hardly ever did vlookup in Excel, being so SAS oriented since grad-school.

Thanks for this...the second option is pretty much exactly what I had in mind. Per my other reply I just didn't initially search using the term "lookup" but it makes perfect sense now.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.