Subset rows that contain certain characters?

I have a large dataset in which the variable EA holds character values. For example, EA for observation 1 is "Los Angeles, CA".

I want to select only the rows that contain "CA" in the variable EA.

Is there a way to do this?

I used grepl to get a vector of TRUE and FALSE values. How do I use this to select the rows?

You can do something like this, but you will likely need to refine the regex pattern for your actual data.

library(tidyverse)

large_dataset <- data.frame(stringsAsFactors = FALSE,
                            EA = c("Los Angeles, CA", "Other text")
)

large_dataset %>% 
    filter(str_detect(EA, pattern = "CA"))
#>                EA
#> 1 Los Angeles, CA

Created on 2019-11-22 by the reprex package (v0.3.0.9000)
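
As a rough illustration of what refining the pattern could mean, here is a minimal sketch using base R's grepl. The sample values are hypothetical, and the anchored pattern assumes the state abbreviation always appears at the end of the string after a comma and a space, which may not hold for your data. The same pattern strings can be passed to str_detect.

```r
# Hypothetical sample values, chosen to show where a bare "CA" over-matches
ea <- c("Los Angeles, CA", "CARSON CITY, NV", "Other text")

# Bare pattern: TRUE for any value containing the letters "CA" anywhere,
# so "CARSON CITY, NV" is matched as well
bare <- grepl("CA", ea)

# Anchored pattern (an assumption about the data format):
# TRUE only for values that end in ", CA"
anchored <- grepl(", CA$", ea)

print(bare)
print(anchored)
```

Note that both patterns are case-sensitive, so a value such as "Toronto, Canada" would not match "CA" either way.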


You can use square brackets to subset rows with a logical (TRUE/FALSE) vector, such as the one grepl returns:

Example:

df <- df[grepl('CA', df$EA),]
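
A self-contained version of the same idea might look like the sketch below. The data frame contents and the id column are made up for illustration. The drop = FALSE argument is an optional addition: without it, row-subsetting a data frame that has only one column returns a plain vector instead of a data frame.

```r
# Hypothetical data; the `id` column exists only for illustration
df <- data.frame(
  EA = c("Los Angeles, CA", "Other text", "San Diego, CA"),
  id = 1:3,
  stringsAsFactors = FALSE
)

# Logical vector with one element per row: TRUE where EA contains "CA"
keep <- grepl("CA", df$EA)

# Keep the rows where `keep` is TRUE; drop = FALSE preserves the
# data frame structure even if only one column is present
ca_rows <- df[keep, , drop = FALSE]

print(ca_rows)
```

Because grepl returns FALSE for NA inputs, rows where EA is missing are dropped by this approach rather than carried through as NA rows.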