Create duplicate rows based on a condition

Hello, I am working with publicly available data on water systems. Currently, I have tidy data with one line per water system (my unit of observation). However, some of my water systems serve more than one county, for example, in the first row of my reproducible example, the system serves 2 counties separated by a comma in the column Counties Served "Jackson, White".

For my analysis, I need long data where each water system only has one county per row. Does anyone know how to create duplicates of each row, with only one county listed per row? Tidyverse solutions would be most appreciated.

Many thanks in advance.

structure(list(`PWS Name` = c("BALD KNOB NORTH WATER ASSOC", 
"S P G WATER ASSOCIATION", "COLORADO RIVER MUNICIPAL WATER DISTRICT", 
"EMMET WATERWORKS", "INDEPENDENCE JACKSON REGIONAL", "COMMUNITY WATER SYSTEM", 
"WEST STONE COUNTY WATER ASSN", "MADISON CO REGIONAL WATER", 
"SEVIER CO WATER ASSOCIATION", "LIBERTY WOODSON HENSLEY", "SOUTHWEST WATER USERS LLC", 
"BENTON-WASHINGTON REGIONAL PWA", "CLAY CO REG WATER DISTRICT", 
"MADISON CO WATER FACILITIES BD", "LONOKE WHITE PUBLIC WATER AUTHORITY", 
"LAWRENCE CO REG WATER DIST", "SARDIS WATER ASSOCIATION", "MID-ARKANSAS UTILITIES P.W.A", 
"LAFE REGIONAL WATER DISTRIBUTION DISTRIC", "OZARK MTN REGIONAL PWA", 
"WASHINGTON WATER AUTHORITY WWA", "WESTERN GROVE MUNICIPAL WATER", 
"MOUNT OLIVE WATER ASSOCIATION", "BUFFALO ISLAND REG WATER DIST", 
"PERRYVILLE WATERWORKS", "VAN BUREN COUNTY W U A", "TULL WATER", 
"LURTON-PELSOR WATER ASSOC", "CABOT WATERWORKS", "CARROLL-BOONE WATER DISTRICT", 
"JONESBORO WATER SYSTEM", "SE WHITE COUNTY WATER ASSOC", "CASA WATER DEPT", 
"CROSS COUNTY RURAL WATER SYS", "CENTER GROVE WATER ASSOCIATION", 
"DES ARC WATERWORKS", "VANNDALE-BIRDEYE WATER ASSOC", "VALLEY SPRINGS WATERWORKS", 
"SW BOONE COUNTY WATER ASSOC", "SPRINGDALE WATER UTILITIES", 
"CHEROKEE VILLAGE WATER ASSOC", "BODCAW RURAL WATER SYSTEM", 
"ASH FLAT WATER COMPANY", "BRECKENRIDGE UNION WATER ASSOC", "ENTRANOSA WATER AND WASTEWATER COOP", 
"QUITMAN WATERWORKS", "KROOKED KREEK WATER ASSN", "HIGHLAND WATER ASSOCIATION", 
"GRANGE-CALAMINE WATER ASSOC", "CAVE CITY WATERWORKS", "WYE MOUNTAIN WATER ASSOCIATION"
), `State Code` = c("AR", "AR", "TX", "AR", "AR", "AR", "AR", 
"AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", 
"AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", 
"AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", 
"AR", "AR", "AR", "AR", "NM", "AR", "AR", "AR", "AR", "AR", "AR"
), `Counties Served` = c("Jackson, White", "Marion, Searcy", 
"Coke, Concho, Ector, Howard, Martin, Scurry, Ward", "Hempstead, Nevada", 
"Independence, Jackson, White", "Cleburne, Faulkner, Van Buren", 
"Cleburne, Searcy, Stone, Van Buren", "Benton, Carroll, Madison", 
"Polk, Sevier", "Pulaski, Saline", "Grant, Hot Spring, Saline", 
"Benton, Washington", "Clay, Greene, Randolph", "Benton, Madison, Washington", 
"Cleburne, Lonoke, White", "Craighead, Lawrence, Randolph", "Grant, Saline", 
"Faulkner, Pulaski", "Clay, Greene", "Boone, Newton, Searcy", 
"Benton, Washington", "Boone, Newton, Searcy", "Madison, Washington", 
"Craighead, Mississippi, Poinsett", "Perry, Pulaski", "Pope, Searcy, Van Buren", 
"Grant, Saline", "Newton, Pope", "Lonoke, Pulaski", "Boone, Carroll", 
"Craighead, Greene, Poinsett", "Prairie, White", "Conway, Perry, Yell", 
"Craighead, Cross, Jackson, Poinsett, St. Francis, Woodruff", 
"Grant, Jefferson", "Lonoke, Prairie", "Cross, Poinsett", "Boone, Marion", 
"Boone, Newton", "Benton, Washington", "Fulton, Sharp", "Hempstead, Nevada", 
"Fulton, Sharp", "Jackson, Woodruff", "Bernalillo, Santa Fe", 
"Cleburne, Faulkner, Van Buren", "Boone, Newton", "Fulton, Sharp", 
"Independence, Sharp", "Independence, Sharp", "Perry, Pulaski"
)), row.names = c(NA, -51L), class = c("tbl_df", "tbl", "data.frame"
))

You can simply use tidyr::separate_rows(). For example, loading your data as xx:

xx <- structure(...)

library(tidyverse)

xx |>
  separate_rows(`Counties Served`, sep = ", ")
#> # A tibble: 128 × 3
#>    `PWS Name`                              `State Code` `Counties Served`
#>    <chr>                                   <chr>        <chr>            
#>  1 BALD KNOB NORTH WATER ASSOC             AR           Jackson          
#>  2 BALD KNOB NORTH WATER ASSOC             AR           White            
#>  3 S P G WATER ASSOCIATION                 AR           Marion           
#>  4 S P G WATER ASSOCIATION                 AR           Searcy           
#>  5 COLORADO RIVER MUNICIPAL WATER DISTRICT TX           Coke             
#>  6 COLORADO RIVER MUNICIPAL WATER DISTRICT TX           Concho           
#>  7 COLORADO RIVER MUNICIPAL WATER DISTRICT TX           Ector            
#>  8 COLORADO RIVER MUNICIPAL WATER DISTRICT TX           Howard           
#>  9 COLORADO RIVER MUNICIPAL WATER DISTRICT TX           Martin           
#> 10 COLORADO RIVER MUNICIPAL WATER DISTRICT TX           Scurry           
#> # ℹ 118 more rows

Created on 2023-04-19 with reprex v2.0.2

It appears separate_rows() is being superseeded by separate_longer_delim(), the example becomes:

xx |>
  separate_longer_delim(cols = `Counties Served`, delim = ", ")

That's excellent that there's a function for it. Thank you so much.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.