I have wide data with multiple sets of columns that I want to convert to long format. It seems like pivot_longer should be able to handle this, but either I haven't found the right example or I haven't understood the examples I've found. (The help for pivot_longer says it's intended to be simpler to use than gather, but, except where pivot_longer does the same thing as gather (which I find easy to use), I find it difficult to fathom.)
Here's my use case: I have a data frame d that I receive in the following form (reproducible data is at the end of this question):
d
ID EXT ORG 1 INSTITUTION 1 TCE STATUS 1 EXT ORG 2 INSTITUTION 2 TCE STATUS 2 EXT ORG 3
1 001 C30000191 American River College <NA> C30000192 Sacramento City College <NA> <NA>
2 004 C30000192 Sacramento City College <NA> C30006210 Folsom Lake Col <NA> <NA>
3 003 C30000156 Diablo Valley Col Posted C30000192 Sacramento City College <NA> <NA>
4 002 C30000171 Grossmont College <NA> C30000191 American River College Posted C30000192
INSTITUTION 3 TCE STATUS 3 EXT ORG 10 INSTITUTION 10 TCE STATUS 10
1 <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA>
4 Sacramento City College Posted C30006268 Woodland Comm Col Posted
Each row is a unique ID, followed by four sets of three columns whose prefixes are EXT ORG, INSTITUTION, and TCE STATUS. Each set of three columns is uniquely identified by the final number, which is either 1 or 2 digits. (My real data has 10 sets of three columns, but this simplified version covers my real use case.)
I want to covert this to long format, which I'm currently doing as follows:
library(tidyverse)
c(1:3,10) %>%
map_df(~d %>%
select(ID, matches(paste0(.x,"$"))) %>%
filter_at(vars(matches("EXT ORG")), all_vars(!is.na(.))) %>%
rename_at(vars(matches("[0-9]{1,2}")), ~ gsub(" [0-9]{1,2}", "", .)) %>%
mutate(institution.number=.x)) %>%
arrange(ID, institution.number)
ID `EXT ORG` INSTITUTION `TCE STATUS` institution.number
<chr> <chr> <chr> <chr> <dbl>
1 001 C30000191 American River College <NA> 1
2 001 C30000192 Sacramento City College <NA> 2
3 002 C30000171 Grossmont College <NA> 1
4 002 C30000191 American River College Posted 2
5 002 C30000192 Sacramento City College Posted 3
6 002 C30006268 Woodland Comm Col Posted 10
7 003 C30000156 Diablo Valley Col Posted 1
8 003 C30000192 Sacramento City College <NA> 2
9 004 C30000192 Sacramento City College <NA> 1
10 004 C30006210 Folsom Lake Col <NA> 2
I'd like to switch to pivot_longer, which I hope and expect will require less code. I've tried various approaches with pivot_longer, for example,
d %>%
pivot_longer(cols=`EXT ORG 1`:`TCE STATUS 10`,
names_to=c("EXT ORG", "INSTITUTION", "TCE STATUS", ".value"),
names_pattern="(.* ?.*) ([0-9]{1,2}$)")
# Error: `regex` should define 4 groups; 2 found.
none of which worked.
So, my question is, how can I do this with pivot_longer?
Reproducible data:
d = structure(list(ID = c("001", "004", "003", "002"), `EXT ORG 1` = c("C30000191",
"C30000192", "C30000156", "C30000171"), `INSTITUTION 1` = c("American River College",
"Sacramento City College", "Diablo Valley Col", "Grossmont College"
), `TCE STATUS 1` = c(NA, NA, "Posted", NA), `EXT ORG 2` = c("C30000192",
"C30006210", "C30000192", "C30000191"), `INSTITUTION 2` = c("Sacramento City College",
"Folsom Lake Col", "Sacramento City College", "American River College"
), `TCE STATUS 2` = c(NA, NA, NA, "Posted"), `EXT ORG 3` = c(NA,
NA, NA, "C30000192"), `INSTITUTION 3` = c(NA, NA, NA, "Sacramento City College"
), `TCE STATUS 3` = c(NA, NA, NA, "Posted"), `EXT ORG 10` = c(NA,
NA, NA, "C30006268"), `INSTITUTION 10` = c(NA, NA, NA, "Woodland Comm Col"
), `TCE STATUS 10` = c(NA, NA, NA, "Posted")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))