I have wide data with multiple sets of columns that I want to convert to long format. It seems like pivot_longer
should be able to handle this, but either I haven't found the right example or I haven't understood the examples I've found. (The help for pivot_longer
says it's intended to be simpler to use than gather
, but, except where pivot_longer
does the same thing as gather
(which I find easy to use), I find it difficult to fathom.)
Here's my use case: I have a data frame d
that I receive in the following form (reproducible data is at the end of this question):
d
ID EXT ORG 1 INSTITUTION 1 TCE STATUS 1 EXT ORG 2 INSTITUTION 2 TCE STATUS 2 EXT ORG 3
1 001 C30000191 American River College <NA> C30000192 Sacramento City College <NA> <NA>
2 004 C30000192 Sacramento City College <NA> C30006210 Folsom Lake Col <NA> <NA>
3 003 C30000156 Diablo Valley Col Posted C30000192 Sacramento City College <NA> <NA>
4 002 C30000171 Grossmont College <NA> C30000191 American River College Posted C30000192
INSTITUTION 3 TCE STATUS 3 EXT ORG 10 INSTITUTION 10 TCE STATUS 10
1 <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA>
4 Sacramento City College Posted C30006268 Woodland Comm Col Posted
Each row is a unique ID
, followed by four sets of three columns whose prefixes are EXT ORG
, INSTITUTION
, and TCE STATUS
. Each set of three columns is uniquely identified by the final number, which is either 1 or 2 digits. (My real data has 10 sets of three columns, but this simplified version covers my real use case.)
I want to covert this to long format, which I'm currently doing as follows:
library(tidyverse)
c(1:3,10) %>%
map_df(~d %>%
select(ID, matches(paste0(.x,"$"))) %>%
filter_at(vars(matches("EXT ORG")), all_vars(!is.na(.))) %>%
rename_at(vars(matches("[0-9]{1,2}")), ~ gsub(" [0-9]{1,2}", "", .)) %>%
mutate(institution.number=.x)) %>%
arrange(ID, institution.number)
ID `EXT ORG` INSTITUTION `TCE STATUS` institution.number
<chr> <chr> <chr> <chr> <dbl>
1 001 C30000191 American River College <NA> 1
2 001 C30000192 Sacramento City College <NA> 2
3 002 C30000171 Grossmont College <NA> 1
4 002 C30000191 American River College Posted 2
5 002 C30000192 Sacramento City College Posted 3
6 002 C30006268 Woodland Comm Col Posted 10
7 003 C30000156 Diablo Valley Col Posted 1
8 003 C30000192 Sacramento City College <NA> 2
9 004 C30000192 Sacramento City College <NA> 1
10 004 C30006210 Folsom Lake Col <NA> 2
I'd like to switch to pivot_longer
, which I hope and expect will require less code. I've tried various approaches with pivot_longer
, for example,
d %>%
pivot_longer(cols=`EXT ORG 1`:`TCE STATUS 10`,
names_to=c("EXT ORG", "INSTITUTION", "TCE STATUS", ".value"),
names_pattern="(.* ?.*) ([0-9]{1,2}$)")
# Error: `regex` should define 4 groups; 2 found.
none of which worked.
So, my question is, how can I do this with pivot_longer
?
Reproducible data:
d = structure(list(ID = c("001", "004", "003", "002"), `EXT ORG 1` = c("C30000191",
"C30000192", "C30000156", "C30000171"), `INSTITUTION 1` = c("American River College",
"Sacramento City College", "Diablo Valley Col", "Grossmont College"
), `TCE STATUS 1` = c(NA, NA, "Posted", NA), `EXT ORG 2` = c("C30000192",
"C30006210", "C30000192", "C30000191"), `INSTITUTION 2` = c("Sacramento City College",
"Folsom Lake Col", "Sacramento City College", "American River College"
), `TCE STATUS 2` = c(NA, NA, NA, "Posted"), `EXT ORG 3` = c(NA,
NA, NA, "C30000192"), `INSTITUTION 3` = c(NA, NA, NA, "Sacramento City College"
), `TCE STATUS 3` = c(NA, NA, NA, "Posted"), `EXT ORG 10` = c(NA,
NA, NA, "C30006268"), `INSTITUTION 10` = c(NA, NA, NA, "Woodland Comm Col"
), `TCE STATUS 10` = c(NA, NA, NA, "Posted")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))