Using pivot_longer (wide to long) with more than one and irregular separators

LS_Dufner · May 19, 2022, 2:31pm

Dear community,

I want to switch a dataframe from wide to long format.
The variables look like this (more than 500 vars), e.g.:
pid, d1reputin, d1dispu, d2reputin, d2dispu, d3reputin, d3dispu, w1wiss, m1gradv, w2wiss, m2gradv, w3wiss, w3gradv usw.

So the variables are separated/identified by

a letter in the first place, followed by
a number (representing the survey period), finishing with
an individual varname.

Having just started working with R, I did not come too far with this task.
I have tried options with pivot_longer using names_pattern and names_sep...
The examples for both options usually deal with an underscore such as
names_sep = "_"
but don´t give solutions for tidying data with two separators or irregluar ones such as letters and numbers.

I managed to start in a simple way for a single set of similar vars/small subset but did not really come further than this:

longdata <- widedata %>%
  pivot_longer(names_to = "survey",
               values_to= "wiss",
               cols = 2:5,
              )

This gives me dataframe in long format for the variables w1wiss to w5wiss with the vars "pid" "survey" (w1, w2, w3, w4, w5) and "wiss" (1,2,3,4,5) but still is far from what I am looking for...

Are there any experienced users of pivot_longer around here who have some hints for me how to solve this?

Thank you!

FJCC · May 19, 2022, 2:54pm

It would help a lot if you posted code to make examples of the data before and after the reshaping. Use the function data.frame() or tibble() and make small examples, just a few columns and rows. Post those function calls and then we can all work with the same data and see exactly what result you want.

LS_Dufner · May 20, 2022, 8:17am

Okay, thanks for the suggestion. I tried to create a small dataframe to work with. This is the version in wide format that I need to transform into long format:

widedata <- structure(list(pid = c(1, 2, 3, 4, 5, 6, 7), d1reputin = c(3, 2, NA, 1, 1, NA, 4), d1dispu = c(10, 5, 2, NA, 3, 5, 3), d2reputin = c(4, 3, 2, 4, 1, 1, 3), d2dispu = c(10, 5, 2, NA, 3, 5, 3), d3reputin = c(3, 2, NA, 5, 2, 3, 4), d3dispu = c(10, 5, 2, NA, 3, 5, 6), w1wiss = c(5, 5, 4, 5, 3, 1, 5), m1gradv = c(0, NA, 1, 1, 0, 1, 0), w2wiss = c(5, 4, 4, 5, 3, 3, 5), m2gradv = c(0, 1, 1, 1, 0, NA, NA), w3wiss = c(5, 5, 4, 5, 5, 3, 4), w3gradv = c(0, NA, 1, 1, 0, NA, 1)), row.names = c(NA, 7L), class = "data.frame")

FJCC · May 20, 2022, 2:01pm

I am not sure of the output you want since you did not post an example of the final data frame. I have guessed that you want three columns in addition to the pid column. I give two examples of regular expressions to divide the column names into variable values. The first one assumes that the "survey", the letter before the number, is always one character and that the number is also always one character. The second one allows for the survey to be multiple letters and for the number to have multiple digits.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2
widedata <- structure(list(pid = c(1, 2, 3, 4, 5, 6, 7), 
                           d1reputin = c(3, 2, NA, 1, 1, NA, 4), 
                           d1dispu = c(10, 5, 2, NA, 3, 5, 3), 
                           d2reputin = c(4, 3, 2, 4, 1, 1, 3), 
                           d2dispu = c(10, 5, 2, NA, 3, 5, 3), 
                           d3reputin = c(3, 2, NA, 5, 2, 3, 4), 
                           d3dispu = c(10, 5, 2, NA, 3, 5, 6), 
                           w1wiss = c(5, 5, 4, 5, 3, 1, 5), 
                           m1gradv = c(0, NA, 1, 1, 0, 1, 0), 
                           w2wiss = c(5, 4, 4, 5, 3, 3, 5), 
                           m2gradv = c(0, 1, 1, 1, 0, NA, NA), 
                           w3wiss = c(5, 5, 4, 5, 5, 3, 4), 
                           w3gradv = c(0, NA, 1, 1, 0, NA, 1)), 
                      row.names = c(NA, 7L), class = "data.frame")
pivot_longer(widedata,cols = -pid,
             names_to = c("survey","period","var"),
             names_pattern = "(.)(.)(.+)") #(one character)(one character)(one or more characters)
#> # A tibble: 84 x 5
#>      pid survey period var     value
#>    <dbl> <chr>  <chr>  <chr>   <dbl>
#>  1     1 d      1      reputin     3
#>  2     1 d      1      dispu      10
#>  3     1 d      2      reputin     4
#>  4     1 d      2      dispu      10
#>  5     1 d      3      reputin     3
#>  6     1 d      3      dispu      10
#>  7     1 w      1      wiss        5
#>  8     1 m      1      gradv       0
#>  9     1 w      2      wiss        5
#> 10     1 m      2      gradv       0
#> # ... with 74 more rows
pivot_longer(widedata,cols = -pid,
             names_to = c("survey","period","var"),
             names_pattern = "([[:alpha:]]+)(\\d+)(.+)") #(one or more letters)(one or more numbers)(one or more characters)
#> # A tibble: 84 x 5
#>      pid survey period var     value
#>    <dbl> <chr>  <chr>  <chr>   <dbl>
#>  1     1 d      1      reputin     3
#>  2     1 d      1      dispu      10
#>  3     1 d      2      reputin     4
#>  4     1 d      2      dispu      10
#>  5     1 d      3      reputin     3
#>  6     1 d      3      dispu      10
#>  7     1 w      1      wiss        5
#>  8     1 m      1      gradv       0
#>  9     1 w      2      wiss        5
#> 10     1 m      2      gradv       0
#> # ... with 74 more rows

^{Created on 2022-05-20 by the reprex package (v2.0.1)}

LS_Dufner · May 23, 2022, 9:14am

Dear FJCC,
That already worked really well! Thank you! I used the second code and it does exactly what I am looking for with the dataframe! Where can I find explanations about this code in names_pattern to better understand how it works?

Now for my complete dataset with ALL variables I am still facing some difficulties. I get an error message about the combination of double and character variables. Is there a possibilty to include different types of variables with names_pattern or will I have to change the complete dataset to one vartype?

pivot_longer(cmpletedata,cols = -pid,
             names_to = c("theme","period","var"),
             names_pattern = "([[:alpha:]]+)(\\d+)(.+)")

Error in pivot_longer_spec():
! Can't combine x1tag and c4taet3plz .
Backtrace:

tidyr::pivot_longer(...)
tidyr:::pivot_longer.data.frame(...)
tidyr::pivot_longer_spec(...)
Error in pivot_longer_spec(data, spec, names_repair = names_repair, values_drop_na = values_drop_na, :

system · June 13, 2022, 9:15am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.