I am working with a dataset that I need to convert from wide form to tall form.
Each row is a single subject. For each subject, the same variable was measured once daily for several months. These data are saved in columns entitled "daily_measure_X", where each part of the name means the following:
- "daily_" indicates that this is one of the variables measured daily, and is important, as some of them were also measured at other times, e.g. baseline (and would be titled "baseline_measure").
- "measure" refers to the variable name - there's multiple variables, so the same pattern applies for additional variables - thus "measure" is only one of the variables here.
- "X" is the day.
The included reprex produces the desired output - however, it is somewhat clumsy and inefficient, as it first separates "daily_measure_X" into three variables, splitting at each underscore, followed by re-uniting the first two new variables.
library(tidyr)
library(dplyr) # For select helper functions
# Input data
example_data <- data.frame(subject_ID = 1:5,
daily_measure_1 = 6:10,
daily_measure_2 = 11:15,
daily_measure_3 = 16:20,
daily_measure_4 = 21:25,
daily_measure_5 = 26:30)
example_data
#> subject_ID daily_measure_1 daily_measure_2 daily_measure_3
#> 1 1 6 11 16
#> 2 2 7 12 17
#> 3 3 8 13 18
#> 4 4 9 14 19
#> 5 5 10 15 20
#> daily_measure_4 daily_measure_5
#> 1 21 26
#> 2 22 27
#> 3 23 28
#> 4 24 29
#> 5 25 30
example_data %>%
gather(key = "full_name", value = "value", starts_with("daily_")) %>%
separate(full_name, into = c("V1", "V2", "day")) %>%
unite("variable", V1, V2)
#> subject_ID variable day value
#> 1 1 daily_measure 1 6
#> 2 2 daily_measure 1 7
#> 3 3 daily_measure 1 8
#> 4 4 daily_measure 1 9
#> 5 5 daily_measure 1 10
#> 6 1 daily_measure 2 11
#> 7 2 daily_measure 2 12
#> 8 3 daily_measure 2 13
#> 9 4 daily_measure 2 14
#> 10 5 daily_measure 2 15
#> 11 1 daily_measure 3 16
#> 12 2 daily_measure 3 17
#> 13 3 daily_measure 3 18
#> 14 4 daily_measure 3 19
#> 15 5 daily_measure 3 20
#> 16 1 daily_measure 4 21
#> 17 2 daily_measure 4 22
#> 18 3 daily_measure 4 23
#> 19 4 daily_measure 4 24
#> 20 5 daily_measure 4 25
#> 21 1 daily_measure 5 26
#> 22 2 daily_measure 5 27
#> 23 3 daily_measure 5 28
#> 24 4 daily_measure 5 29
#> 25 5 daily_measure 5 30
Another approach would be to use stringr and replace the first "" with something else. Both approaches gets the job done, but I'm interested to hear if anybody could come up with a more elegant solution that splits at the second or ideally last occurence of the "" character?
Maybe it could be done using a regex (to detect the first occurence starting from the end of the string) , but I have no prior experience with this and have not been able to produce a solution using this approach.
Thanks in advance.