Creating new rows separated for each year

viktoria.schaeff · February 23, 2021, 7:41am

Hello,

I have a little trouble with visualising my data. I have receptive measurements and want to display them in 1 graph. I figured for this I need the columns Year 1, Year 2 and Year2.

a cut down version of my current dataset looks like this:

PatientID | Score Year1| Score Year 2 | Score Year 3
1 | 99 | 55 | 11
2 | 52 | 22 | 1
3 | 23 | 75 | 89
4 | 22 | 84 | 51
...

and I want to convert it to:

Patient ID | Year | Score
1 | 1 | 99
1 | 2 | 55
1 | 3 | 11
2 | 1 | 52
2 | 2 | 22
2 | 3 | 1
3 | 1 | 23
3 | 2 | 75
3 | 3 | 89
4 | 1 | 22
4 | 2 | 81
4 | 3 | 54

I feel like there is a knot in my head on how to do this. Maybe someone can help with a little advise how to do this best with rstudio

Thank you for any kind of help here

gongcastro · February 23, 2021, 8:04am

Hi @viktoria.schaeff! I think the function pivot_longer from the package tidyr is what you're looking for. Something like this:

# d is your dataframe
d <- pivot_longer(d, cols = matches("Year"),  names_to = "Year", values_to = "Score")

Matthias · February 23, 2021, 8:05am

Your data:

structure(list(PatientID = c(1, 2, 3, 4), `Score Year1` = c(99, 
52, 23, 22), `Score Year 2` = c(55, 22, 75, 84), `Score Year 3` = c(11, 
1, 89, 51)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), spec = structure(list(cols = list(
    PatientID = structure(list(), class = c("collector_double", 
    "collector")), `Score Year1` = structure(list(), class = c("collector_double", 
    "collector")), `Score Year 2` = structure(list(), class = c("collector_double", 
    "collector")), `Score Year 3` = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

You need to convert it into the so called "long " format. For this you can use the pivot_longer() function from the tidyr package:

restructured = pivot_longer(data, 
                    cols = -PatientID,                 # column PatientID should be kept
                    values_to = "Score",               # the result should go into column "Score"
                    names_to = "Year",                 # years shoud go into column "Year"
                    names_prefix = "Score Year") %>%   # the year columns contain now some text
  mutate(Year = as.numeric(Year))

restructured

A tibble: 12 x 3

PatientID Year Score

1 1 1 99
2 1 2 55
3 1 3 11
4 2 1 52
5 2 2 22
6 2 3 1
7 3 1 23
8 3 2 75
9 3 3 89
10 4 1 22
11 4 2 84
12 4 3 51

viktoria.schaeff · February 23, 2021, 10:23am

Thank you so much for your reply. I also think that the pivot_longer is the function I am looking for. I encountered an error message saying " Error: Can't combine dtsgadt_clean and day ." because I have much more variables in my dataset which seem to cause some trouble with the picot function.

I was finally able to create a better minimal working data with only the desired variables. I could also upload a complete version including all variables if that would be better.

structure(list(PateintID = c(1000002, 1000007, 1000008, 1000016,
1000033, 1000034), Score_3y = c(12, 2, 12, 10, 11, 17), Score_2y = c(9,
9, 12, 11, 12, 26), Score_1y = c(12, 16, 12, 6, 12, 12), rating=c(44, 33, 22, 11, 99, 75)), row.names = c(NA,
6L), class = "data.frame")

lars · February 23, 2021, 11:26am

Hi Viktoria,

Maybe the following helps you forward. If you still get the "can't combine" error, can you provide us more details in that respect?

library(tidyverse)
df <- structure(list(PatientID = c(1000002, 1000007, 1000008, 1000016, 1000033, 1000034), 
                       Score_3y = c(12, 2, 12, 10, 11, 17), 
                       Score_2y = c(9, 9, 12, 11, 12, 26), 
                       Score_1y = c(12, 16, 12, 6, 12, 12),
                       rating = c(44, 33, 22, 11, 99, 75)
                       ), 
                  row.names = c(NA, 6L), 
                  class = "data.frame"
                  )

g <- df %>% 
  pivot_longer(cols = starts_with("Score_"),
               ## remove the prefix before storing the old column name as value 
               ## in the new column name as specified in `names_to`.
               names_prefix = "Score_",
               names_to = "score",
               values_to = "score_value",
               ) %>%
  ## this is one approach of using distinct colouring
  mutate(PatientID = as.factor(PatientID)) %>% 
  ggplot(aes(score, score_value, color = PatientID, group = PatientID)) +
  geom_point() +
  geom_line()
  
g

## due to overlapping, show plot for each patient separately
g + 
  facet_wrap(~ PatientID) +
  theme(legend.position = "none")

^{Created on 2021-02-23 by the reprex package (v1.0.0)}

viktoria.schaeff · February 23, 2021, 12:36pm

Thank you so much ! and also for the code for the visualisation this code did the trick for me

system · March 2, 2021, 12:36pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.