Reorganizing dataframe with various number of measurements

Lisa5 · August 17, 2022, 1:29pm

Hello everyone,
I have to export my final dataset to an excel sheet. Therefore I would like to restructure the dataframe.
The dataset includes different groups (here indicated as Patient1 and Patient2). This groups were treated (treatment_1 and treatment_2) and together with the suitable controls (control_1 for treatment_1, control_2 belonging to treatment_2) measured. All parameters were detected 4 times (including outliers, indicated as NA).

The aim is to get a dataframe showing all treatment and control groups (= Sample) in the 1st column, followed by the 4 measurements of each patient (Please see "wished output").
I already tried to use pivot_wider (tidyr) but failed to get correct output.
Would be nice if anyone can suggest a solution for this problem.

Thanks in advance.

Original dataset:

df <- data.frame(Sample = rep(c("treatment_1", "control_1",
"treatment_2", "control_2"), each = 4),
Patient1 = rep(c("1"), each = 16),
Patient2 = c(1.1, 1.8, NA, 1, 1.1, 1.45, 1.32, 1.2, 1.1,
1.4, 1.6, 1.1, NA, 1.22, 1.21, 1.2))

"wished output" should look like:

df_aim <- data.frame(Sample = rep(c("treatment_1", "control_1",
"treatment_2", "control_2")),
Patient1_1 = rep(c("1"), each = 4),
Patient1_2 = rep(c("1"), each = 4),
Patient1_3 = rep(c("1"), each = 4),
Patient1_4 = rep(c("1"), each = 4),
Patient2_1 = c(1.1, 1.1, 1.1, NA),
Patient2_2 = c(1.8, 1.45, 1.4, 1.22),
Patient2_3 = c(NA, 1.32, 1.6, 1.21),
Patient2_4 = c(1, 1.2, 1.1, 1.2))

FJCC · August 17, 2022, 2:13pm

Here is one method using both pivot_longer and pivot_wider.

library(dplyr)
library(tidyr)
df <- data.frame(Sample = rep(c("treatment_1", "control_1",
                                "treatment_2", "control_2"), each = 4),
                 Patient1 = rep(c(1), each = 16),
                 Patient2 = c(1.1, 1.8, NA, 1, 1.1, 1.45, 1.32, 1.2, 1.1,
                              1.4, 1.6, 1.1, NA, 1.22, 1.21, 1.2))
df
#>         Sample Patient1 Patient2
#> 1  treatment_1        1     1.10
#> 2  treatment_1        1     1.80
#> 3  treatment_1        1       NA
#> 4  treatment_1        1     1.00
#> 5    control_1        1     1.10
#> 6    control_1        1     1.45
#> 7    control_1        1     1.32
#> 8    control_1        1     1.20
#> 9  treatment_2        1     1.10
#> 10 treatment_2        1     1.40
#> 11 treatment_2        1     1.60
#> 12 treatment_2        1     1.10
#> 13   control_2        1       NA
#> 14   control_2        1     1.22
#> 15   control_2        1     1.21
#> 16   control_2        1     1.20

df <- df |> group_by(Sample) |> mutate(Index=row_number())
df
#> # A tibble: 16 x 4
#> # Groups:   Sample [4]
#>    Sample      Patient1 Patient2 Index
#>    <chr>          <dbl>    <dbl> <int>
#>  1 treatment_1        1     1.1      1
#>  2 treatment_1        1     1.8      2
#>  3 treatment_1        1    NA        3
#>  4 treatment_1        1     1        4
#>  5 control_1          1     1.1      1
#>  6 control_1          1     1.45     2
#>  7 control_1          1     1.32     3
#>  8 control_1          1     1.2      4
#>  9 treatment_2        1     1.1      1
#> 10 treatment_2        1     1.4      2
#> 11 treatment_2        1     1.6      3
#> 12 treatment_2        1     1.1      4
#> 13 control_2          1    NA        1
#> 14 control_2          1     1.22     2
#> 15 control_2          1     1.21     3
#> 16 control_2          1     1.2      4
df_long <- pivot_longer(df,cols = c("Patient1","Patient2"))
df_wide <- pivot_wider(df_long,names_from = c("name","Index"),values_from = "value")
df_wide
#> # A tibble: 4 x 9
#> # Groups:   Sample [4]
#>   Sample      Patient1_1 Patient2_1 Patient1_2 Patient2_2 Patient1_3 Patient2_3
#>   <chr>            <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#> 1 treatment_1          1        1.1          1       1.8           1      NA   
#> 2 control_1            1        1.1          1       1.45          1       1.32
#> 3 treatment_2          1        1.1          1       1.4           1       1.6 
#> 4 control_2            1       NA            1       1.22          1       1.21
#> # ... with 2 more variables: Patient1_4 <dbl>, Patient2_4 <dbl>

^{Created on 2022-08-17 by the reprex package (v2.0.1)}

system · August 24, 2022, 2:13pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.