Lisa5
August 17, 2022, 1:29pm
1
Hello everyone,
I have to export my final dataset to an excel sheet. Therefore I would like to restructure the dataframe.
The dataset includes different groups (here indicated as Patient1 and Patient2). This groups were treated (treatment_1 and treatment_2) and together with the suitable controls (control_1 for treatment_1, control_2 belonging to treatment_2) measured. All parameters were detected 4 times (including outliers, indicated as NA).
The aim is to get a dataframe showing all treatment and control groups (= Sample) in the 1st column, followed by the 4 measurements of each patient (Please see "wished output").
I already tried to use pivot_wider (tidyr) but failed to get correct output.
Would be nice if anyone can suggest a solution for this problem.
Thanks in advance.
Original dataset:
df <- data.frame(Sample = rep(c("treatment_1", "control_1",
"treatment_2", "control_2"), each = 4),
Patient1 = rep(c("1"), each = 16),
Patient2 = c(1.1, 1.8, NA, 1, 1.1, 1.45, 1.32, 1.2, 1.1,
1.4, 1.6, 1.1, NA, 1.22, 1.21, 1.2))
"wished output" should look like:
df_aim <- data.frame(Sample = rep(c("treatment_1", "control_1",
"treatment_2", "control_2")),
Patient1_1 = rep(c("1"), each = 4),
Patient1_2 = rep(c("1"), each = 4),
Patient1_3 = rep(c("1"), each = 4),
Patient1_4 = rep(c("1"), each = 4),
Patient2_1 = c(1.1, 1.1, 1.1, NA),
Patient2_2 = c(1.8, 1.45, 1.4, 1.22),
Patient2_3 = c(NA, 1.32, 1.6, 1.21),
Patient2_4 = c(1, 1.2, 1.1, 1.2))
FJCC
August 17, 2022, 2:13pm
2
Here is one method using both pivot_longer and pivot_wider.
library(dplyr)
library(tidyr)
df <- data.frame(Sample = rep(c("treatment_1", "control_1",
"treatment_2", "control_2"), each = 4),
Patient1 = rep(c(1), each = 16),
Patient2 = c(1.1, 1.8, NA, 1, 1.1, 1.45, 1.32, 1.2, 1.1,
1.4, 1.6, 1.1, NA, 1.22, 1.21, 1.2))
df
#> Sample Patient1 Patient2
#> 1 treatment_1 1 1.10
#> 2 treatment_1 1 1.80
#> 3 treatment_1 1 NA
#> 4 treatment_1 1 1.00
#> 5 control_1 1 1.10
#> 6 control_1 1 1.45
#> 7 control_1 1 1.32
#> 8 control_1 1 1.20
#> 9 treatment_2 1 1.10
#> 10 treatment_2 1 1.40
#> 11 treatment_2 1 1.60
#> 12 treatment_2 1 1.10
#> 13 control_2 1 NA
#> 14 control_2 1 1.22
#> 15 control_2 1 1.21
#> 16 control_2 1 1.20
df <- df |> group_by(Sample) |> mutate(Index=row_number())
df
#> # A tibble: 16 x 4
#> # Groups: Sample [4]
#> Sample Patient1 Patient2 Index
#> <chr> <dbl> <dbl> <int>
#> 1 treatment_1 1 1.1 1
#> 2 treatment_1 1 1.8 2
#> 3 treatment_1 1 NA 3
#> 4 treatment_1 1 1 4
#> 5 control_1 1 1.1 1
#> 6 control_1 1 1.45 2
#> 7 control_1 1 1.32 3
#> 8 control_1 1 1.2 4
#> 9 treatment_2 1 1.1 1
#> 10 treatment_2 1 1.4 2
#> 11 treatment_2 1 1.6 3
#> 12 treatment_2 1 1.1 4
#> 13 control_2 1 NA 1
#> 14 control_2 1 1.22 2
#> 15 control_2 1 1.21 3
#> 16 control_2 1 1.2 4
df_long <- pivot_longer(df,cols = c("Patient1","Patient2"))
df_wide <- pivot_wider(df_long,names_from = c("name","Index"),values_from = "value")
df_wide
#> # A tibble: 4 x 9
#> # Groups: Sample [4]
#> Sample Patient1_1 Patient2_1 Patient1_2 Patient2_2 Patient1_3 Patient2_3
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 treatment_1 1 1.1 1 1.8 1 NA
#> 2 control_1 1 1.1 1 1.45 1 1.32
#> 3 treatment_2 1 1.1 1 1.4 1 1.6
#> 4 control_2 1 NA 1 1.22 1 1.21
#> # ... with 2 more variables: Patient1_4 <dbl>, Patient2_4 <dbl>
Created on 2022-08-17 by the reprex package (v2.0.1)
system
Closed
August 24, 2022, 2:13pm
3
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.