How to organize my database for analysis
I have a database that I want to study the effect of a treatment, but I have two weights on two lines measuring for a single patient, pre and post the visit.
I want a code to separate the weights into columns, in preweight and post weight
This seems like a job for the pivot_wider() function from the tidyr package. Can you post some data so a more detailed answer can be provided? If your data frame is named DF, post the output of
dput(head(DF, 10))
Place a line with three back ticks just before and after the pasted output in your response, like this:
```
output of dput()
```
1 Like
> dput(head(dados,10))
structure(list(...1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ID = c(1,
1, 2, 2, 3, 3, 4, 4, 5, 5), TREAT = c("CONTROL", "CONTROL", "CONTROL",
"CONTROL", "CONTROL", "CONTROL", "CONTROL", "CONTROL", "CONTROL",
"CONTROL"), AGE = c(55, 55, 32, 32, 24, 24, 32, 32, 42, 42),
WEIGHT = c(84.5761825495586, 79.3926584958957, 75.266228456406,
75.1019421280918, 77.4664984122051, 75.2932810841592, 79.6836860743261,
75.7596164670626, 76.2523902602365, 76.1172594017774), LIVINGSTATUS = c("ALONE (SINGLE)",
"ALONE (SINGLE)", "ALONE (SINGLE)", "ALONE (SINGLE)", "WITH FRIENDS",
"WITH FRIENDS", "WITH PARTNER", "WITH PARTNER", "ALONE (SINGLE)",
"ALONE (SINGLE)"), SMOKESTATUS = c("SMOKER", "SMOKER", "QUIT-SMOKER",
"QUIT-SMOKER", "NON-SMOKER", "NON-SMOKER", "SMOKER", "SMOKER",
"SMOKER", "SMOKER"), GENDER = c("FEMALE", "FEMALE", "FEMALE",
"FEMALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE", "MALE"
), VISIT = c("PRE", "POST", "PRE", "POST", "PRE", "POST",
"PRE", "POST", "PRE", "POST")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Your first column named ...1 doesn't seem to be helpful, so I dropped it, then I used pivot_wider() to rearrange the data.
dados <- structure(list(...1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
TREAT = c("CONTROL", "CONTROL", "CONTROL",
"CONTROL", "CONTROL", "CONTROL",
"CONTROL", "CONTROL", "CONTROL",
"CONTROL"),
AGE = c(55, 55, 32, 32, 24, 24, 32, 32, 42, 42),
WEIGHT = c(84.5761825495586, 79.3926584958957,
75.266228456406, 75.1019421280918,
77.4664984122051, 75.2932810841592, 79.6836860743261,
75.7596164670626, 76.2523902602365, 76.1172594017774),
LIVINGSTATUS = c("ALONE (SINGLE)", "ALONE (SINGLE)", "ALONE (SINGLE)",
"ALONE (SINGLE)", "WITH FRIENDS",
"WITH FRIENDS", "WITH PARTNER", "WITH PARTNER",
"ALONE (SINGLE)", "ALONE (SINGLE)"),
SMOKESTATUS = c("SMOKER", "SMOKER", "QUIT-SMOKER",
"QUIT-SMOKER", "NON-SMOKER", "NON-SMOKER", "SMOKER", "SMOKER",
"SMOKER", "SMOKER"),
GENDER = c("FEMALE", "FEMALE", "FEMALE",
"FEMALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE", "MALE"
),
VISIT = c("PRE", "POST", "PRE", "POST", "PRE", "POST",
"PRE", "POST", "PRE", "POST")),
row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
#######
library(tidyr)
dados$...1 <- NULL
dados_wide <- dados |>
pivot_wider(names_from = VISIT, values_from = WEIGHT)
dados_wide
#> # A tibble: 5 × 8
#> ID TREAT AGE LIVINGSTATUS SMOKESTATUS GENDER PRE POST
#> <dbl> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl>
#> 1 1 CONTROL 55 ALONE (SINGLE) SMOKER FEMALE 84.6 79.4
#> 2 2 CONTROL 32 ALONE (SINGLE) QUIT-SMOKER FEMALE 75.3 75.1
#> 3 3 CONTROL 24 WITH FRIENDS NON-SMOKER MALE 77.5 75.3
#> 4 4 CONTROL 32 WITH PARTNER SMOKER FEMALE 79.7 75.8
#> 5 5 CONTROL 42 ALONE (SINGLE) SMOKER MALE 76.3 76.1
Created on 2023-09-03 with reprex v2.0.2
1 Like
thank you, I am really gratefull,
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.