A complete newbie in need of help with revising a dataset for analysis

TMP-Stewart · May 18, 2024, 1:43am

The project is through All of Us, which generates a table with the following variables:
person_id
survey question
concept_id
question
answer
concept_id answer
Each participant (person_id) has four rows, one for each of the four survey questions asked. I need to turn the survey questions into variables, so that each participant has a single row.
The variables I need are person_id, Celiac_Disease, Store_Access, Food_Security, and Literacy.
Is there code to reshape the table this way?

prubin · May 18, 2024, 6:12pm

Take a look at the pivot_wider function in the tidyr package, documented here.

mduvekot · May 20, 2024, 12:15am

Yes, as mentioned, pivot_wider(). Here's an example:

library(janitor)
library(magrittr)
library(tidyr)

df <- data.frame(
  person_id = c(1, 1, 1, 1, 
                2, 2, 2, 2, 
                3, 3, 3, 3),
  concept_id = c(12345, 23456, 34567, 45678, 
                 12345, 23456, 34567, 45678, 
                 12345, 23456, 34567, 45678),
  question = c("Pizza?", "Coffee?", "Pets?", "Color?",
               "Pizza?", "Coffee?", "Pets?", "Color?",
               "Pizza?", "Coffee?", "Pets?", "Color?"),
  answer = c("Margherita", "Latte", "Cats", "#2caf00", 
             "Diavolo", "Espresso", "Cats", "#3d2a40", 
             "Hawai", "Americano", "Dogs", "#df2c00")
  )

print(df)

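# Reshape from long (one row per person per question) to wide
# (one row per person, one column per question)
df_wide <- df %>% 
  pivot_wider(id_cols = c(person_id),   # column that identifies each output row
              names_from = question,    # values here become the new column names
              values_from = answer) %>% # values here fill the new columns
  clean_names(case = "snake")           # e.g. "Pizza?" becomes "pizza"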

print(df_wide)
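
To get the exact column names from the original question (Celiac_Disease, Store_Access, Food_Security, Literacy), one option is to map each question's concept ID to the name you want before pivoting. This is only a minimal sketch: the concept IDs (111 to 444), the answers, and the names `survey`, `var_names`, and `variable` are made up for illustration, so substitute the real values from your table.

```r
library(tidyr)

# Hypothetical long-format survey data: one row per person per question.
# The concept IDs and answers are placeholders, not real All of Us codes.
survey <- data.frame(
  person_id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  concept_id = c(111, 222, 333, 444, 111, 222, 333, 444),
  answer     = c("Yes", "Easy", "Often", "High",
                 "No", "Hard", "Never", "Low")
)

# Lookup from (hypothetical) concept ID to the desired column name
var_names <- c(
  "111" = "Celiac_Disease",
  "222" = "Store_Access",
  "333" = "Food_Security",
  "444" = "Literacy"
)

# Index the lookup by character key so IDs are matched by name, not position
survey$variable <- unname(var_names[as.character(survey$concept_id)])

# One row per person, one column per named variable
survey_wide <- pivot_wider(
  survey,
  id_cols = person_id,
  names_from = variable,
  values_from = answer
)

print(survey_wide)
```

Because the new column names come from the lookup rather than from the question text, clean_names() is not needed here.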
