Analyzing survey data

Hi all!

I have multiple surveys with the same 3 questions, each one has 5 options for a response, from strongly disagree->strongly agree. I have combined multiple surveys from the same month into a single data frame and am trying to create a table that compares the relative frequency of each response. Below is the desired outcome:
image

I am having trouble with figuring out how to get R to count the number of unique event names and then creating the table above. I have been able to do this on a survey by survey basis, but instead of comparing Q1,Q2,Q3 results, I want to compare the results for each event's Q1, Q2, and Q3.

Here is some of my dput:

dput(cleandata[c(1,400,600,1000,2000,3000),])
structure(list(`Event Name` = c("A",
"A",
"A",
"B",
"C", "D"
), `Attendance Duration` = c("65.0 mins", "65.0 mins", "57.0 mins",
"54.0 mins", "58.0 mins", "58.0 mins"), `Survey Score` = c(14,
15, 15, 15, 14, 11), `The content was delivered in an understable manner` = c("Strongly agree",  ## this is question 1
"Strongly agree", "Strongly agree", "Strongly agree", "Strongly agree",
"Neutral"), `Overall, the content was applicable to  activities` = c("Strongly agree",  ## this is question 2
"Strongly agree", "Strongly agree", "Strongly agree", "Strongly agree",
"Agree"), `Aspects of this training can be used in the daily work ` = c("Agree",  ## this is question 3
"Strongly agree", "Strongly agree", "Strongly agree", "Agree",
"Agree")), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))

Hi @gtrotz,
Welcome to the RStudio Community Forum.
If I understand your question correctly, then this code should help:

suppressPackageStartupMessages(library(tidyverse))

cleandata <- structure(list(`Event Name` = c("A","A","A","B","C","D"
), `Attendance Duration` = c("65.0 mins", "65.0 mins", "57.0 mins",
"54.0 mins", "58.0 mins", "58.0 mins"), `Survey Score` = c(14,
15, 15, 15, 14, 11), `The content was delivered in an understable manner` = c("Strongly agree",  ## this is question 1
"Strongly agree", "Strongly agree", "Strongly agree", "Strongly agree",
"Neutral"), `Overall, the content was applicable to  activities` = c("Strongly agree",  ## this is question 2
"Strongly agree", "Strongly agree", "Strongly agree", "Strongly agree",
"Agree"), `Aspects of this training can be used in the daily work ` = c("Agree",  ## this is question 3
"Strongly agree", "Strongly agree", "Strongly agree", "Agree",
"Agree")), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))

cleandata %>% 
  mutate(across(where(is.character), as.factor)) %>% 
  rename(event = 1, duration = 2, score = 3, Q1 = 4, Q2 = 5, Q3 = 6) -> new_data
new_data
#> # A tibble: 6 x 6
#>   event duration  score Q1             Q2             Q3            
#>   <fct> <fct>     <dbl> <fct>          <fct>          <fct>         
#> 1 A     65.0 mins    14 Strongly agree Strongly agree Agree         
#> 2 A     65.0 mins    15 Strongly agree Strongly agree Strongly agree
#> 3 A     57.0 mins    15 Strongly agree Strongly agree Strongly agree
#> 4 B     54.0 mins    15 Strongly agree Strongly agree Strongly agree
#> 5 C     58.0 mins    14 Strongly agree Strongly agree Agree         
#> 6 D     58.0 mins    11 Neutral        Agree          Agree

new_data %>% 
  group_by(event, Q1) %>% 
  summarise(num = n()) %>% 
  pivot_wider(id_cols=event, names_from=Q1, values_from=num) -> q1
#> `summarise()` has grouped output by 'event'. You can override using the `.groups` argument.

new_data %>% 
  group_by(event, Q2) %>% 
  summarise(num = n()) %>% 
  pivot_wider(id_cols=event, names_from=Q2, values_from=num) -> q2
#> `summarise()` has grouped output by 'event'. You can override using the `.groups` argument.

new_data %>% 
  group_by(event, Q3) %>% 
  summarise(num = n()) %>% 
  pivot_wider(id_cols=event, names_from=Q3, values_from=num) -> q3
#> `summarise()` has grouped output by 'event'. You can override using the `.groups` argument.

# All responses and totals
bind_rows(q1,q2,q3, .id="question") %>% 
  replace(is.na(.), 0) %>% 
  mutate(total = rowSums(across(where(is.numeric))))
#> # A tibble: 12 x 6
#> # Groups:   event [4]
#>    question event `Strongly agree` Neutral Agree total
#>    <chr>    <fct>            <int>   <int> <int> <dbl>
#>  1 1        A                    3       0     0     3
#>  2 1        B                    1       0     0     1
#>  3 1        C                    1       0     0     1
#>  4 1        D                    0       1     0     1
#>  5 2        A                    3       0     0     3
#>  6 2        B                    1       0     0     1
#>  7 2        C                    1       0     0     1
#>  8 2        D                    0       0     1     1
#>  9 3        A                    2       0     1     3
#> 10 3        B                    1       0     0     1
#> 11 3        C                    0       0     1     1
#> 12 3        D                    0       0     1     1

# Proportions of total
bind_rows(q1,q2,q3, .id="question") %>% 
  replace(is.na(.), 0) %>% 
  mutate(total = rowSums(across(where(is.numeric)))) %>% 
  mutate(across(where(is.numeric), ~ .x/total))
#> # A tibble: 12 x 6
#> # Groups:   event [4]
#>    question event `Strongly agree` Neutral Agree total
#>    <chr>    <fct>            <dbl>   <dbl> <dbl> <dbl>
#>  1 1        A                1           0 0         1
#>  2 1        B                1           0 0         1
#>  3 1        C                1           0 0         1
#>  4 1        D                0           1 0         1
#>  5 2        A                1           0 0         1
#>  6 2        B                1           0 0         1
#>  7 2        C                1           0 0         1
#>  8 2        D                0           0 1         1
#>  9 3        A                0.667       0 0.333     1
#> 10 3        B                1           0 0         1
#> 11 3        C                0           0 1         1
#> 12 3        D                0           0 1         1

Created on 2022-01-29 by the reprex package (v2.0.1)

1 Like

Hi, this is closer to what I'm looking for!

My end goal is to have 3 tables, each one focused on the results of questions: 1, 2, and 3 respectively.

Here's how I'm trying to get it to look: (the percentages are nonsense but its just to give you a sense of the layout)

Q1 Relative Score Frequency
Score 1 2 3 4 5
Event A .01% .3% 10.5% 25% 50%
Event B .01% .3% 10.5% 25% 50%
Event C .01% .3% 10.5% 25% 50%
Event D .01% .3% 10.5% 25% 50%
Event E .01% .3% 10.5% 25% 50%
Q2 Relative Score Frequency
Score 1 2 3 4 5
Event A .01% .3% 10.5% 25% 50%
Event B .01% .3% 10.5% 25% 50%
Event C .01% .3% 10.5% 25% 50%
Event D .01% .3% 10.5% 25% 50%
Event E .01% .3% 10.5% 25% 50%
Q3 Relative Score Frequency
Score 1 2 3 4 5
Event A .01% .3% 10.5% 25% 50%
Event B .01% .3% 10.5% 25% 50%
Event C .01% .3% 10.5% 25% 50%
Event D .01% .3% 10.5% 25% 50%
Event E .01% .3% 10.5% 25% 50%

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.