Convert a 'Checkbox Grid' question type from Google Forms into a crosstabulation

I need to convert a 'Checkbox Grid' question type from Google Forms into a crosstabulation. However, when I download the data from Google Forms to a Google Spreadsheet or Excel, multiple responses are grouped into a single column separated by commas (,).

Below is one way to achieve the desired outcome, which uses pivot_longer() / pivot_wider() to reshape the data and separate_rows() to create a new row for each entry separated by a comma.

library(tidyverse)

# sample data
df = data.frame(
  q1 = c('Coca Cola', 'Coca Cola, Sprite', 'Fanta', 'Inca Kola', 'Coca Cola', 'Fanta', 'Sprite'),
  q2 = c('Fanta, Sprite', 'Coca Cola', 'Fanta', 'Coca Cola, Inca Kola', 'Fanta, Sprite', 'Fanta', 'Fanta, Sprite'),
  q3 = c('Fanta', 'Coca Cola, Fanta, Sprite, Inca Kola', 'Fanta, Sprite', 'Coca Cola', 'Coca Cola, Inca Kola', 'Fanta', 'Sprite'),
  q4 = c('Coca Cola, Fanta, Sprite, Inca Kola', 'Ningun', 'Sprite', 'Ningun', 'Sprite', 'Sprite', 'Fanta, Inca Kola')
)

out = df |>
  mutate(row = row_number()) |>
  pivot_longer(-'row') |> 
  separate_rows(value, sep = ', ') |> 
  count(name, value) |>
  pivot_wider(names_from = value, values_from = n) |>
  select(`Coca Cola`, Fanta, Sprite, `Inca Kola`, `Ningun`) |>
  mutate_all(~replace_na(., 0))

out
#> # A tibble: 4 × 5
#>   `Coca Cola` Fanta Sprite `Inca Kola` Ningun
#>         <int> <int>  <int>       <int>  <int>
#> 1           3     2      2           1      0
#> 2           2     5      3           1      0
#> 3           3     4      3           2      0
#> 4           1     2      4           2      2

Created on 2023-06-19 with reprex v2.0.2

1 Like

Thank you very much for your help, I was able to perform my analysis

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.