Prepare 2-way contingency table from open-ended questions

Hello, I was wondering if someone can help me prepare contingency table. I wish to have 'sample_name' in rows and words used in rows from first to fifth answer box in column. data file added here. Annotation 2020-05-22 161359
Here is code:

# reading data 
df <- read.csv("F:/first_comment_box_R.csv",
                stringsAsFactors = F)
colnames(df)[1] = "sample_name"
newdf <- table(df$sample_name, df$first_box)
contingency.table <-  aggregate(df[-c(1,2)], list(product=df[, 1]), sum, na.rm=T)
#> Error in FUN(X[[i]], ...): invalid 'type' (character) of argument

Created on 2020-05-22 by the reprex package (v0.3.0)

I use pivot_longer() and xtabs().


library(tidyverse)

# create a similar data frame for demonstration
df <- tibble(
  sample_name = c("tom", "john"),
  first_box = c("loose", "ugly"),
  second_box = c("good color", "off color")
)

df
#> # A tibble: 2 x 3
#>   sample_name first_box second_box
#>   <chr>       <chr>     <chr>     
#> 1 tom         loose     good color
#> 2 john        ugly      off color

# should include 5 columns in pivot_longer for your data
df %>% 
  pivot_longer(c(first_box, second_box), names_to = "box", values_to = "word") %>% 
  xtabs(~ sample_name + word, data = .)
#>            word
#> sample_name good color loose off color ugly
#>        john          0     0         1    1
#>        tom           1     1         0    0

Created on 2020-05-22 by the reprex package (v0.3.0)

1 Like

Thanks @enixam. Can I see it as a table, just to check if it did everything okay.
Annotation 2020-05-23 090400

Sure, this looks fine to me.

@enixam No, I was asking if I can make a table of it and see if df (data frame) in a table format. Currently, i was only able to see it in a console. What code I should use?

Do you mean a formatted table in a report? If you know how to use R Markdown then just pipe the resulting data frame to knitr::kable(), after knitting you get what you want.

I am not sure if I know markdown. I want to see it a table, like shown in image. And if I can make excel table of it so that I can manually work on it for duplicated words (I mean like people may wrote aroma, odor, smell as a separate but they actually mean similar, so I want to club them under one label).

# reading data
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.5.3
#> Warning: package 'ggplot2' was built under R version 3.5.3
#> Warning: package 'tibble' was built under R version 3.5.3
#> Warning: package 'tidyr' was built under R version 3.5.3
#> Warning: package 'purrr' was built under R version 3.5.3
#> Warning: package 'dplyr' was built under R version 3.5.3
#> Warning: package 'stringr' was built under R version 3.5.3
df <- read.csv("F:/Sensory/Ch2_Figures/first_comment_box_R.csv",
                stringsAsFactors = F)
colnames(df)[1] = "sample_name"
# Prepared table
df %>% 
  pivot_longer(c(first_box, second_box, third_box, fourth_box, fifth_box), 
               names_to = "box", values_to = "word") %>% 
  xtabs(~ sample_name + word, data = .)
#>                 word
#> sample_name      "By hand" texture A bit chunky A bit dry
#>   AC99330-1P/Y                   0            0         0
#>   Atlantic                       0            0         0
#>   Canela ruset                   0            1         0
#>   CO05068-1RU                    1            0         0
#>   CO99076-6R                     0            0         0
#>   Masquerade                     0            0         0
#>   POR12PG28-3                    0            0         0
#>   Purple majesty                 0            0         0
#>   Rio colorado                   0            0         0
#>   Russian banana                 0            0         1
#>   Valery                         0            0         0
#>   Vermillion                     0            0         0

knitr::kable(df, row.names = T, col.names = T)
#> Error in dimnames(x) <- dn: length of 'dimnames' [2] not equal to array extent

Created on 2020-05-25 by the reprex package (v0.3.0)
Annotation 2020-05-25 172701

Huh, in this case perhaps xtabs() isn't the best option since it doesn't result in a data frame that can be easily exported. The alternative is:

library(tidyverse)

# create a similar data frame for demonstration
df <- tibble(
  sample_name = c("tom", "john"),
  first_box = c("loose", "ugly"),
  second_box = c("good color", "off color")
)



my_table <- df %>% 
  pivot_longer(c(first_box, second_box), names_to = "box", values_to = "word") %>% 
  count(sample_name, word) %>% 
  pivot_wider(names_from = word, values_from = n, values_fill = list(n = 0))

my_table
#> # A tibble: 2 x 5
#>   sample_name `off color`  ugly `good color` loose
#>   <chr>             <int> <int>        <int> <int>
#> 1 john                  1     1            0     0
#> 2 tom                   0     0            1     1

write_excel_csv(my_table, "my_table.csv")

Created on 2020-05-25 by the reprex package (v0.3.0)

Now the global environment becomes

Also you should see the file my_table.csv under the root directory available for excel.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.