I have data that was entered as a checklist such as this:
A B C
X X
X X X
X
X X
And I want it to end up in a frequency table that looks like this:
Var Count
A 2
B 3
C 3
I've never worked with data formatted this way so I'm not even sure how to start on counting it into a nice table without a lot of transforms. Any suggestions would be greatly appreciated.
The dplyr and tidyr packages actually make this sort of thing kind of fun.
# Load the dplyr package
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# This just creates an example dataset
df <- tibble::tribble(
~A, ~B, ~C,
"X", "", "X",
"X", "X", "X",
"", "X", "",
"", "X", "X")
df
#> # A tibble: 4 x 3
#> A B C
#> <chr> <chr> <chr>
#> 1 X "" X
#> 2 X X X
#> 3 "" X ""
#> 4 "" X X
# convert "X"s to 1 and anything else to 0
# Then take the sum of each column
# Finally transpose the data from wide to long
df %>%
mutate_all(~ifelse(. == "X", 1, 0)) %>%
summarise_all(sum) %>%
tidyr::gather("Var", "Count")
#> # A tibble: 3 x 2
#> Var Count
#> <chr> <dbl>
#> 1 A 2.00
#> 2 B 3.00
#> 3 C 3.00
3 Likes
After a small amount of tinkering to make it play nice with missing values, it works! Thank you so much!
1 Like
Sounds like you've got it worked out, but just wanted to note you could skip the mutate step and go straight to summarise in this case.
library(tidyverse)
df <- tribble(
~A, ~B, ~C,
"X", "", "X",
"X", "X", "X",
"", "X", "",
"", "X", "X"
)
df %>%
summarise_all(~sum(. == "X")) %>%
gather("Var", "Count")
#> # A tibble: 3 x 2
#> Var Count
#> <chr> <int>
#> 1 A 2
#> 2 B 3
#> 3 C 3
Created on 2018-04-16 by the reprex package (v0.2.0).
3 Likes