Multiple column with multiple values

JackDavison · December 21, 2021, 11:31am

Welcome to RStudio Community.

Thank you for describing your data, but nothing beats a reproducible example. The easiest way to provide one is to paste the output of dput(your_data) into your post (or dput(head(your_data, 100)) if your data is huge).
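In case it helps, here is a minimal sketch of what that looks like. The data frame below is made up purely for illustration (the column names id and media are assumptions, not your real data):

```r
# Hypothetical stand-in for your data: two rows, one multi-value column
your_data <- data.frame(
  id = 1:2,
  media = c("TV, Book", "Movie")
)

# Prints R code that rebuilds the object; paste that output into your post
dput(your_data)

# For a large data set, share only the first rows
dput(head(your_data, 100))
```

Anyone reading the thread can then assign the pasted output to a variable and work with the same object you have.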

Without access to your data I can't be sure, but dplyr::count() should probably cover all of your questions. Here's an example using the diamonds data set, which ships with ggplot2 and is available once the tidyverse is loaded.

library(tidyverse)

glimpse(diamonds)
#> Rows: 53,940
#> Columns: 10
#> $ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.~
#> $ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ver~
#> $ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I,~
#> $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, ~
#> $ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64~
#> $ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58~
#> $ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34~
#> $ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.~
#> $ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.~
#> $ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.~

count(diamonds, cut)
#> # A tibble: 5 x 2
#>   cut           n
#>   <ord>     <int>
#> 1 Fair       1610
#> 2 Good       4906
#> 3 Very Good 12082
#> 4 Premium   13791
#> 5 Ideal     21551

count(diamonds, color)
#> # A tibble: 7 x 2
#>   color     n
#>   <ord> <int>
#> 1 D      6775
#> 2 E      9797
#> 3 F      9542
#> 4 G     11292
#> 5 H      8304
#> 6 I      5422
#> 7 J      2808

count(diamonds, cut, color)
#> # A tibble: 35 x 3
#>    cut   color     n
#>    <ord> <ord> <int>
#>  1 Fair  D       163
#>  2 Fair  E       224
#>  3 Fair  F       312
#>  4 Fair  G       314
#>  5 Fair  H       303
#>  6 Fair  I       175
#>  7 Fair  J       119
#>  8 Good  D       662
#>  9 Good  E       933
#> 10 Good  F       909
#> # ... with 25 more rows
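If you also want the most common combinations first, or each combination's share of the total, something along these lines may work. This is only a sketch on the same diamonds data; the names cut_color and prop are my own, and sort = TRUE is an argument of count():

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

cut_color <- diamonds %>%
  count(cut, color, sort = TRUE) %>%  # largest groups first
  mutate(prop = n / sum(n))           # share of all rows in each group

cut_color
```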

For columns that hold several values in a single cell, you may want to split them into one row per value using tidyr::separate_rows():

library(tidyverse)

tibble(x = "TV, Movie, Book, Video Game")
#> # A tibble: 1 x 1
#>   x                          
#>   <chr>                      
#> 1 TV, Movie, Book, Video Game

tibble(x = "TV, Movie, Book, Video Game") %>% 
  separate_rows(x, sep = ", ")
#> # A tibble: 4 x 1
#>   x         
#>   <chr>     
#> 1 TV        
#> 2 Movie     
#> 3 Book      
#> 4 Video Game

Created on 2021-12-21 by the reprex package (v2.0.1)
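Putting the two ideas together, you could split a multi-value column and then count the individual values. This is a sketch on invented data; the tibble responses and its columns id and media are assumptions, so swap in your own column names:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: each row lists several media types in one cell
responses <- tibble(
  id = 1:3,
  media = c("TV, Movie", "Book, TV", "TV, Video Game")
)

media_counts <- responses %>%
  separate_rows(media, sep = ", ") %>%  # one row per id/value pair
  count(media, sort = TRUE)             # tally each value, most frequent first

media_counts
```

Note that the separator has to match your data exactly (here a comma followed by a space). In tidyr 1.3.0 and later, separate_longer_delim() is the newer equivalent of separate_rows().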