I have a data frame about book bans with a column named "reason", but its values are written inconsistently. My code creates a new column, "specific reason", that groups every "reason" value containing certain words into a category (e.g., "race" and "racial" could both become "racism" in the new column). How can I add more words and categories to the new column ("specific reason")? The code I'm working with now is below, but I'm not sure how to extend it. I'd also like to convert all the blank values in the column to "NA".
Here is one solution. I've added some sample data and I'm not working inside a function, but the method is the same, and you could wrap it in a function if you wish.
# package library
library(tidyverse)
library(janitor)
# sample data
sam_dat <- tibble(
  reason_messy = sample(
    x = c("nude", "sex", "nudity", "race", "racial", "violent", "violence", "", "blue"),
    size = 25,
    replace = TRUE
  ),
  book = as.character(seq(1, 25, 1))
)
# create new variable reason_specific
sam_dat <- sam_dat %>%
  mutate(
    reason_specific = factor(case_when(
      # sex
      str_detect(
        string = reason_messy,
        pattern = "nude|sex|nudity"
      ) ~ "sex",
      # race
      str_detect(
        string = reason_messy,
        pattern = "race|racial"
      ) ~ "race",
      # violence
      str_detect(
        string = reason_messy,
        pattern = "violent|violence"
      ) ~ "violence",
      # convert all the blank values in the column to NA
      reason_messy == "" ~ NA_character_,
      # other
      TRUE ~ "other"
    ))
  )
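
To add more words to an existing category, append another alternative after a `|` in its pattern. To add a new category, add one more `str_detect(...) ~ "label"` branch above the final `TRUE ~ "other"` line. `case_when()` uses the first branch that matches, so order matters when a value could fit two categories. A minimal sketch, assuming a hypothetical "language" category and made-up keywords ("explicit", "profan", "language") that are not in the original data; `regex(ignore_case = TRUE)` is an optional extra so that "Nudity" and "nudity" are treated alike:

```r
library(tidyverse)

# hypothetical messy values, for illustration only
demo <- tibble(
  reason_messy = c("Nudity", "racial themes", "profanity", "", "blue")
)

demo <- demo %>%
  mutate(
    reason_specific = case_when(
      # blanks become NA
      reason_messy == "" ~ NA_character_,
      # existing category with one extra keyword ("explicit") added after "|"
      str_detect(reason_messy, regex("nude|sex|nudity|explicit", ignore_case = TRUE)) ~ "sex",
      str_detect(reason_messy, regex("race|racial", ignore_case = TRUE)) ~ "race",
      # new (hypothetical) category: one more branch before the fallback
      str_detect(reason_messy, regex("profan|language", ignore_case = TRUE)) ~ "language",
      # everything else
      TRUE ~ "other"
    )
  )
```
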
# frequency table
sam_dat %>%
  tabyl(reason_specific)
#>  reason_specific  n percent valid_percent
#>            other  3    0.12     0.1304348
#>             race  4    0.16     0.1739130
#>              sex 12    0.48     0.5217391
#>         violence  4    0.16     0.1739130
#>             <NA>  2    0.08            NA
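
If the list of categories keeps growing, one option is to keep the patterns in a named vector and loop over it, so adding a category means adding one line to the lookup. This is only a sketch of an alternative, not part of the solution above: the `patterns` vector and the `categorise()` helper are hypothetical names introduced here for illustration.

```r
library(tidyverse)

# hypothetical lookup: category label = regex of words that map to it
patterns <- c(
  sex      = "nude|sex|nudity",
  race     = "race|racial",
  violence = "violent|violence"
)

# hypothetical helper: label each value with the first matching category
categorise <- function(x, patterns, default = "other") {
  out <- rep(default, length(x))
  # loop in reverse so that earlier entries in `patterns` win on overlap
  for (label in rev(names(patterns))) {
    hit <- str_detect(x, regex(patterns[[label]], ignore_case = TRUE))
    out[hit %in% TRUE] <- label
  }
  # blank or missing input becomes NA
  out[is.na(x) | x == ""] <- NA_character_
  out
}

result <- categorise(c("nude", "Racial", "violence", "", "blue"), patterns)
```

With the sample data above, this could be used inside `mutate()` as `reason_specific = factor(categorise(reason_messy, patterns))`.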