Welcome @Sophialai!
First of all -- thanks for making a reprex -- I was able to copy and run your example without any trouble!
Second of all -- it's worth noting that the message you got is a "warning", not an "error". The important difference is that your code did "work"! That means that rel_by_religion is a data frame that you can use without a problem -- so, if you want, you can ignore that warning message.
The message itself refers to the function fct_explicit_na (from the forcats package).
Specifically, the religion variable in your data has 5 levels (Protestant, Catholic, Jewish, None and Other). However, there are 18 rows that have none of those levels -- they are just NA. In certain modeling/plotting functions, this could mean that those rows would be silently dropped or ignored, which may not be what you want.
The referenced function turns all those missing values into a new factor level, called "(Missing)" -- so they don't get silently dropped.
library(socviz)
library(dplyr, warn.conflicts = FALSE)
library(forcats)
gss_sm %>%
  group_by(bigregion, religion) %>%
  summarize(N = n()) %>%
  mutate(freq = N / sum(N),
         pct = round((freq*100), 0))
#> Warning: Factor `religion` contains implicit NA, consider using
#> `forcats::fct_explicit_na`
#> # A tibble: 24 x 5
#> # Groups: bigregion [4]
#> bigregion religion N freq pct
#> <fct> <fct> <int> <dbl> <dbl>
#> 1 Northeast Protestant 158 0.324 32
#> 2 Northeast Catholic 162 0.332 33
#> 3 Northeast Jewish 27 0.0553 6
#> 4 Northeast None 112 0.230 23
#> 5 Northeast Other 28 0.0574 6
#> 6 Northeast <NA> 1 0.00205 0
#> 7 Midwest Protestant 325 0.468 47
#> 8 Midwest Catholic 172 0.247 25
#> 9 Midwest Jewish 3 0.00432 0
#> 10 Midwest None 157 0.226 23
#> # … with 14 more rows
gss_sm %>%
  mutate(religion = forcats::fct_explicit_na(religion)) %>%
  group_by(bigregion, religion) %>%
  summarize(N = n()) %>%
  mutate(freq = N / sum(N),
         pct = round((freq*100), 0))
#> # A tibble: 24 x 5
#> # Groups: bigregion [4]
#> bigregion religion N freq pct
#> <fct> <fct> <int> <dbl> <dbl>
#> 1 Northeast Protestant 158 0.324 32
#> 2 Northeast Catholic 162 0.332 33
#> 3 Northeast Jewish 27 0.0553 6
#> 4 Northeast None 112 0.230 23
#> 5 Northeast Other 28 0.0574 6
#> 6 Northeast (Missing) 1 0.00205 0
#> 7 Midwest Protestant 325 0.468 47
#> 8 Midwest Catholic 172 0.247 25
#> 9 Midwest Jewish 3 0.00432 0
#> 10 Midwest None 157 0.226 23
#> # … with 14 more rows
Created on 2019-08-27 by the reprex package (v0.3.0)
For a more explicit example of how the two are treated differently:
library(socviz)
library(forcats)
table(gss_sm$religion)
#>
#> Protestant Catholic Jewish None Other
#> 1371 649 51 619 159
table(fct_explicit_na(gss_sm$religion))
#>
#> Protestant Catholic Jewish None Other (Missing)
#> 1371 649 51 619 159 18
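If it helps to see the idea without any packages, here is a rough base R sketch of what making the NA "explicit" amounts to. The toy religion vector below is made up for illustration (it is not taken from gss_sm), and the manual relabelling is only my approximation of the behaviour, not how forcats implements it:

```r
# Toy factor with one missing value (hypothetical data, not from gss_sm)
religion <- factor(c("Catholic", "Jewish", NA, "Catholic"))

# NA is not one of the levels, so table() leaves it out by default
levels(religion)
table(religion)

# Base R will report the NA count if you ask for it
table(religion, useNA = "ifany")

# Rough equivalent of making the NA explicit: convert to character,
# replace NA with a label, and rebuild the factor with that label last
explicit <- as.character(religion)
explicit[is.na(explicit)] <- "(Missing)"
explicit <- factor(explicit, levels = c(levels(religion), "(Missing)"))

# "(Missing)" is now an ordinary level and gets counted like any other
table(explicit)
```

In practice you would just call the forcats function rather than do this by hand. One thing to be aware of: from forcats 1.0.0 onward, fct_explicit_na() is deprecated in favour of fct_na_value_to_level(), so on a newer install you may see a deprecation warning pointing you to that function instead.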