Hello. Could someone please explain why it is problematic to use NULL to replace existing factor levels and observation values (beyond it being a misuse of what NULL is meant to represent?), instead of doing this with na_if()? For example:
library(dplyr)    # %>%, mutate(), na_if()
library(forcats)  # fct_drop(), fct_recode()
data(gss_cat, package = "forcats")
table(gss_cat$partyid)
gss_cat <- gss_cat %>%
  mutate(partyid2 = partyid) %>%
  mutate(partyid2 = na_if(partyid2, "No answer")) %>%
  mutate(partyid2 = na_if(partyid2, "Don't know"))
table(gss_cat$partyid2)
summary(gss_cat$partyid2)
Then if necessary one could drop the unused levels:
gss_cat <- gss_cat %>%
  mutate(partyid2 = fct_drop(partyid2))
table(gss_cat$partyid2)
But why would it be problematic in most analytical situations (using survey package functions, for example) to simply NULL out the two levels?
gss_cat <- gss_cat %>%
  mutate(partyid3 = partyid) %>%
  mutate(partyid3 = fct_recode(partyid,
                               NULL = "No answer",
                               NULL = "Don't know"))
The results seem to be the same, and in other data frames I have not run into problems with the various survey package functions.
table(gss_cat$partyid3)
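To make the comparison concrete, here is a minimal sketch on a toy factor rather than the real data. The vector x and its values are made up for illustration and only stand in for gss_cat$partyid; the sketch assumes dplyr and forcats are installed.

```r
library(dplyr)
library(forcats)

# Hypothetical toy factor standing in for gss_cat$partyid
x <- factor(c("No answer", "Don't know", "Independent",
              "Other party", "Independent"))

# Approach 1: set each level to NA with na_if(), then drop the unused levels
a <- x %>%
  na_if("No answer") %>%
  na_if("Don't know") %>%
  fct_drop()

# Approach 2: recode both levels to NULL in a single step
b <- fct_recode(x, NULL = "No answer", NULL = "Don't know")

# Compare the remaining levels and the positions of the missing values
identical(levels(a), levels(b))
identical(is.na(a), is.na(b))
```

If both comparisons return TRUE, the two approaches yield the same levels and the same missing values, at least for this small case.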
Thank you!