Use of NULL to replace a factor level and value in place of using NA in a dataframe

odin · March 23, 2023, 5:11pm

Hello. I would appreciate it if someone could explain why it is problematic to use NULL to replace existing factor levels and observation values (beyond it being a misuse of what NULL is meant to represent?), instead of doing this with na_if(), for example:

library(dplyr)
library(forcats)

data(gss_cat, package = "forcats")

table(gss_cat$partyid)

gss_cat <- gss_cat %>%
  mutate(partyid2 = partyid) %>%
  mutate(partyid2 = na_if(partyid2, "No answer")) %>%
  mutate(partyid2 = na_if(partyid2, "Don't know"))

table(gss_cat$partyid2)

summary(gss_cat$partyid2)

Then if necessary one could drop the unused levels:

gss_cat <- gss_cat %>%
  mutate(partyid2 = fct_drop(partyid2))

table(gss_cat$partyid2)

But why would it be problematic in most analytical situations (using survey package functions, for example) to simply NULL out the two levels?

gss_cat <- gss_cat %>%
  mutate(partyid3 = fct_recode(partyid,
                               NULL = "No answer",
                               NULL = "Don't know"))

The results seem to be the same. And in other dataframes I have not found problems with the different survey package functions.

table(gss_cat$partyid3)
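
One way to check the "results seem to be the same" observation directly is to build both versions side by side and compare them. This is only a minimal sketch: the column names via_na_if and via_recode are made up for illustration, and it assumes a dplyr version in which na_if() accepts a factor together with a character value to replace.

```r
library(dplyr)
library(forcats)

data(gss_cat, package = "forcats")

# Hypothetical column names, used only for this comparison
cmp <- gss_cat %>%
  mutate(
    # Route 1: set the values to NA, then drop the now-empty levels
    via_na_if = partyid %>%
      na_if("No answer") %>%
      na_if("Don't know") %>%
      fct_drop(),
    # Route 2: remove the levels directly with fct_recode()
    via_recode = fct_recode(partyid,
                            NULL = "No answer",
                            NULL = "Don't know")
  )

# Do both routes leave the same levels, in the same order?
same_levels <- identical(levels(cmp$via_na_if), levels(cmp$via_recode))

# Do both routes give the same value (or NA) in every row?
same_values <- identical(as.character(cmp$via_na_if),
                         as.character(cmp$via_recode))

same_levels
same_values
```

Before fct_drop() is applied, the na_if() version still lists "No answer" and "Don't know" as levels with a count of zero, which is the one place where the two routes visibly differ.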

Thank you!

nirgrahamuk · March 23, 2023, 5:22pm

It's not clear to me what you are asking.
You seem to show some fairly straightforward dplyr transformations involving forcats, but you are asking why things don't work in survey package functions?
Maybe an example of you doing something with survey, and a problem arising from it, would shed some light on what you want to discuss?

odin · March 23, 2023, 6:00pm

Thank you for responding, and sorry for not being clearer. I am asking why this syntax, using NULL in fct_recode(), is considered 'wrong'. That is, from what I have seen posted elsewhere, NULL should be reserved for use on objects, and not for assigning a missing value to a particular factor level in mutate()?

I don't see why it matters whether one uses na_if() or NULL in fct_recode().
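
Some background that may help frame the question: in base R, NULL is a zero-length object that cannot be stored as an element of an atomic vector, whereas NA is a length-one placeholder for a missing value. In the fct_recode() call, NULL appears on the left of = as an argument name rather than as a value assigned to any observation; as far as the forcats documentation describes it, that name is the signal to remove the level. A small base-R sketch of the distinction (the variable names are illustrative only):

```r
# NA occupies a slot in a vector; NULL vanishes when combined
with_na   <- c("a", NA, "b")
with_null <- c("a", NULL, "b")

length(with_na)    # 3
length(with_null)  # 2

# NULL itself has length zero, NA has length one
length(NULL)       # 0
length(NA)         # 1

# On the left of `=`, NULL is just an argument name (the string "NULL")
recode_args <- list(NULL = "No answer")
names(recode_args) # "NULL"
```

Seen this way, the values in the recoded column still end up as NA in both approaches; NULL is never stored in the data frame itself.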
