mutate across conditioned on number of NAs

Hi,
I want a new column based on the number of NAs or empty occurrences across the column range fes1 to fes12.

This works:

df_new <- df %>%
 mutate (ID, 
        fes_cat1 = rowMeans(across(fes1:fes12))
        )

I'd however like the additional conditions:

  • Only calculate new column value (mean) when number of NAs or empty values in the range is less than 4.
  • Calculate value (mean) based on number of non-NAs.

Anyone?

Why not add a field counting the number of NAs for your first condition?

count_na <- function(x) sum(is.na(x))
req_cols <- paste0("fes",1:12)
df %>% mutate(na_count = apply(.[req_cols], 1, count_na) %>% filter(na_count < 4) %>% mutate(fes_cat1 = rowMeans(across(fes1:fes12)))

Idea taken from SE post: r - Add a column with count of NAs and Mean - Stack Overflow


RamRS: Nice!
However, I get the error msg:

Error in `mutate()`:
ℹ In argument: `na_count = %>%(...)`.
Caused by error in `UseMethod()`:
! no applicable method for 'filter' applied to an object of class "c('integer', 'numeric')"

sapply(df, mode) shows that the columns are numeric.

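Doing it in two steps worked, however. That suggests the one-liner is missing a closing parenthesis after count_na), so filter() ended up being applied to the result of apply() inside mutate() rather than to the data frame: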

df <- df %>% 
          mutate(na_count = apply(.[req_cols], 1, count_na))
df <- df  %>% 
          filter(na_count < 4) %>% 
          mutate(fes_cat1 = rowMeans(across(fes1_t1:fes12_t1)))

But the above

  1. removes rows with na_count > 3 and
  2. does not calculate rowMeans when na_count > 0.

(2) is the default behaviour of rowMeans, which returns NA for any row that contains an NA. Fixed with the argument na.rm=TRUE, which averages only the non-NA values.
(1) fixed by using ifelse instead of filter:

df <- df %>%
  mutate(fes_cat1 = ifelse(na_count < 4,
                           rowMeans(across(fes1_t1:fes12_t1), na.rm = TRUE),
                           NA))
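
For anyone landing here later, the pieces above can probably be combined into a single mutate() call. This is only a minimal sketch on made-up data: the toy tibble, the three fes columns (instead of twelve) and the threshold of 2 (instead of 4) are assumptions chosen to keep the example small. It swaps apply() for rowSums(is.na(...)) to count the NAs, and uses if_else() with NA_real_ so the result stays a double.

```r
library(dplyr)

# Hypothetical toy data: three items instead of fes1:fes12
df <- tibble(
  ID   = 1:4,
  fes1 = c(1, NA, 3, NA),
  fes2 = c(2, 4, NA, NA),
  fes3 = c(3, 6, NA, 5)
)

df_new <- df %>%
  mutate(
    # Number of missing values per row across the item columns
    na_count = rowSums(is.na(across(fes1:fes3))),
    # Mean of the non-missing values, or NA when too many are missing
    # (threshold of 2 here; the thread uses 4 with twelve columns)
    fes_cat1 = if_else(
      na_count < 2,
      rowMeans(across(fes1:fes3), na.rm = TRUE),
      NA_real_
    )
  )

print(df_new)
```

All rows are kept. Rows at or above the threshold just get NA in fes_cat1. With dplyr 1.1.0 or later, pick(fes1:fes3) should work in place of across(fes1:fes3) for plain column selection.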
