jarle
January 3, 2024, 4:07pm
1
Hi
I want a new column based on the number of NAs or empty occurrences across the column range fes1 to fes12.
This works:
df_new <- df %>%
  mutate(ID,
         fes_cat1 = rowMeans(across(fes1:fes12))
  )
I'd however like the additional conditions:
Only calculate new column value (mean) when number of NAs or empty values in the range is less than 4.
Calculate value (mean) based on number of non-NAs.
Anyone?
RamRS
January 3, 2024, 9:46pm
2
Why not add a field counting the number of NAs for your first condition?
count_na <- function(x) sum(is.na(x))
req_cols <- paste0("fes",1:12)
df %>% mutate(na_count = apply(.[req_cols], 1, count_na) %>% filter(na_count < 4) %>% mutate(fes_cat1 = rowMeans(across(fes1:fes12)))
Idea taken from SE post: r - Add a column with count of NAs and Mean - Stack Overflow
jarle
January 4, 2024, 9:53am
3
RamRS: Nice!
However, I get this error message:
Error in mutate():
In argument: na_count = %>%(...).
Caused by error in UseMethod():
! no applicable method for 'filter' applied to an object of class "c('integer', 'numeric')"
sapply(df, mode) shows that the columns are numeric.
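A likely reading of this error: in the one-liner above, the first mutate() is not closed before the pipe into filter(), so filter() is applied to the integer vector returned by apply() rather than to the data frame. Below is a minimal sketch of the same pipeline with that parenthesis closed. The small data frame is a hypothetical stand-in, not the original data; count_na and req_cols are the helpers from the reply above.

```r
library(dplyr)

count_na <- function(x) sum(is.na(x))
req_cols <- paste0("fes", 1:12)

# Hypothetical stand-in data: 2 rows, 12 numeric columns fes1..fes12.
df <- as.data.frame(matrix(as.numeric(1:24), nrow = 2, byrow = TRUE))
names(df) <- req_cols
df[2, req_cols[1:4]] <- NA  # row 2 has 4 NAs, so filter() drops it

res <- df %>%
  mutate(na_count = apply(.[req_cols], 1, count_na)) %>%  # mutate() is closed here
  filter(na_count < 4) %>%                                # now receives a data frame
  mutate(fes_cat1 = rowMeans(across(fes1:fes12)))
```

With the parenthesis in place, each step in the pipe receives a data frame, so filter() dispatches to the dplyr method.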
jarle
January 4, 2024, 10:14am
4
Doing it in two steps however worked:
df <- df %>%
  mutate(na_count = apply(.[req_cols], 1, count_na))
df <- df %>%
  filter(na_count < 4) %>%
  mutate(fes_cat1 = rowMeans(across(fes1_t1:fes12_t1)))
But the above
(1) removes rows with na_count > 3, and
(2) does not calculate rowMeans when na_count > 0.
(2) comes from rowMeans, which returns NA for any row containing an NA; it is fixed with the argument na.rm=TRUE.
(1) is fixed by using ifelse instead of filter:
mutate(fes_cat1 = ifelse(na_count < 4,
                         rowMeans(across(fes1_t1:fes12_t1), na.rm=TRUE),
                         NA))
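For reference, the pieces above can be combined into a single mutate() call. This is only a sketch under stated assumptions: the example data frame is hypothetical, the columns are assumed to be numeric (so "empty" values already appear as NA), and rowSums(is.na(...)) and dplyr::if_else() are swapped in for the apply() helper and base ifelse() used in the thread.

```r
library(dplyr)

# Hypothetical example data: 3 rows, 12 item columns fes1..fes12 plus an ID.
df <- as.data.frame(matrix(as.numeric(1:36), nrow = 3, byrow = TRUE))
names(df) <- paste0("fes", 1:12)
df$ID <- 1:3
df[2, c("fes1", "fes2")] <- NA    # 2 NAs: mean is taken over the 10 remaining values
df[3, paste0("fes", 1:5)] <- NA   # 5 NAs: fes_cat1 should be NA

df_new <- df %>%
  mutate(
    # Count missing values per row across the item columns.
    na_count = rowSums(is.na(across(fes1:fes12))),
    # Mean of the non-missing values, but only when fewer than 4 are missing.
    fes_cat1 = if_else(na_count < 4,
                       rowMeans(across(fes1:fes12), na.rm = TRUE),
                       NA_real_)
  )
```

All rows are kept; rows with four or more missing values simply get NA in fes_cat1. If the real columns are fes1_t1 to fes12_t1, the range in across() would need to be adjusted accordingly.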