Modern/updated dplyr way to remove columns with NA values?

pathos · October 20, 2022, 6:25am

I found many similar guides online, but they use functions I understand to be deprecated, such as select_if() or where().

What is the updated way to remove all columns with any NA values? I tried a few variants with select(across()) and select(if_any()), but I think I'm missing some nuance.

df = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))

# DOES NOT WORK -- tells me if_any needs to be in a dplyr verb...
df |>
  select(if_any(colSums(is.na(.) > 0)))

technocrat · October 20, 2022, 6:34am

Not very modern, but less syntax to deal with

DF = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))

na.omit(DF)
#> [1] abc def ghi
#> <0 rows> (or 0-length row.names)

(Each column contains at least one NA, so all are excluded.)

pathos · October 20, 2022, 6:49am

Thanks, but that removes rows, not columns.

Flm · October 20, 2022, 6:59am

I also actually use the same method

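library(dplyr)
# TRUE when a column has at least one non-NA value, so only all-NA columns are dropped
dt <- function(x) { sum(!is.na(x)) > 0 }
data <- data %>% select_if(dt)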

technocrat · October 20, 2022, 8:16am

You're right. I fooled myself because of the empty return.

pathos · October 20, 2022, 8:17am

select_if is deprecated. For example, it's not in tidytable.

nirgrahamuk · October 20, 2022, 9:48am


df = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))

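library(dplyr)

# Scoped verb (superseded in dplyr 1.0.0, but still works)
df %>% select_if(~ !any(is.na(.)))
# Current tidyselect idiom: where() with a predicate
df %>% select(where(~ !any(is.na(.))))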
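
For reference, where() is a current tidyselect helper rather than a deprecated one, so the second form is the up-to-date idiom. The attempt in the question fails because if_any() is meant for row-wise verbs such as filter() and mutate(), not for select(). Here is a minimal sketch of the same selection with its output, assuming a reasonably recent dplyr (1.0.0 or later) and R 4.1 or later for the `|>` pipe and `\(x)` lambda; the names no_na and no_na2 are only illustrative.

```r
library(dplyr)

df <- data.frame(abc = c(1, 2, 3),
                 def = c(4, 5, NA),
                 ghi = c(NA, NA, NA))

# Keep only the columns that contain no NA at all
no_na <- df |> select(where(\(x) !anyNA(x)))
no_na
#>   abc
#> 1   1
#> 2   2
#> 3   3

# Equivalent: select the columns that do contain an NA, then negate
no_na2 <- df |> select(!where(anyNA))
```

anyNA(x) is a base R shortcut for any(is.na(x)), so both forms should return the same single-column data frame here.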

technocrat · October 20, 2022, 9:55pm

I got lulled into complacency because it returned what I expected (which was wrong).

DF = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))
DF[is.na(colMeans(DF))]
#>   def ghi
#> 1   4  NA
#> 2   5  NA
#> 3  NA  NA
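
Note that the snippet above selects the columns that do contain an NA (def and ghi). A base R sketch of the opposite selection, which is what the question asks for, could count the NAs per column and keep only the columns with a count of zero. The variable name keep is just illustrative.

```r
DF <- data.frame(abc = c(1, 2, 3),
                 def = c(4, 5, NA),
                 ghi = c(NA, NA, NA))

# Count NAs per column, then keep the columns whose count is zero
keep <- colSums(is.na(DF)) == 0
result <- DF[, keep, drop = FALSE]
result
#>   abc
#> 1   1
#> 2   2
#> 3   3
```

drop = FALSE keeps the result a data frame even when only one column survives.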
