How can I drop variables with more than 20% missing values?


Hi, I currently have a dataset 'data' that contains more than 100 variables. I am doing some data cleaning right now, and I would like to drop all variables with more than 20% missing values.

Right now I have the following code,

data2 <- data[!map_lgl(data, (]

But I am getting this error message: Error: Can't convert a logical vector to function
Call rlang::last_error() to see a backtrace

Is there a better/correct way to do this? Thanks!

Welcome to the community!

Does this work for you?

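# TRUE when at most 20% of the column's values are missing
is_column_with_at_least_eighty_percent_non_missing <- function(t)
  mean(x = is.na(x = t)) <= 0.20

# Filter() keeps the columns for which the predicate returns TRUE
Filter(f = is_column_with_at_least_eighty_percent_non_missing,
       x = data)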
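
As for the error in the original attempt: map_lgl() expects a function (or a formula) as its second argument, so handing it an already-evaluated logical expression produces "Can't convert a logical vector to function". Below is a minimal sketch of a purrr version, assuming purrr is installed. The data frame toy and the variable too_many_missing are made up for illustration and stand in for the real data.

```r
library(purrr)

# Hypothetical toy data: column b is 60% missing, column c is 20% missing.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(NA, NA, NA, 4, 5),
  c = c("x", NA, "z", "w", "v")
)

# Pass a function to map_lgl(): it is called once per column and must
# return a single TRUE/FALSE. TRUE marks columns with more than 20% missing.
too_many_missing <- map_lgl(toy, function(col) mean(is.na(col)) > 0.20)

# Drop the flagged columns; the rest are kept unchanged.
toy2 <- toy[!too_many_missing]
names(toy2)
```

The anonymous function could also be written as a purrr formula, ~ mean(is.na(.x)) > 0.20, which is equivalent here.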

This works! I also solved it by doing this:

data2 <- data[, which(colMeans(!is.na(data)) > 0.5)]
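
Note that a cutoff of 0.5 on the share of non-missing values keeps columns with up to half of their values missing. For the 20% rule in the question, the equivalent check is that the share of missing values is at most 0.20. Here is a minimal base R sketch under that assumption. The helper name drop_sparse_columns, its max_missing parameter, and the data frame toy are hypothetical and exist only for illustration.

```r
# Hypothetical helper: drop columns whose share of missing values exceeds
# max_missing (0.20 reproduces the rule from the question).
drop_sparse_columns <- function(df, max_missing = 0.20) {
  # is.na() on a data frame gives a logical matrix; colMeans() then
  # returns the fraction of missing values per column.
  missing_share <- colMeans(is.na(df))
  # drop = FALSE keeps the result a data frame even if one column remains.
  df[, missing_share <= max_missing, drop = FALSE]
}

# Made-up data: column b is 60% missing, column c is 20% missing.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(NA, NA, NA, 4, 5),
  c = c("x", NA, "z", "w", "v")
)

toy2 <- drop_sparse_columns(toy)
names(toy2)
```

Indexing with the logical vector directly makes which() unnecessary, and it also behaves sensibly when no column passes the check.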
