pathos
October 20, 2022, 6:25am
1
This tutorial explains how to remove columns with any NA values in R, including several examples.
I've seen many similar guides online, like the one above, but they use deprecated functions such as select_if() or where().
What is the updated way to remove all columns with any NA values? I tried select(across()) and select(if_any()), but I think I'm missing some nuance.
df = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))
# DOES NOT WORK -- tells me if_any needs to be in a dplyr verb...
df |>
  select(if_any(colSums(is.na(.) > 0)))
Not very modern, but less syntax to deal with:
DF = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))
na.omit(DF)
#> [1] abc def ghi
#> <0 rows> (or 0-length row.names)
(Each column contains at least one NA, so all are excluded.)
pathos
October 20, 2022, 6:49am
3
Thanks, but that removes rows, not columns.
Flm
October 20, 2022, 6:59am
4
I actually use the same method as well:
dt <- function(x) { sum(!is.na(x)) > 0 }
data <- data %>% select_if(dt)
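Note that this predicate keeps any column with at least one non-NA value, so it drops only columns that are entirely NA, which is a looser condition than the one in the question. A minimal sketch of the difference, using the data frame from the original post and base R's Filter() (chosen here only to avoid a package dependency; has_any_value and has_no_na are illustrative names, not part of any library):

```r
df <- data.frame(abc = c(1, 2, 3),
                 def = c(4, 5, NA),
                 ghi = c(NA, NA, NA))

# Predicate equivalent to dt above: TRUE if the column has at least one non-NA value
has_any_value <- function(x) sum(!is.na(x)) > 0

# Stricter predicate: TRUE only if the column contains no NA at all
has_no_na <- function(x) !anyNA(x)

# Filter() applies the predicate to each column of the data frame
kept_loose  <- names(Filter(has_any_value, df))  # drops only the all-NA column ghi
kept_strict <- names(Filter(has_no_na, df))      # drops every column that has any NA

print(kept_loose)
print(kept_strict)
```

With this data, the looser predicate leaves def in place even though it contains an NA.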
You're right. I fooled myself because of the empty return.
pathos
October 20, 2022, 8:17am
6
select_if is deprecated. For example, it's not in tidytable.
df = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))
df %>% select_if(~ !any(is.na(.)))
df %>% select(where(~ !any(is.na(.))))
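For reference, where() is the current tidyselect helper for predicate-based selection inside select(), while if_any() and across() are meant for data-masking verbs such as filter() and mutate(), which is likely why the attempt in the question was rejected. The same column filter can also be sketched in base R with no packages at all, assuming the same example data frame (no_na_cols is an illustrative name):

```r
df <- data.frame(abc = c(1, 2, 3),
                 def = c(4, 5, NA),
                 ghi = c(NA, NA, NA))

# Count the NAs in each column; a clean column has a count of zero
na_counts <- colSums(is.na(df))

# Keep only the clean columns.
# drop = FALSE keeps the result a data frame even when one column remains.
no_na_cols <- df[, na_counts == 0, drop = FALSE]

print(no_na_cols)
```

Here only abc survives, and the result is still a data frame rather than a bare vector.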
I got lulled into complacency because it returned what I expected (which was wrong)
DF = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))
DF[is.na(colMeans(DF))]
#>   def ghi
#> 1   4  NA
#> 2   5  NA
#> 3  NA  NA
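As written, this expression returns the columns that do contain NA (def and ghi). Negating the test keeps the NA-free columns instead. A minimal sketch, assuming every column is numeric or logical, since colMeans() raises an error on character columns; the anyNA() variant is an alternative that does not depend on column type (clean_by_means and clean_by_anyna are illustrative names):

```r
DF <- data.frame(abc = c(1, 2, 3),
                 def = c(4, 5, NA),
                 ghi = c(NA, NA, NA))

# colMeans() is NA for any column containing an NA,
# so !is.na() marks the columns that are free of NA
clean_by_means <- DF[!is.na(colMeans(DF))]

# Type-agnostic alternative: test each column directly with anyNA()
clean_by_anyna <- DF[!vapply(DF, anyNA, logical(1))]

print(names(clean_by_means))
print(names(clean_by_anyna))
```

Both variants keep only abc for this data.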
system
Closed
October 27, 2022, 9:55pm
9
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.