I have a set up where I will have an uncertain number of columns that follow a naming convention in a data frame and I want to filter out any row where there is an NA
in at least one of these columns.
I have found most of a solution to my problem using the following set up. I want test_filter
to only contain row 1 which is the only row in test_df
that has values for both comp_1
and comp_2
test_df = matrix(c(1,2,3,4,5,10,NA,30,NA,0,15,25,NA,NA,NA), ncol = 3) %>%
as.tibble() %>%
rename(id = V1,
comp_1 = V2,
comp_2 = V3)
test_filter = test_df %>%
filter(!is.na(if_all(starts_with('comp_'))))
The above code does what I want except when there is a zero in one of the comp_
columns. As written above test_filter
incorrectly includes the row with id 5. However, if I change value in comp_1
from 0 to any other number it then returns the correct information.
This only happens when one of the values is zero, and I don't know why. Maybe something wierd is happening under the hood with if_all
?
Also I am open to other ways to implement this filtering beside what I wrote above. Some additional considerations for alternative solutions:
- In my real data frame there will be other columns besides
id
,comp_1
andcomp_2
and the positions ofcomp_1
andcomp_2
may change relative to each other which is why I choose to ID them based on name. - The number of
comp_
columns is able to change so I want this filter to work whether I have one column or multiple - I do eventually pivot the data longer, and I thought about filtering after the pivot. However, I realized that wouldn't work for me because I need to remove any info associated with an ID if at least one of the
comp_
columns is empty which is why I decided to filter before the pivot.