filter failing to remove NA

katchamp · March 5, 2024, 9:52pm

I have a set up where I will have an uncertain number of columns that follow a naming convention in a data frame and I want to filter out any row where there is an NA in at least one of these columns.

I have found most of a solution to my problem using the following set up. I want test_filter to only contain row 1 which is the only row in test_df that has values for both comp_1 and comp_2

test_df = matrix(c(1,2,3,4,5,10,NA,30,NA,0,15,25,NA,NA,NA), ncol = 3) %>% 
  as.tibble() %>% 
  rename(id = V1,
         comp_1 = V2,
         comp_2 = V3)

test_filter = test_df %>% 
  filter(!is.na(if_all(starts_with('comp_'))))

The above code does what I want except when there is a zero in one of the comp_ columns. As written above test_filter incorrectly includes the row with id 5. However, if I change value in comp_1 from 0 to any other number it then returns the correct information.

This only happens when one of the values is zero, and I don't know why. Maybe something wierd is happening under the hood with if_all?

Also I am open to other ways to implement this filtering beside what I wrote above. Some additional considerations for alternative solutions:

In my real data frame there will be other columns besides id, comp_1 and comp_2 and the positions of comp_1 and comp_2 may change relative to each other which is why I choose to ID them based on name.
The number of comp_ columns is able to change so I want this filter to work whether I have one column or multiple
I do eventually pivot the data longer, and I thought about filtering after the pivot. However, I realized that wouldn't work for me because I need to remove any info associated with an ID if at least one of the comp_ columns is empty which is why I decided to filter before the pivot.

prubin · March 5, 2024, 11:03pm

Your function calls in the filter are out of order. Try this:

test_df %>% filter(if_all(starts_with('comp_'), ~ !is.na(.x)))

The following also works (not sure if it is better, worse or a toss-up in terms of performance):

ix <- test_df |> select(starts_with("comp_")) |> complete.cases() |> which() test_filter <- test_df[ix, ]

system · March 12, 2024, 11:04pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.