Find all rows where ANY numeric variable is greater than zero

Khundiman · February 2, 2021, 4:36pm

Please help me understand the code below by showing a base R approach. At the moment I am failing to understand the tidyverse approach.

Thanks so much.

library(tidyverse)
df <- tibble(x = c("a", "b"), y = c(1, 1), z = c(-1, 1))

# Find all rows where ANY numeric variable is greater than zero
rowAny <- function(x) rowSums(x) > 0
df %>% 
  filter(rowAny(across(where(is.numeric), ~ .x > 0)))
#> # A tibble: 2 x 3
#>   x         y     z
#>   <chr> <dbl> <dbl>
#> 1 a         1    -1
#> 2 b         1     1

Created on 2021-02-02 by the reprex package (v0.3.0)

FJCC · February 2, 2021, 5:07pm

Does this help?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

#Notice I added a row with two -1 values
df <- tibble(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))
rowAny <- function(x) rowSums(x) > 0

#Look at the result of the across() function with mutate. It is TRUE or 
#FALSE for each value of y and z
df %>% mutate(New = across(where(is.numeric), ~ .x > 0))
#> # A tibble: 3 x 4
#>   x         y     z New$y $z   
#>   <chr> <dbl> <dbl> <lgl> <lgl>
#> 1 a         1    -1 TRUE  FALSE
#> 2 b         1     1 TRUE  TRUE 
#> 3 c        -1    -1 FALSE FALSE

#rowAny() sums TRUE and FALSE with TRUE = 1 and FALSE = 0. If there is any 
#TRUE value in a row, the sum will be greater than 0 and that is taken as TRUE
df %>% mutate(New = rowAny(across(where(is.numeric), ~ .x > 0)))
#> # A tibble: 3 x 4
#>   x         y     z New  
#>   <chr> <dbl> <dbl> <lgl>
#> 1 a         1    -1 TRUE 
#> 2 b         1     1 TRUE 
#> 3 c        -1    -1 FALSE

#filtering on those TRUE and FALSE values keeps the TRUE rows.
df %>% 
  filter(rowAny(across(where(is.numeric), ~ .x > 0)))
#> # A tibble: 2 x 3
#>   x         y     z
#>   <chr> <dbl> <dbl>
#> 1 a         1    -1
#> 2 b         1     1

Created on 2021-02-02 by the reprex package (v0.3.0)
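
To see in isolation the coercion that rowAny() relies on, here is a small sketch using a hand-built logical matrix. The names `m` and `counts` are made up for illustration; the matrix mirrors the y and z comparisons above.

```r
# A logical matrix like the one produced by comparing y and z with 0
m <- cbind(y = c(TRUE, TRUE, FALSE), z = c(FALSE, TRUE, FALSE))

# rowSums() treats TRUE as 1 and FALSE as 0, so it counts the TRUEs per row
counts <- rowSums(m)
counts
#> [1] 1 2 0

# A count above zero means at least one TRUE in that row
counts > 0
#> [1]  TRUE  TRUE FALSE
```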

Yarnabrina · February 2, 2021, 5:28pm

Given this, a base-r tag may be more useful than dplyr.

FJCC explained the code you shared above. For a direct base R alternative, consider this (using FJCC's example):

> df <- data.frame(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))
> df[rowSums(x = Filter(f = is.numeric, x = df) > 0) > 0,]
  x y  z
1 a 1 -1
2 b 1  1

The steps are these:

1. Choose only the numeric columns. I used base::Filter, which does the job of where in your example.
2. Determine whether each element is positive. This is done by the first > 0 check, inside rowSums. In your code, it is this part: ~ .x > 0.
3. Check whether each row contains any positive value. In both your way and my base equivalent, this is done with rowSums and one more > 0 check. If any element is positive, step 2 marks it TRUE, and hence the row sum (after coercing TRUE to 1) is positive.
4. Display only the filtered rows. I subset with `[` and a trailing comma, which selects rows; in the dplyr way, you use filter.
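
The four steps can also be written out one at a time, which may make the one-liner easier to follow. This is just a sketch; the intermediate names (`num_cols`, `is_positive`, `any_positive`) are made up for illustration.

```r
df <- data.frame(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))

# Step 1: keep only the numeric columns (y and z)
num_cols <- Filter(f = is.numeric, x = df)

# Step 2: compare every element with 0; the result is a logical matrix
is_positive <- num_cols > 0

# Step 3: count the TRUEs in each row and check that there is at least one
any_positive <- rowSums(is_positive) > 0

# Step 4: keep the rows flagged TRUE; the trailing comma means "all columns"
df[any_positive, ]
#>   x y  z
#> 1 a 1 -1
#> 2 b 1  1
```

Printing `is_positive` and `any_positive` along the way shows the same TRUE/FALSE values that FJCC displayed with mutate.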

Hope this helps.

Khundiman · February 2, 2021, 6:21pm

Thank you so much. This also adds to my understanding. I was mainly trying to understand the tidyverse approach, but I thought that seeing the base R approach would help me understand the code.

Both solutions are really helpful.

andresrcs · February 3, 2021, 1:39am

The latest version of dplyr (1.0.4) also has the if_any() function, which is designed for exactly this kind of thing.

library(dplyr)

df <- tibble(x = c("a", "b", "c"),
             y = c(1, 1, -1),
             z = c(-1, 1, -1)
             )

df %>% 
    filter(if_any(where(is.numeric), ~ .x > 0))
#> # A tibble: 2 x 3
#>   x         y     z
#>   <chr> <dbl> <dbl>
#> 1 a         1    -1
#> 2 b         1     1

Created on 2021-02-03 by the reprex package (v1.0.0)
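
For comparison, a rough base R analogue of if_any() combines the per-column checks with a logical OR. This is only a sketch of the idea, not how dplyr implements if_any(); the names `checks` and `keep` are made up for illustration.

```r
df <- data.frame(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))

# One logical vector per numeric column: is each value greater than zero?
checks <- lapply(Filter(is.numeric, df), function(col) col > 0)

# OR the vectors together element by element: TRUE if any column passed
keep <- Reduce(`|`, checks)

# Keep the rows where at least one numeric column is positive
df[keep, ]
```

Swapping `|` for `&` would instead keep the rows where every numeric column is positive.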

Khundiman · February 3, 2021, 2:53pm

Thank you for bringing up the if_any function. It is so convenient.
