Find all rows where ANY numeric variable is greater than zero

Please help me understand the code below by showing an equivalent base R approach. At the moment I am failing to understand the tidyverse approach.

Thanks so much.

library(tidyverse)
df <- tibble(x = c("a", "b"), y = c(1, 1), z = c(-1, 1))

# Find all rows where ANY numeric variable is greater than zero
rowAny <- function(x) rowSums(x) > 0
df %>% 
  filter(rowAny(across(where(is.numeric), ~ .x > 0)))
#> # A tibble: 2 x 3
#>   x         y     z
#>   <chr> <dbl> <dbl>
#> 1 a         1    -1
#> 2 b         1     1

Created on 2021-02-02 by the reprex package (v0.3.0)

Does this help?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

#Notice I added a row with two -1 values
df <- tibble(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))
rowAny <- function(x) rowSums(x) > 0

#Look at the result of the across() function with mutate. It is TRUE or 
#FALSE for each value of y and z
df %>% mutate(New = across(where(is.numeric), ~ .x > 0))
#> # A tibble: 3 x 4
#>   x         y     z New$y $z   
#>   <chr> <dbl> <dbl> <lgl> <lgl>
#> 1 a         1    -1 TRUE  FALSE
#> 2 b         1     1 TRUE  TRUE 
#> 3 c        -1    -1 FALSE FALSE

#rowAny() sums TRUE and FALSE with TRUE = 1 and FALSE = 0. If there is any 
#TRUE value in a row, the sum will be greater than 0 and the result is TRUE
df %>% mutate(New = rowAny(across(where(is.numeric), ~ .x > 0)))
#> # A tibble: 3 x 4
#>   x         y     z New  
#>   <chr> <dbl> <dbl> <lgl>
#> 1 a         1    -1 TRUE 
#> 2 b         1     1 TRUE 
#> 3 c        -1    -1 FALSE

#filtering on those TRUE and FALSE values keeps the TRUE rows.
df %>% 
  filter(rowAny(across(where(is.numeric), ~ .x > 0)))
#> # A tibble: 2 x 3
#>   x         y     z
#>   <chr> <dbl> <dbl>
#> 1 a         1    -1
#> 2 b         1     1

Created on 2021-02-02 by the reprex package (v0.3.0)


Given this, a base-r tag may be more useful than dplyr.


FJCC explained the code you shared above. As a direct base R alternative, you can consider this (using the example from FJCC):

> df <- data.frame(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))
> df[rowSums(x = Filter(f = is.numeric, x = df) > 0) > 0,]
  x y  z
1 a 1 -1
2 b 1  1

Steps are these:

  1. Choose only the numeric columns. I used base::Filter with is.numeric, which plays the role of where(is.numeric) in your example.
  2. Determine whether each element is positive. This is done by the first > 0 check, inside rowSums. In your code, it is this part: ~ .x > 0.
  3. Check whether a row contains any positive value. In both your version and my base equivalent, this is done using rowSums and one more check with > 0. If any element is positive, step 2 marks it as TRUE, and hence the row sum (after coercing TRUE to 1 and FALSE to 0) is positive.
  4. Display only the filtered rows. I subset using `[` with a trailing comma (the empty slot after the comma keeps all columns); in the dplyr way, you use filter.
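
The four steps can also be written out one at a time, which may make the one-liner easier to follow. This is just a sketch of the same idea; the intermediate names (num_cols, is_pos, keep, result) are made up for illustration and are not part of the original answer.

```r
# Same example data as in the replies above
df <- data.frame(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))

# Step 1: keep only the numeric columns (y and z)
num_cols <- Filter(f = is.numeric, x = df)

# Step 2: element-wise test; gives a logical matrix with the same shape
is_pos <- num_cols > 0

# Step 3: TRUE counts as 1 and FALSE as 0, so a row sum above 0 means
# there is at least one TRUE in that row
keep <- rowSums(is_pos) > 0

# Step 4: subset rows with `[`; the empty slot after the comma keeps all columns
result <- df[keep, ]
print(result)
```

Printing is_pos and keep on their own shows the same TRUE/FALSE pattern as the mutate() output in FJCC's reply.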

Hope this helps.


Thank you so much. This also adds to my understanding. I was mainly trying to understand the tidyverse approach, but I thought that seeing the base R approach would help me understand the code.

Both solutions are really helpful.

In the latest version of dplyr (1.0.4 at the time of writing) there is also the if_any() function, which is designed for exactly this kind of "any column matches" filter.

library(dplyr)

df <- tibble(x = c("a", "b", "c"),
             y = c(1, 1, -1),
             z = c(-1, 1, -1)
             )

df %>% 
    filter(if_any(where(is.numeric), ~ .x > 0))
#> # A tibble: 2 x 3
#>   x         y     z
#>   <chr> <dbl> <dbl>
#> 1 a         1    -1
#> 2 b         1     1

Created on 2021-02-03 by the reprex package (v1.0.0)
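
For anyone reading this from the base R side: conceptually, if_any() applies the test to each selected column and combines the results with OR. A rough base R analogy, assuming the same example data, could look like the sketch below. It is only meant to illustrate the idea and is not how dplyr implements if_any(); the names checks and keep are hypothetical.

```r
df <- data.frame(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))

# Test each numeric column separately; gives a list of logical vectors
checks <- lapply(Filter(is.numeric, df), function(col) col > 0)

# OR the vectors together element-wise: TRUE where any column passed
keep <- Reduce(`|`, checks)

df[keep, ]
```

Swapping `|` for `&` gives the "all columns match" version, which corresponds to if_all() in dplyr.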


Thank you for bringing up the if_any() function; it is so convenient.
