Please help me understand the code below by showing a base R approach. At the moment I am failing to understand the tidyverse approach.
Thanks so much.
library(tidyverse)
df <- tibble(x = c("a", "b"), y = c(1, 1), z = c(-1, 1))
# Find all rows where ANY numeric variable is greater than zero
rowAny <- function(x) rowSums(x) > 0
df %>%
  filter(rowAny(across(where(is.numeric), ~ .x > 0)))
#> # A tibble: 2 x 3
#> x y z
#> <chr> <dbl> <dbl>
#> 1 a 1 -1
#> 2 b 1 1
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#Notice I added a row with two -1 values
df <- tibble(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))
rowAny <- function(x) rowSums(x) > 0
#Look at the result of the across() function with mutate. It is TRUE or
#FALSE for each value of y and z
df %>% mutate(New = across(where(is.numeric), ~ .x > 0))
#> # A tibble: 3 x 4
#> x y z New$y $z
#> <chr> <dbl> <dbl> <lgl> <lgl>
#> 1 a 1 -1 TRUE FALSE
#> 2 b 1 1 TRUE TRUE
#> 3 c -1 -1 FALSE FALSE
#rowAny() sums TRUE and FALSE values, counting TRUE as 1 and FALSE as 0. If a row
#has any TRUE value, its sum will be greater than 0 and rowAny() returns TRUE for it
df %>% mutate(New = rowAny(across(where(is.numeric), ~ .x > 0)))
#> # A tibble: 3 x 4
#> x y z New
#> <chr> <dbl> <dbl> <lgl>
#> 1 a 1 -1 TRUE
#> 2 b 1 1 TRUE
#> 3 c -1 -1 FALSE
#filtering on those TRUE and FALSE values keeps the TRUE rows.
df %>%
  filter(rowAny(across(where(is.numeric), ~ .x > 0)))
#> # A tibble: 2 x 3
#> x y z
#> <chr> <dbl> <dbl>
#> 1 a 1 -1
#> 2 b 1 1
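To see the summing step in isolation, without any dplyr, here is a small sketch on a hand-made logical matrix. The matrix m is invented for illustration; it simply mirrors the TRUE/FALSE values shown for y and z above.

```r
# Logical values matching the y > 0 and z > 0 columns shown above
# (m is a made-up name, not part of the original code)
m <- cbind(y = c(TRUE, TRUE, FALSE), z = c(FALSE, TRUE, FALSE))

# rowSums() treats TRUE as 1 and FALSE as 0, so it counts the TRUEs per row
counts <- rowSums(m)
print(counts)

# A count above zero means the row holds at least one TRUE
print(counts > 0)
```

This is the same thing rowAny() does to the output of across().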
Given this, a base-r tag may be more useful than dplyr.
FJCC explained the code you shared above. For a direct base R alternative, you can consider this (using FJCC's example):
> df <- data.frame(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))
> df[rowSums(x = Filter(f = is.numeric, x = df) > 0) > 0,]
x y z
1 a 1 -1
2 b 1 1
The steps are:
1. Choose only the numeric columns. I used base::Filter, which plays the same role as where(is.numeric) in your example.
2. Determine whether each element is positive. This is done by the first > 0 check, inside rowSums. In your code, it is this part: ~ .x > 0.
3. Check whether a row contains any positive value. In both your way and my base equivalent, this is done using rowSums and one more check with > 0. If any element is positive, step 2 marks it TRUE, and hence the row sum (after TRUE is coerced to 1 and FALSE to 0) is positive.
4. Display only the filtered rows. I did the subsetting using `[` with a trailing comma, which keeps all columns; in the dplyr way, you use filter.
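The same steps can be sketched one at a time in base R, using the example data from above. The intermediate names (num_cols, is_pos, keep, result) are made up here for readability and are not part of the one-liner.

```r
# Same example data as above
df <- data.frame(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))

# Step 1: keep only the numeric columns (y and z)
num_cols <- Filter(f = is.numeric, x = df)

# Step 2: element-wise test; comparing a data frame gives a logical matrix
is_pos <- num_cols > 0

# Step 3: TRUE counts as 1 and FALSE as 0, so a row sum above zero
# means the row has at least one positive value
keep <- rowSums(is_pos) > 0

# Step 4: subset the rows; the empty slot after the comma keeps every column
result <- df[keep, ]
print(result)
```

If the name rowAny reads more naturally to you, apply(is_pos, 1, any) should give the same logical vector in step 3.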
Thank you so much. This also adds to my understanding. I was mainly trying to understand the tidyverse approach but I thought maybe if I see the base-R approach it would help to understand the code.