Hi,
I know how to use across if I want to summarise some variables.
What I don't know is how to summarise one variable separately for each of several other columns.
For example, using the "api" data from the survey package, I am trying to obtain the sum of the variable fpc, grouped by cname, for each of the dummies s1 to s3.
Thanks, StatSteph.
I already knew how to do the task using srvyr, but I still don't know how to do it using dplyr and across. The only solution I found was to modify the dummies.
Thanks for your code by the way.
I see. To reproduce what your 3 filter statements do, I did it in one step, creating the object called solution below, and then checked that it is nearly the same as the result of those 3 filter statements:
library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#>
#> Attaching package: 'survey'
#> The following object is masked from 'package:graphics':
#>
#> dotchart
library(tidyverse)
data(api)
apistrat_mod <- apistrat %>%
mutate(s1=if_else(comp.imp=="Yes",1,0),
s2=if_else(stype=="E",1,0),
s3=if_else(dnum>=500,1,0)) %>%
select(fpc, cname, s1:s3)
summary(apistrat_mod)
#> fpc cname s1 s2
#> Min. : 755.0 Length:200 Min. :0.00 Min. :0.0
#> 1st Qu.: 952.2 Class :character 1st Qu.:0.00 1st Qu.:0.0
#> Median :2719.5 Mode :character Median :1.00 Median :0.5
#> Mean :2653.8 Mean :0.58 Mean :0.5
#> 3rd Qu.:4421.0 3rd Qu.:1.00 3rd Qu.:1.0
#> Max. :4421.0 Max. :1.00 Max. :1.0
#> s3
#> Min. :0.000
#> 1st Qu.:0.000
#> Median :0.000
#> Mean :0.425
#> 3rd Qu.:1.000
#> Max. :1.000
solution <- apistrat_mod %>%
group_by(cname) %>%
summarise(across(s1:s3,~sum(fpc*.))) #need to multiply by s1:s3 which are indicators
# these were your attempts
t1 <- apistrat_mod %>%
filter(s1==1) %>%
group_by(cname) %>%
summarise(s1=sum(fpc))
t2 <- apistrat_mod %>%
filter(s2==1) %>%
group_by(cname) %>%
summarise(s2=sum(fpc))
t3 <- apistrat_mod %>%
filter(s3==1) %>%
group_by(cname) %>%
summarise(s3=sum(fpc))
solution_filter <- t1 %>%
full_join(t2, by="cname") %>%
full_join(t3, by="cname") %>%
replace_na(list(s1=0, s2=0, s3=0))
# these are the same except my solution has some rows yours doesn't
solution %>%
full_join(solution_filter, by="cname") %>%
filter(!near(s1.x, s1.y)|
is.na(near(s1.x, s1.y)),
!near(s2.x, s2.y)|
is.na(near(s2.x, s2.y)),
!near(s3.x, s3.y)|
is.na(near(s3.x, s3.y)),
)
#> # A tibble: 6 x 7
#> cname s1.x s2.x s3.x s1.y s2.y s3.y
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Amador 0 0 0 NA NA NA
#> 2 Butte 0 0 0 NA NA NA
#> 3 Humboldt 0 0 0 NA NA NA
#> 4 Mariposa 0 0 0 NA NA NA
#> 5 Napa 0 0 0 NA NA NA
#> 6 Stanislaus 0 0 0 NA NA NA
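
The indicator trick used for solution above can be checked on a small made-up table. This is only a sketch: the tibble toy and its columns grp, w, d1 and d2 are hypothetical stand-ins for apistrat_mod, cname, fpc and s1:s3. It compares multiplying by the 0/1 indicator with subsetting inside across, which should agree as long as the indicators contain only 0 and 1 and no NA.

```r
library(dplyr)

# Hypothetical toy data: grp stands in for cname, w for fpc,
# and d1/d2 are 0/1 indicators like s1:s3.
toy <- tibble(
  grp = c("a", "a", "b", "b"),
  w   = c(10, 20, 30, 40),
  d1  = c(1, 0, 1, 1),
  d2  = c(0, 0, 1, 0)
)

# Multiply the weight by each indicator, then sum within group.
by_product <- toy %>%
  group_by(grp) %>%
  summarise(across(d1:d2, ~ sum(w * .x)))

# Equivalent idea: sum only the weights where the indicator equals 1.
by_subset <- toy %>%
  group_by(grp) %>%
  summarise(across(d1:d2, ~ sum(w[.x == 1])))

by_product
by_subset
```

In both versions a group whose indicator is 0 everywhere (group "a" for d2 here) stays in the result with a sum of 0, whereas a filter-then-join approach drops it. That appears to be why the six counties above show 0 on one side and NA on the other.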