Advice on using group_map instead of do

I am trying to create a plot of confidence intervals using the following code, which works. However, help(do, dplyr) says that do() "may be advantageously replaced by group_map()". I tried changing the script to "group_map(~ t.test( ~ Price, data=.x) %>% tidy)" (which also has an extra "~" and "x") but now I get "object 'conf.low' not found".
The output of do() is "A tibble: 6 x 9". The output of group_map() is "A tibble: 1 x 8" eight times.
This is with dplyr 0.8.1, and I see that the group_map function has been updated, and probably the documentation just needs to give me more of a clue to how I should proceed.

library(MASS)
library(mosaic)
Cars93 %>% group_by(Type) %>% do(t.test( ~ Price, data=.) %>% tidy) %>%
  gf_pointrange(estimate + conf.low + conf.high ~ Type) %>%
  gf_labs(y="Mean price ($1000) with 95% CI")

Hi @jhitchcock

Your example works fine with either group_map() or group_modify(). It requires the latest dplyr version, 0.8.1, where those functions have changed: see the announcement. What group_map() did in 0.8.0 is now done by group_modify(). The difference is that group_map() returns a list, whereas group_modify() returns a tibble, so its .f argument must itself return a tibble (a data frame). As the documentation says,

group_modify() is an evolution of do(), if you have used that before.

You should look at the doc of ?group_modify for examples. I don't know about your conf.low error.
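
On the extra "~" and ".x" mentioned in the question: per the dplyr documentation, a formula passed to group_map() or group_modify() is converted to a function in which `.` or `.x` refers to the rows of the current group and `.y` to a one-row tibble holding the group key. A minimal sketch, assuming only dplyr and the Cars93 data; the column names n and key_seen are made up for illustration:

```r
library(dplyr)
data("Cars93", package = "MASS")  # load the data without attaching MASS

# .x is the subset of rows for one Type; .y is a one-row tibble with the key
per_type <- Cars93 %>%
  group_by(Type) %>%
  group_modify(~ tibble(
    n        = nrow(.x),               # number of rows in this group
    key_seen = as.character(.y$Type)   # the key passed alongside the data
  ))

per_type
```

This is why `data=.` and `data=.x` are interchangeable in the calls shown in this thread.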

library(MASS)
library(mosaic)
library(dplyr)
Cars93 %>% 
  group_by(Type) %>% 
  do(t.test( ~ Price, data=.) %>% broom::tidy())
#> # A tibble: 6 x 9
#> # Groups:   Type [6]
#>   Type  estimate statistic  p.value parameter conf.low conf.high method
#>   <fct>    <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr> 
#> 1 Comp~     18.2     10.9  1.60e- 8        15    14.6       21.8 One S~
#> 2 Large     24.3     12.7  1.69e- 7        10    20.0       28.6 One S~
#> 3 Mids~     27.2     10.4  9.56e-10        21    21.8       32.7 One S~
#> 4 Small     10.2     23.9  3.65e-16        20     9.28      11.1 One S~
#> 5 Spor~     19.4      9.10 5.32e- 7        13    14.8       24.0 One S~
#> 6 Van       19.1     30.5  1.45e- 9         8    17.7       20.5 One S~
#> # ... with 1 more variable: alternative <chr>

Cars93 %>% 
  group_by(Type) %>% 
  group_modify(~ t.test( ~ Price, data=.) %>% broom::tidy())
#> # A tibble: 6 x 9
#> # Groups:   Type [6]
#>   Type  estimate statistic  p.value parameter conf.low conf.high method
#>   <fct>    <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr> 
#> 1 Comp~     18.2     10.9  1.60e- 8        15    14.6       21.8 One S~
#> 2 Large     24.3     12.7  1.69e- 7        10    20.0       28.6 One S~
#> 3 Mids~     27.2     10.4  9.56e-10        21    21.8       32.7 One S~
#> 4 Small     10.2     23.9  3.65e-16        20     9.28      11.1 One S~
#> 5 Spor~     19.4      9.10 5.32e- 7        13    14.8       24.0 One S~
#> 6 Van       19.1     30.5  1.45e- 9         8    17.7       20.5 One S~
#> # ... with 1 more variable: alternative <chr>

Cars93 %>% 
  group_by(Type) %>% 
  group_map(~ t.test( ~ Price, data=.) %>% broom::tidy()) %>%
  str(1)
#> List of 6
#>  $ :Classes 'tbl_df', 'tbl' and 'data.frame':    1 obs. of  8 variables:
#>  $ :Classes 'tbl_df', 'tbl' and 'data.frame':    1 obs. of  8 variables:
#>  $ :Classes 'tbl_df', 'tbl' and 'data.frame':    1 obs. of  8 variables:
#>  $ :Classes 'tbl_df', 'tbl' and 'data.frame':    1 obs. of  8 variables:
#>  $ :Classes 'tbl_df', 'tbl' and 'data.frame':    1 obs. of  8 variables:
#>  $ :Classes 'tbl_df', 'tbl' and 'data.frame':    1 obs. of  8 variables:

Created on 2019-05-19 by the reprex package (v0.3.0.9000)

Cars93 %>% 
  group_by(Type) %>% 
  group_modify(~ t.test( ~ Price, data=.) %>% broom::tidy()) %>%
  gf_pointrange(estimate + conf.low + conf.high ~ Type) %>%
  gf_labs(y="Mean price ($1000) with 95% CI")

Hope it helps

That is very helpful, and you replied very quickly. :smile:
I think that gf_pointrange() expects a data frame, so I need to use group_modify() as you have shown.
I guess the help page for dplyr::do should therefore suggest group_modify() rather than group_map().
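
If group_map() is preferred for some reason, its list can be stacked back into a single data frame before plotting. The following is only a sketch under stated assumptions: ci_one_group() is a hypothetical helper that builds the three columns by hand from base t.test() (standing in for broom::tidy()), and the keys are re-attached with group_keys(), which relies on the list and the keys both being in group order.

```r
library(dplyr)
data("Cars93", package = "MASS")  # load the data without attaching MASS

# Hypothetical helper: one-row tibble with the mean and its 95% CI
ci_one_group <- function(df) {
  tt <- t.test(df$Price)
  tibble(
    estimate  = unname(tt$estimate),
    conf.low  = tt$conf.int[1],
    conf.high = tt$conf.int[2]
  )
}

grouped <- Cars93 %>% group_by(Type)

# group_map() gives a plain list, one tibble per Type (dplyr >= 0.8.1)
ci_list <- grouped %>% group_map(~ ci_one_group(.x))

# Stack the list and put the Type column back in front
ci_tbl <- bind_cols(group_keys(grouped), bind_rows(ci_list))
ci_tbl
```

ci_tbl has the Type, estimate, conf.low and conf.high columns that the gf_pointrange() call uses, although group_modify() gets there in one step.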
