How to run a boolean test on a list using purrr?

von_olaf · May 14, 2019, 12:27pm

Consider this simple example

tibble(mytext = list(list('rstudio is nice', 'rstats is cool'),
                  list('this stuff is incredible', 'hello world'))) 
# A tibble: 2 x 1
  mytext    
  <list>    
1 <list [2]>
2 <list [2]>

For each row, I would like to do two things:

1. only keep the elements of the mytext list that contain rstudio
2. create a variable that is TRUE if any of these mytext elements contain rstudio.

I am able to do 1. with purrr::keep but the second one fails with purrr::some.

tibble(mytext = list(list('rstudio is nice', 'rstats is cool'),
                  list('this stuff is incredible', 'hello world'))) %>% 
  mutate(subsample = map(mytext, ~purrr::keep(.x, str_detect(.x,'rstudio')))) %>% 
  mutate(flag = purrr::some(mytext, ~str_detect(.x,'rstudio'))) 

# A tibble: 2 x 3
  mytext     subsample  flag 
  <list>     <list>     <lgl>
1 <list [2]> <list [1]> TRUE 
2 <list [2]> <list [0]> TRUE

As you can see, subsample correctly subsets the lists, but flag is TRUE for every row when it should be TRUE only for row 1: only the first row of mytext holds a list containing the string rstudio.

What am I missing here?
Thanks!

FJCC · May 15, 2019, 3:51am

I get tangled up nesting too many functions. How about this?

library(purrr)
library(tibble)
library(stringr)
library(dplyr)

FindIt <- function(l) {
  purrr::some(l, ~str_detect(., 'rstudio'))
}

tibble(mytext = list(list('rstudio is nice', 'rstats is cool'),
                     list('this stuff is incredible', 'hello world'))) %>% 
  mutate(subsample = map(mytext, ~purrr::keep(.x, str_detect(.x,'rstudio')))) %>% 
  mutate(flag = map_lgl(mytext, FindIt))
#> # A tibble: 2 x 3
#>   mytext     subsample  flag 
#>   <list>     <list>     <lgl>
#> 1 <list [2]> <list [1]> TRUE 
#> 2 <list [2]> <list [0]> FALSE

Created on 2019-05-14 by the reprex package (v0.2.1)
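
A likely reason the helper version behaves differently: inside mutate(), a bare column name such as mytext refers to the whole list column, whereas map_lgl() hands the function one row's list at a time. The sketch below illustrates that difference with length() instead of some(); the data frame df, the column x, and the output names whole and per_row are made up for illustration and do not come from the thread.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical data: row 1 holds one string, row 2 holds three
df <- tibble(x = list(list('a'), list('b', 'c', 'd')))

out <- df %>%
  mutate(
    # x is the entire column here, so length(x) is the number of rows,
    # and that single value is recycled to every row
    whole = length(x),
    # map_int() calls length() once per row, on that row's list only
    per_row = map_int(x, length)
  )

out$whole
out$per_row
```

Here whole comes out as 2 for both rows, while per_row is 1 and 3. The same reasoning suggests why a bare some(mytext, ...) inside mutate() cannot give a per-row answer.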

von_olaf · May 15, 2019, 3:24pm

Well, this is extremely weird, because I also tried this solution:


tibble(mytext = list(list('rstudio is nice', 'rstats is cool'),
                     list('this stuff is incredible', 'hello world'))) %>% 
  mutate(subsample = map(mytext, ~purrr::keep(.x, str_detect(.x,'rstudio')))) %>% 
  mutate(flag = map_dbl(mytext, ~purrr::some(mytext, ~str_detect(.x,'rstudio'))))

# A tibble: 2 x 3
  mytext     subsample   flag
  <list>     <list>     <dbl>
1 <list [2]> <list [1]>     1
2 <list [2]> <list [0]>     1

Why is it working for you when the call is wrapped into a function? Is this a bug?

Thanks!

FJCC · May 15, 2019, 4:31pm

Notice that inside the lambda you are passing mytext, the whole column, to some(). What you need to pass is the element of mytext that map_dbl() is currently processing. Try this:

tibble(mytext = list(list('rstudio is nice', 'rstats is cool'),
                     list('this stuff is incredible', 'hello world'))) %>% 
  mutate(subsample = map(mytext, ~purrr::keep(.x, str_detect(.x,'rstudio')))) %>% 
  mutate(flag = map_dbl(mytext, ~purrr::some(., ~str_detect(.x,'rstudio'))))
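
For comparison, here is a minimal sketch of the same idea with the nesting spelled out as named-argument functions, so it is easier to see which object each level receives. It swaps map_dbl() for map_lgl() so that flag stays logical, and it gives keep() a predicate function rather than a precomputed logical vector; both are choices made for this sketch, not something the thread settles on. The names df, result, row and s are illustrative only.

```r
library(purrr)
library(stringr)
library(tibble)
library(dplyr)

df <- tibble(mytext = list(list('rstudio is nice', 'rstats is cool'),
                           list('this stuff is incredible', 'hello world')))

result <- df %>%
  mutate(
    # row is one row's list; s is a single string from that list,
    # so the predicate returns exactly one TRUE or FALSE
    subsample = map(mytext, function(row) {
      keep(row, function(s) str_detect(s, 'rstudio'))
    }),
    # some() stops at the first string that matches;
    # map_lgl() collects one logical value per row
    flag = map_lgl(mytext, function(row) {
      some(row, function(s) str_detect(s, 'rstudio'))
    })
  )

result$flag
```

With this data, flag should be TRUE for the first row and FALSE for the second, and subsample holds one element and zero elements respectively.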

tbradley · May 15, 2019, 4:33pm

It is discouraged to @name reference a user who has not engaged in a thread on their own. Please see the community faq about @name usage

von_olaf · May 15, 2019, 4:38pm

Hi bradley, I referenced mara because she edited the question. Does that not count?
