Hi there,
Introduction
I have data with the coordinates of objects on multiple images. For each object, I want to count the number of neighbors in a specific area near it (e.g. in a 30 px × 30 px box to the left of the object). To do this, I simply apply the filter() function to the coordinates relative to the object of interest. Given the size of my data, this takes far too long to be practical. However, I found a few posts stating that {data.table} is much faster at filtering, so I tried to rewrite my code using {dtplyr} instead of {dplyr}.
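For concreteness, the per-object count described above might look like the minimal sketch below, run on a plain tibble. The column names `id`, `x`, `y`, the helper `count_left_neighbors()`, the toy coordinates, and the choice to center the box vertically on the object are all assumptions made for illustration; they are not the real data or code.

```r
library(dplyr)
library(purrr)

# Toy coordinates for one image (hypothetical columns, values in px)
objects <- tibble(
  id = 1:5,
  x  = c(100, 85, 75, 130, 60),
  y  = c(100, 110, 95, 100, 100)
)

# Count rows of `data` inside a `size` x `size` box directly to the left
# of the point (x0, y0). Assumed box definition:
#   x0 - size <= x < x0   and   |y - y0| <= size / 2
count_left_neighbors <- function(x0, y0, data, size = 30) {
  data %>%
    filter(x >= x0 - size, x < x0, abs(y - y0) <= size / 2) %>%
    nrow()
}

# One filter() call per object, which is what gets slow on large data
result <- objects %>%
  mutate(n_left = map2_int(x, y, count_left_neighbors, data = objects))
```

Every object triggers a full scan of the table, so the cost grows roughly with the square of the number of objects per image.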
The problem
The problem is that when I nest map() calls with data.table, the inner map() sees only the variables defined inside it, not those defined in the outer map(), although it still sees variables defined in the global env. I don't have this problem with a tibble.
Example
Example 1: Error: object 'my_data' not found
suppressWarnings({
  library(tidyverse)
  library(data.table, warn.conflicts = FALSE)
  library(tidyfast)
  library(dtplyr)
})
mpg %>%
  as.data.table() %>%
  dt_nest(manufacturer) %>%
  # Outer mutate to apply the inner function to each row of the list-column
  mutate(data = map(
    .x = data,
    .f = function(my_data){
      # This example makes no practical sense, but conveys my intention:
      # count the number of rows that match a filtering condition
      # which depends on the values from the given row
      my_data %>%
        mutate(the_n = map2_dbl(
          .x = cty,
          .y = hwy,
          .f = function(x, y){
            my_data %>%
              filter(x + 1 > 15 & y - 2 > 25) %>%
              nrow()
          }))
    })) %>%
  # Series of steps to unnest the results
  as.data.table() %>%
  mutate(data = map(.x = data,
                    .f = ~as.data.table(.x))) %>%
  as.data.table() %>%
  dt_unnest(col = data) %>%
  as_tibble()
#> Error in filter(., x + 1 > 15 & y - 2 > 25): object 'my_data' not found
Example 2: Works fine
if I avoid accessing my_data in the inner map()
suppressWarnings({
  library(tidyverse)
  library(data.table, warn.conflicts = FALSE)
  library(tidyfast)
  library(dtplyr)
})
mpg %>%
  as.data.table() %>%
  dt_nest(manufacturer) %>%
  # Outer mutate to apply the inner function to each row of the list-column
  mutate(data = map(.x = data,
                    .f = function(my_data){
                      # Here I simply calculate the average of two values.
                      # This does not require any values from outside
                      # `map2_dbl()`
                      my_data %>%
                        mutate(the_n = map2_dbl(.x = cty,
                                                .y = hwy,
                                                .f = ~mean(c(.x, .y))))
                    })) %>%
  # Series of steps to unnest the results
  as.data.table() %>%
  mutate(data = map(.x = data,
                    .f = ~as.data.table(.x))) %>%
  as.data.table() %>%
  dt_unnest(col = data) %>%
  as_tibble()
#> # A tibble: 234 x 12
#> manufacturer model displ year cyl trans drv cty hwy fl class
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#> 1 audi a4 1.8 1999 4 auto~ f 18 29 p comp~
#> 2 audi a4 1.8 1999 4 manu~ f 21 29 p comp~
#> 3 audi a4 2 2008 4 manu~ f 20 31 p comp~
#> 4 audi a4 2 2008 4 auto~ f 21 30 p comp~
#> 5 audi a4 2.8 1999 6 auto~ f 16 26 p comp~
#> 6 audi a4 2.8 1999 6 manu~ f 18 26 p comp~
#> 7 audi a4 3.1 2008 6 auto~ f 18 27 p comp~
#> 8 audi a4 quattro 1.8 1999 4 manu~ 4 18 26 p comp~
#> 9 audi a4 quattro 1.8 1999 4 auto~ 4 16 25 p comp~
#> 10 audi a4 quattro 2 2008 4 manu~ 4 20 28 p comp~
#> # ... with 224 more rows, and 1 more variable: the_n <dbl>
Example 3: Works fine with tibble.
Same logic as in example 1, but with tibble instead of data.table
suppressWarnings({
  library(tidyverse)
  library(data.table, warn.conflicts = FALSE)
  library(tidyfast)
  library(dtplyr)
})
mpg %>%
  group_by(manufacturer) %>%
  nest() %>%
  mutate(data = map(.x = data,
                    .f = function(my_data){
                      my_data %>%
                        mutate(the_n = map2_dbl(.x = cty,
                                                .y = hwy,
                                                .f = function(x, y){
                                                  my_data %>%
                                                    filter(x + 1 > 15 & y - 2 > 25) %>%
                                                    nrow()
                                                }))
                    })) %>%
  unnest(cols = data) %>%
  ungroup()
#> # A tibble: 234 x 12
#> manufacturer model displ year cyl trans drv cty hwy fl class
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#> 1 audi a4 1.8 1999 4 auto~ f 18 29 p comp~
#> 2 audi a4 1.8 1999 4 manu~ f 21 29 p comp~
#> 3 audi a4 2 2008 4 manu~ f 20 31 p comp~
#> 4 audi a4 2 2008 4 auto~ f 21 30 p comp~
#> 5 audi a4 2.8 1999 6 auto~ f 16 26 p comp~
#> 6 audi a4 2.8 1999 6 manu~ f 18 26 p comp~
#> 7 audi a4 3.1 2008 6 auto~ f 18 27 p comp~
#> 8 audi a4 quattro 1.8 1999 4 manu~ 4 18 26 p comp~
#> 9 audi a4 quattro 1.8 1999 4 auto~ 4 16 25 p comp~
#> 10 audi a4 quattro 2 2008 4 manu~ 4 20 28 p comp~
#> # ... with 224 more rows, and 1 more variable: the_n <dbl>
Example 4: Checking which variables are available in the inner map()
Surprisingly, the variable c, explicitly defined in the outer map(), is reported as existing, but neither my_data nor my_data2 is.
suppressWarnings({
  library(tidyverse)
  library(data.table, warn.conflicts = FALSE)
  library(tidyfast)
  library(dtplyr)
})
# variable in the global env
a <- 1
test <- mpg %>%
  as.data.table() %>%
  dt_nest(manufacturer) %>%
  # Outer mutate to apply the inner function to each row of the list-column
  mutate(data = map(
    .x = data,
    .f = function(my_data){
      c <- 2
      my_data2 <- my_data
      # This example makes no practical sense, but conveys my intention:
      # count the number of rows that match a filtering condition
      # which depends on the values from the given row
      my_data %>%
        mutate(the_n = map2_dbl(
          .x = cty,
          .y = hwy,
          .f = function(x, y){
            # Check the existence of several variables
            print(c(
              # variable `a`, defined in the global env
              "a" = exists("a"),
              # variable `b`, not defined anywhere
              "b" = exists("b"),
              # variable `c`, defined in the outer `map()`
              "c" = exists("c"),
              # my_data, which is implicitly defined by the outer `map()`
              "my_data" = exists("my_data"),
              # my_data2, which is explicitly defined in the outer `map()`
              "my_data2" = exists("my_data2")
            ))
            # Return a number just to keep the code valid
            return(1)
          }))
    })) %>%
  # Series of steps to unnest the results
  as.data.table() %>%
  mutate(data = map(.x = data,
                    .f = ~as.data.table(.x))) %>%
  as.data.table() %>%
  dt_unnest(col = data) %>%
  as_tibble()
#> a b c my_data my_data2
#> TRUE FALSE TRUE FALSE FALSE
#> TRUNCATED...
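A caveat on reading this output: `exists("c")` with the default `mode = "any"` also finds the base function `c()`, so a TRUE for `c` does not by itself show that the local `c <- 2` is visible. The sketch below uses base R only, with hypothetical helper names `probe_without_local()` and `probe_with_local()`, to illustrate how restricting the lookup with `mode = "numeric"` separates the two cases. It assumes no numeric object named `c` exists in the global env.

```r
# No local `c` here: only base::c() can be found
probe_without_local <- function() {
  any_mode <- exists("c")                    # matches the function base::c
  numeric  <- exists("c", mode = "numeric")  # skips functions
  list(any_mode = any_mode, numeric = numeric)
}

# A local numeric `c`, as in the outer map() of Example 4
probe_with_local <- function() {
  c <- 2
  any_mode <- exists("c")
  numeric  <- exists("c", mode = "numeric")
  list(any_mode = any_mode, numeric = numeric)
}

res_without_local <- probe_without_local()
res_with_local    <- probe_with_local()
```

Replacing `exists("c")` in Example 4 with `exists("c", mode = "numeric")`, or renaming the variable to something that does not clash with a base function, would show whether the local value is really reachable from the inner function.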
Question
I suspect that the issue is caused by lazy evaluation, but I can't figure out where it arises or how to deal with it. Any suggestions?