Check NULL value without evaluating it (monadic bind) in R

torvaney · February 2, 2022, 3:04pm

Hello all,

I have a slightly odd (and possibly dumb) question:

Is there a way to check if a value is null, without evaluating it in the non-null case?

This is perhaps better explained by way of example. I want to do something like:

if (is.null(x)) {
  NULL
} else {
  f(x)
}

However, I don't want to evaluate x in the first case.

For example, if the function f is enquo, then x may be a variable which doesn't exist in the current environment.

Does this question make sense? Is there a way to do this in R?

Thanks

valentingar · February 2, 2022, 3:24pm

Hi there,

you could first check whether or not the object exists, and only check for NULL if it does.

exists("x")
#> [1] FALSE
x <- 1
exists("x")
#> [1] TRUE

Created on 2022-02-02 by the reprex package (v2.0.1)

Make sure to quote your variable here ("x" instead of x).

Maybe a better idea would be, however, to initialise your value to NULL beforehand so that it will definitely exist at that point. I suppose you wanted this because is.null(x) would result in an error if x does not exist. For exception handling, including errors, also have a look at try() and tryCatch().
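As a rough illustration of the tryCatch() route, here is a minimal sketch. The helper name is_null_safely is made up for this example. Note that it still evaluates its argument; it only swallows the error, so it does not help if evaluation has other side effects.

```r
# Hypothetical helper (not from any package): TRUE only when `x` evaluates
# to NULL; FALSE when it evaluates to anything else or when evaluation fails.
is_null_safely <- function(x) {
  tryCatch(is.null(x), error = function(e) FALSE)
}

is_null_safely(NULL)            # x evaluates to NULL
is_null_safely(1)               # x evaluates to a non-NULL value
is_null_safely(no_such_object)  # evaluation errors; the error is caught
```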

I hope this helps!

Best,
Valentin

torvaney · February 2, 2022, 4:39pm

Thanks, Valentin!

I think I have misled you slightly with my example (or maybe I am confusing myself). The variable x may exist in this example (either initialised to NULL or some other value). However, evaluating a non-null x may cause an unwanted side-effect like throwing an error.

In the case where the failure mode is that the variable does not exist, I think you're right that exists("x") && is.null(x) would work.

There is also the rlang function quo_is_null which simplifies cases where x is either NULL or ought to be (en)quo'd. So maybe this is a non-issue with these workarounds.

valentingar · February 2, 2022, 7:39pm

Hi

maybe it would help, if you could give a short reprex for a case where is.null() would throw an error. On the spot I can only think of if the object wouldn't exist, but I am not sure what other cases there may be.

Btw.: I didn't mean exists("x") && is.null(x), because then is.null would be evaluated, but rather:

if (exists("x")) {
  is.null(x)
}

This way, is.null() is only called if it can be evaluated.
Best,
Valentin

torvaney · February 3, 2022, 3:51pm

> maybe it would help, if you could give a short reprex for a case where is.null() would throw an error. On the spot I can only think of if the object wouldn't exist, but I am not sure what other cases there may be.

Sure!

My example looked vaguely like this. I wanted to supply two expressions to a dplyr function. The second of these expressions was optional; if missing, the first expression should be used in both cases.

My initial thought was that we could do something like this:

library(tidyverse)

apply_if_not_null <- function(f, x) {
  if (is.null(x)) {
    x
  } else {
    f(x)
  }
}

agg_example1 <- function(data, col1, col2 = NULL) {
  col1 <- enquo(col1) 
  col2 <- apply_if_not_null(enquo, col2) %||% col1
  
  data %>%
    mutate(bin = cut_number(!!col2, 5)) %>% 
    group_by(bin) %>% 
    summarise(
      bin_avg = mean(!!col1),
      mass_avg = mean(mass, na.rm = TRUE),
      n = n()
    )
}

This works fine for the NULL case:

agg_example1(starwars, height)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 6 x 4
#>   bin       bin_avg mass_avg     n
#>   <fct>       <dbl>    <dbl> <int>
#> 1 [66,165]     128.     41.9    19
#> 2 (165,175]    171.    187.     14
#> 3 (175,183]    181.     79.2    18
#> 4 (183,193]    189.     80.2    14
#> 5 (193,264]    213.    106.     16
#> 6 <NA>          NA     NaN       6

But not for the case where both arguments are provided:

agg_example1(starwars, height, birth_year)
#> Error in bind(enquo, col2): object 'birth_year' not found

As alluded to earlier, this specific issue can be solved using the quo_is_null function:

agg_example2 <- function(data, col1, col2 = NULL) {
  col1 <- enquo(col1) 
  col2 <- enquo(col2)
  col2 <- if (rlang::quo_is_null(col2)) col1 else col2
  
  data %>%
    mutate(bin = cut_number(!!col2, 5)) %>% 
    group_by(bin) %>% 
    summarise(
      bin_avg = mean(!!col1),
      mass_avg = mean(mass, na.rm = TRUE),
      n = n()
    )
}

agg_example2(starwars, height)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 6 x 4
#>   bin       bin_avg mass_avg     n
#>   <fct>       <dbl>    <dbl> <int>
#> 1 [66,165]     128.     41.9    19
#> 2 (165,175]    171.    187.     14
#> 3 (175,183]    181.     79.2    18
#> 4 (183,193]    189.     80.2    14
#> 5 (193,264]    213.    106.     16
#> 6 <NA>          NA     NaN       6
agg_example2(starwars, height, birth_year)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 6 x 4
#>   bin         bin_avg mass_avg     n
#>   <fct>         <dbl>    <dbl> <int>
#> 1 [8,31.2]       168.     76.2     9
#> 2 (31.2,45.6]    170.     77.0     8
#> 3 (45.6,57.2]    175.     78.9     9
#> 4 (57.2,82]      179      73.6     9
#> 5 (82,896]       174.    259       8
#> 6 <NA>            NA      74.0    44

Created on 2022-02-03 by the reprex package (v0.3.0)

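I suppose the first version fails because is.null(x) has to evaluate x: that forces the promise for col2 (and so looks up birth_year) before enquo gets a chance to capture the expression. Once an argument has been evaluated eagerly, it is too late to treat it lazily.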
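To make the failure concrete, here is a small base R sketch (the function names are invented for illustration). It shows that is.null() forces its argument, and that the forcing propagates back through nested calls to the expression the user originally typed.

```r
# is.null() has to evaluate its argument, so it forces the promise.
check_null <- function(x) is.null(x)

forced <- FALSE
check_null({ forced <- TRUE; 1 })  # the braces are the unevaluated argument
forced                             # TRUE: the argument was evaluated

# Forcing propagates through wrappers: `col` is a promise for `birth_year`,
# which does not exist as a variable here, so the check itself fails.
wrapper <- function(col = NULL) check_null(col)
wrapper()                          # default NULL, nothing to look up
msg <- tryCatch(wrapper(birth_year), error = function(e) conditionMessage(e))
msg                                # the "object not found" error message
```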

> Btw.: I didn't mean exists("x") && is.null(x), because then is.null would be evaluated

Fwiw, && short-circuits in R: the right-hand side is only evaluated when the left-hand side is TRUE. So when x does not exist, x is never evaluated:

exists("x") && x
#> [1] FALSE

x <- TRUE
exists("x") && x
#> [1] TRUE

Created on 2022-02-03 by the reprex package (v0.3.0)
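The short-circuit behaviour can also be seen without relying on a missing variable. A minimal sketch, where boom is a made-up function that errors whenever it is called:

```r
# A right-hand side that fails loudly if it is ever evaluated.
boom <- function() stop("right-hand side was evaluated")

FALSE && boom()  # FALSE: boom() is never called
TRUE || boom()   # TRUE: boom() is never called

# When the left-hand side does not decide the result, the right side runs.
outcome <- tryCatch(TRUE && boom(), error = function(e) conditionMessage(e))
outcome          # "right-hand side was evaluated"
```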

valentingar · February 3, 2022, 8:45pm

Hi!

Thanks for the reprex - it was immensely helpful. Less so: me, as this is way above my pay grade.
With the reprex, however, it is now much clearer what problem you are facing. Basically, at the point where you check is.null(col2), the promise is evaluated, and that fails if the promise can't be fulfilled at that point. So one idea would be to first check whether col2 is a promise, and only then call is.null(). However, I don't think there is a way to do that (in base R). Maybe you will find this helpful:

I am sorry I can't be of more use this time. It was very interesting looking into the deeper mechanics of R though.
Best,
Valentin
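For what it's worth, base R does seem to offer ways to look at an argument without forcing it: missing() reports whether the caller supplied it, and substitute() returns the unevaluated expression. A minimal sketch under those assumptions (describe_col2 is a made-up name, and this is not necessarily how rlang implements quo_is_null):

```r
describe_col2 <- function(col1, col2 = NULL) {
  # substitute() returns the expression behind the argument without
  # evaluating it; for an unsupplied argument it returns the default (NULL).
  expr <- substitute(col2)
  if (missing(col2) || is.null(expr)) {
    "col2 not given: fall back to col1"
  } else {
    paste("col2 given as:", deparse(expr))
  }
}

describe_col2(height)              # falls back to col1
describe_col2(height, NULL)        # explicit NULL also falls back
describe_col2(height, birth_year)  # birth_year is never evaluated
```

One caveat: substitute() only sees the user's expression in the function that receives it directly. Inside a helper such as apply_if_not_null it would return the symbol col2 rather than NULL or birth_year, so the check has to happen in the outer function.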
