Possible solutions for "no symbol named 'x' in scope"

KenWilliams · July 25, 2019, 6:14pm

It's well known that when writing code like the following, RStudio (and R CMD check, and presumably several other tools) will complain about "no symbol named 'lat' in scope" and "no symbol named 'lon' in scope":

weather_points <- weather_data %>%
    distinct(lat, lon)

That's of course just an example using distinct(); it happens frequently with any of the common functions like mutate() or filter(), or anything else that uses non-standard evaluation to place the column names of the data into scope as variable names.
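
To make the mechanism concrete, here is a minimal base-R sketch of evaluating an expression with a data frame's columns in scope. It is only an analogy: dplyr has its own tidy-evaluation machinery, and the data values are made up for illustration.

```r
# Made-up illustration data; only base R is used.
weather_data <- data.frame(lat = c(40.1, 41.3), lon = c(-75.2, -74.0))

# Capture the expression without evaluating it. At this point `lat` and
# `lon` are just symbols; no variables with those names exist.
expr <- quote(lat + lon)

# Evaluate with the data frame's columns placed in scope.
sums <- eval(expr, weather_data)
```

A static checker that reads only the source sees lat and lon as ordinary symbols with no definition, which is why it reports them as out of scope.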

The result is that these warnings generally go ignored, and the noise builds up so that other legitimate warnings also go ignored, and that leads to bugs.

One solution is to disambiguate by explicitly using .$foo syntax:

weather_points <- weather_data %>%
    distinct(.$lat, .$lon)

That has some disadvantages:

- All the mentions of variables will need to be changed in this way;
- Some tools will still complain about . being undefined (it looks like R CMD check still will, and RStudio won't?);
- Most importantly, it changes the behavior when one of the variables is mistyped. In the original code, a fatal runtime error is thrown, but .$foo silently returns NULL for a name that matches no column. On a plain data.frame, partial matching also resolves .$la to .$lat, whereas the original requires an exact name match.

So while this gets rid of a warning, it's actually less safe in some important ways than the original.
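
The safety difference can be seen with a small base-R sketch. It assumes a plain data.frame with made-up values; tibbles are stricter here (they never partially match, and they warn before returning NULL for an unknown column).

```r
# Made-up illustration data; a plain data.frame, not a tibble.
weather_data <- data.frame(lat = c(40.1, 40.1), lon = c(-75.2, -75.2))

# Partial name: `$` on a data.frame partially matches, so `la` picks up `lat`.
partial <- weather_data$la

# A name that matches no column: `$` returns NULL instead of raising an error.
unknown <- weather_data$latitude

# `[[` matches exactly by default, so the same partial name finds nothing.
exact_only <- weather_data[["la"]]
```

Here partial holds the lat column, while unknown and exact_only are both NULL, and none of the three lines raises an error.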

Another option would be to have a function that merely asserts the existence of columns by name, essentially "declaring" them for use later in the pipeline:

weather_points <- weather_data %>%
    vars(lat, lon) %>%
    distinct(lat, lon)

The idea is that it would throw a runtime exception if lat and lon weren't present in weather_data (the same way that the existing distinct() call would have), but also that tools like RStudio could easily parse the vars() call to know that lat and lon are legitimate variables later in the pipeline (and future pipelines based on the result of this pipeline, etc.).

One slight advantage of the exception-throwing part is that it can explicitly check that the variables are present in the data table itself, rather than merely existing as ambient variables in the enclosing environment, which seems like it could help avoid some errors too.
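
A rough sketch of the runtime half of this idea, in base R. The name declare_cols() is hypothetical and chosen only for illustration (dplyr already exports a vars() function with a different purpose); nothing here covers the tooling half, which would require RStudio or R CMD check to recognize the call.

```r
# Hypothetical helper, not part of dplyr or any other package.
declare_cols <- function(data, ...) {
  # Capture the bare names passed in `...` without evaluating them.
  wanted <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))

  # Compare against the data's own column names only, never against
  # variables that happen to exist in the calling environment.
  missing_cols <- setdiff(wanted, names(data))
  if (length(missing_cols) > 0) {
    stop("Column(s) not found in data: ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }

  # Return the data unchanged so the call can sit in the middle of a pipeline.
  data
}

# Made-up illustration data.
weather_data <- data.frame(lat = c(40.1, 40.1), lon = c(-75.2, -75.2))

# An ambient variable with the misspelled name does not hide the mistake.
lat_typo <- 1

checked <- declare_cols(weather_data, lat, lon)
failure <- tryCatch(
  declare_cols(weather_data, lat_typo),
  error = function(e) conditionMessage(e)
)
```

In this sketch checked is the unchanged data frame, and failure holds the error message naming lat_typo, even though a variable of that name exists outside the data.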

Thoughts? Any other existing technique that I haven't thought of? I know a lot of other people have thought about this too, so let me know if I'm missing something.