To start: this was fun for me. Thanks for this cool problem!
You can use the parse() function to convert a script into an expression. Then you can walk through that expression, collect any functions called, flatten subexpressions, and repeat until there's nothing left to flatten.
We can identify expressions because they have a length: the number of subexpressions and tokens they contain. Tokens are the smallest units of a language. For example, 1 + 2 has three tokens: 1, +, and 2.
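For instance, here's that structure on a tiny piece of code (just an illustration, not part of the final function):

e <- parse(text = "1 + 2")[[1]]  # the first (and only) expression in the "script"
length(e)   # 3
as.list(e)  # `+`, 1, 2 -- the function name comes first
is.call(e)  # TRUE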
get_calls <- function(filepath) {
  code <- parse(file = filepath)
  tokens <- as.list(code)
  calls <- c()
  while (TRUE) {
    any_unpacked <- FALSE
    for (ii in seq_along(tokens)) {
      part <- tokens[[ii]]
      # Calls always have the function name as the first element
      if (is.call(part)) {
        fun_token <- part[[1]]
        calls <- c(calls, deparse(fun_token))
      }
      # Expressions have a length
      if (length(part) > 1) {
        tokens[[ii]] <- as.list(part)
        any_unpacked <- TRUE
      }
    }
    # Flatten the newly unpacked sublists back into one list of parts
    tokens <- unlist(tokens)
    if (!any_unpacked) break
  }
  unique(calls)
}
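To see what the unpacking step does, here's as.list() applied by hand to a nested call (again, just an illustration):

e <- quote(print(sum(1:10)))
as.list(e)  # print, sum(1:10)
# sum(1:10) is still a call, so the next pass unpacks it into sum and 1:10,
# and the pass after that unpacks 1:10 into `:`, 1, and 10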
Here it is run against an example script, ~/example.R:
# ~/example.R
library(dplyr)
iris_plot <- iris %>%
  mutate(id = sample(c(1:10, 99), n(), replace = TRUE)) %>%
  rename_all(tolower) %>%
  rename_all(stringr::str_replace, pattern = ".", replacement = "_")
p <- print
p("Hello, world!")
getFunction("message")("Hello, again!")
The result:
get_calls("~/example.R")
# [1] "library" "<-"
# [3] "p" "getFunction(\"message\")"
# [5] "%>%" "getFunction"
# [7] "rename_all" "mutate"
# [9] "sample" "c"
# [11] "n" ":"
Where the function fails:
- Functions passed as objects (it didn't pick up tolower or stringr::str_replace; see the snippet after this list)
- Functions going by other names (it didn't pick up print, only p)
- Functions retrieved dynamically (it didn't pick up message)
- Probably a bunch of other edge cases
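A quick look at why the first case slips through (an illustration, not a fix):

e <- quote(rename_all(iris, tolower))
is.call(e[[3]])    # FALSE: tolower is a bare symbol here, not a call,
is.symbol(e[[3]])  # TRUE   so get_calls() never deparses it as a function name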
Running the script and looking at what ends up in its environment would only find functions defined there, not the ones it uses. But I do like the idea of running the script to create the rat's nest of environments; then maybe we could pair up parsed expressions with the environments they're run in.
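To show what I mean, a minimal sketch of that environment approach (assuming dplyr and stringr are installed so the example script actually runs):

env <- new.env()
sys.source("~/example.R", envir = env)           # run the script in its own environment
Filter(is.function, mget(ls(env), envir = env))  # only p shows up; print, mutate, etc. don't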
Definitely a lot of ways to approach this.