I have my data organized into nested lists of data frames. I'd like to be able to apply a function to each of the data frames and return the updated data frames in the same nested list structure. Currently I am using nested calls to lapply()
. This works but is difficult to read. I was hopeful that rapply()
could solve my problem by recursively applying a function to all list elements. But since data frames are technically lists, rapply()
wants to operate at the level of the data frame columns instead of the entire data frame.
I did some searching online, but I couldn't find anything that exactly addressed the problem in a succinct way. For example, here's a StackOverflow answer that recommends the repeated lapply()
strategy that I am already using.
Is there any way to get rapply()
to apply functions to the entire data frame? Or is there another option for recursively applying functions?
Below is a reprex for a doubly nested list of data frames. The situation is of course worse for more highly nested lists.
set.seed(20200618)
# Create example data frame
newDF <- function() {
observations <- rpois(1, lambda = 10)
data.frame(group = sample(letters[1:3], observations, replace = TRUE),
measure= rnorm(observations))
}
newDF()
#> group measure
#> 1 a 0.67623031
#> 2 c -0.97297851
#> 3 c -1.35261998
#> 4 b 0.69287941
#> 5 a -0.15351746
#> 6 c -0.62919316
#> 7 a 0.02225157
#> 8 c -1.22776690
# Example nested list
ex1 <- list(
a1 = list(
b1 = newDF()
),
a2 = list(
b2 = newDF(),
b3 = newDF(),
b4 = newDF()
),
a3 = list(
b5 = newDF()
)
)
str(ex1)
#> List of 3
#> $ a1:List of 1
#> ..$ b1:'data.frame': 8 obs. of 2 variables:
#> .. ..$ group : chr [1:8] "a" "a" "a" "b" ...
#> .. ..$ measure: num [1:8] -0.537 0.458 -2.297 -0.248 -1.514 ...
#> $ a2:List of 3
#> ..$ b2:'data.frame': 12 obs. of 2 variables:
#> .. ..$ group : chr [1:12] "a" "b" "a" "b" ...
#> .. ..$ measure: num [1:12] -0.234 -0.871 -0.882 -1.164 -0.999 ...
#> ..$ b3:'data.frame': 19 obs. of 2 variables:
#> .. ..$ group : chr [1:19] "a" "c" "a" "a" ...
#> .. ..$ measure: num [1:19] 1.607 -1.653 -0.79 0.389 -1.645 ...
#> ..$ b4:'data.frame': 7 obs. of 2 variables:
#> .. ..$ group : chr [1:7] "c" "c" "b" "b" ...
#> .. ..$ measure: num [1:7] 0.503 1.325 0.468 -0.542 -0.504 ...
#> $ a3:List of 1
#> ..$ b5:'data.frame': 9 obs. of 2 variables:
#> .. ..$ group : chr [1:9] "c" "c" "b" "a" ...
#> .. ..$ measure: num [1:9] -1.278 -0.136 0.295 -0.217 -0.771 ...
# Example function that needs to be applied to the entire data frame, not its columns
removeNegative <- function(x) x[x[["measure"]] >= 0, ]
# Double lapply() works but is unwieldy
result <- lapply(ex1, function(x) lapply(x, removeNegative))
head(result[[1]][[1]])
#> group measure
#> 2 a 0.4579189
#> 6 c 0.6518109
#> 7 b 0.9167038
# This throws an error because it attempts to apply the function to a single column
rapply(ex1, removeNegative, how = "replace")
#> Error in x[["measure"]]: subscript out of bounds
# This does nothing because none of the columns have class "data.frame"
result <- rapply(ex1, removeNegative, classes = "data.frame", how = "replace")
head(result[[1]][[1]])
#> group measure
#> 1 a -0.5373990
#> 2 a 0.4579189
#> 3 a -2.2969573
#> 4 b -0.2479370
#> 5 c -1.5138069
#> 6 c 0.6518109
Created on 2020-06-18 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.0.1 (2020-06-06)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.1252
#> ctype English_United States.1252
#> tz America/New_York
#> date 2020-06-18
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.7 2020-05-13 [1] CRAN (R 4.0.0)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.0 2020-04-10 [1] CRAN (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.4.1 2020-04-04 [1] CRAN (R 4.0.0)
#> glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 4.0.0)
#> knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> pkgbuild 1.0.8 2020-05-07 [1] CRAN (R 4.0.0)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0)
#> ps 1.3.3 2020-05-08 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0)
#> rlang 0.4.6 2020-05-02 [1] CRAN (R 4.0.0)
#> rmarkdown 2.2 2020-05-31 [1] CRAN (R 4.0.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.14 2020-05-20 [1] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/BLISCJX/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.1/library