When and why would you want to use RC as your OO system?

AmeliaMN · August 16, 2018, 3:07pm

I've been teaching some of the material from Advanced R this week, and I realized I can explain when you might want to use S4 rather than S3, but I have no idea (or good examples) of when you would want to use reference classes. Thoughts?

MikeBadescu · August 16, 2018, 7:46pm

In a presentation about OO, I skipped over R's original RC (https://numeract.github.io/dallas-roo/#50) and talked only about R6 (which also has reference semantics).

One package that uses R's original reference classes is openxlsx (https://github.com/awalker89/openxlsx/blob/master/R/class_definitions.R). The idea is that you keep a pointer to the original object (i.e., the workbook tree) and let the methods modify its data in place instead of returning a modified copy of the object.
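To make the openxlsx idea concrete, here is a minimal sketch using base R's own reference classes (`setRefClass()` from the methods package). The `Workbook` class, its `sheets` field, and the `add_sheet()`/`sheet_names()` methods are hypothetical names for illustration, not openxlsx's actual definitions. The point is that the method updates the object in place, so every variable pointing at it sees the change.

```r
library(methods)

# Hypothetical workbook-like reference class (not openxlsx's API).
Workbook <- setRefClass(
  "Workbook",
  fields = list(sheets = "list"),
  methods = list(
    add_sheet = function(name) {
      # `<<-` assigns to this object's field, so the change is visible
      # through every variable that refers to the object; no copy is returned.
      sheets[[name]] <<- data.frame()
      invisible(.self)
    },
    sheet_names = function() names(sheets)
  )
)

wb <- Workbook$new()
wb2 <- wb              # a second reference to the same object, not a copy
wb$add_sheet("Data")
wb2$sheet_names()      # "Data"
```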

R6 does this much better, in my opinion. I use R6 for caching in rflow (e.g., https://github.com/numeract/rflow/blob/master/R/eddy-r6.R). In this case, a single R6Eddy instance stores the caching data for all cached functions; R6 simplifies manipulating that structure and lets me keep only one representation of the cache store throughout the R session, without the need to sync several such instances.
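A rough sketch of the "one shared cache store" idea, assuming nothing beyond base R: R6 objects are built on environments, so a plain environment stands in here for an R6Eddy-like store. `cache_store`, `memoise_with()`, and the `name:argument` key scheme are hypothetical and are not rflow's implementation.

```r
# One cache store shared by every memoised function (hypothetical names).
cache_store <- new.env(parent = emptyenv())

memoise_with <- function(f, fname, store = cache_store) {
  force(f); force(fname); force(store)
  function(x) {
    key <- paste0(fname, ":", deparse(x))
    if (!exists(key, envir = store, inherits = FALSE)) {
      # The environment is modified in place; no copy to sync back.
      assign(key, f(x), envir = store)
    }
    get(key, envir = store, inherits = FALSE)
  }
}

n_calls <- 0
slow_square <- function(x) { n_calls <<- n_calls + 1; x^2 }

fast_square <- memoise_with(slow_square, "square")
fast_square(4)     # computed: 16
fast_square(4)     # served from cache_store: 16
ls(cache_store)    # "square:4"
```

Because every memoised function writes into the same environment, there is only one store to inspect or clear during the session.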

alexpghayes · August 23, 2018, 12:05am

In modeling contexts, one use case is avoiding copies of a large dataset. For example, the GauPro package uses R6 classes to allow stateful updates of the posterior, which avoids redundant computation when new data become available. I believe the package is intended as a backend for Bayesian optimization.
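As a toy illustration of stateful posterior updates (not GauPro, which fits Gaussian process models), the sketch below keeps a conjugate Normal posterior for an unknown mean with known noise variance inside a base R reference class. Each call to the hypothetical `add_data()` method folds in only the new batch; earlier observations are summarised by two running sums and are never revisited or copied.

```r
library(methods)

# Hypothetical class: Normal prior N(mu0, tau2) on the mean, known noise variance.
NormalMean <- setRefClass(
  "NormalMean",
  fields = list(prec = "numeric", wmean = "numeric", s2 = "numeric"),
  methods = list(
    initialize = function(mu0 = 0, tau2 = 1, noise_var = 1) {
      prec <<- 1 / tau2       # posterior precision so far
      wmean <<- mu0 / tau2    # precision-weighted mean so far
      s2 <<- noise_var
    },
    add_data = function(x) {
      # Only the new batch is touched; the state is updated in place.
      prec <<- prec + length(x) / s2
      wmean <<- wmean + sum(x) / s2
      invisible(.self)
    },
    posterior = function() c(mean = wmean / prec, var = 1 / prec)
  )
)

model <- NormalMean$new(mu0 = 0, tau2 = 10, noise_var = 1)
model$add_data(c(1.2, 0.8))
model$add_data(c(1.1, 0.9, 1.0))   # new batch, no refit on the first one
model$posterior()                  # mean approx. 0.980, var approx. 0.196
```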

davis · August 25, 2018, 12:51pm

A number of r-lib packages use R6.

processx

github.com/r-lib/processx/blob/master/R/process.R


#' @useDynLib processx, .registration = TRUE, .fixes = "c_"
NULL

#' External process
#'
#' Managing external processes from R is not trivial, and this
#' class aims to help with this deficiency. It is essentially a small
#' wrapper around the `system` base R function, to return the process
#' id of the started process, and set its standard output and error
#' streams. The process id is then used to manage the process.
#'
#' @section Usage:
#' ```
#' p <- process$new(command = NULL, args,
#'                  stdin = NULL, stdout = NULL, stderr = NULL,
#'                  connections = list(), poll_connection = NULL,
#'                  env = NULL, cleanup = TRUE, cleanup_tree = FALSE,
#'                  wd = NULL, echo_cmd = FALSE, supervise = FALSE,
#'                  windows_verbatim_args = FALSE,


progress

github.com/r-lib/progress/blob/master/R/progress.R


#' Progress bar in the terminal
#'
#' Progress bars are configurable, may include percentage, elapsed time,
#' and/or the estimated completion time. They work in the command line,
#' in Emacs and in R Studio. The progress package was heavily influenced by
#' https://github.com/tj/node-progress
#'
#' @section Creating the progress bar:
#' A progress bar is an R6 object, that can be created with
#' \code{progress_bar$new()}. It has the following arguments:
#' \describe{
#'   \item{format}{The format of the progress bar. A number of
#'     tokens can be used here, see them below. It defaults to
#'     \code{"[:bar] :percent"}, which means that the progress
#'     bar is within brackets on the left, and the percentage
#'     is printed on the right.}
#'   \item{total}{Total number of ticks to complete. If it is unknown,
#'      use \code{NA} here. Defaults to 100.}
#'   \item{width}{Width of the progress bar. Default is the current


They can also be useful for exposing C++ classes to R, on top of which you can build a more user-friendly functional layer. See the R package for Apache Arrow for that general idea.

github.com/romainfrancois/arrow/blob/master/r/R/R6.R

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

#' @include enums.R
#' @importFrom R6 R6Class
#' @importFrom glue glue


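A minimal sketch of the "object-oriented core, functional surface" pattern. In a package like arrow, the core would be an R6 class wrapping a C++ object; here a base R reference class backed by a plain R vector stands in for it, so the example runs without compiled code. `WordCounter` and `count_words()` are hypothetical names.

```r
library(methods)

# Stand-in for a class that would wrap a C++ object in a real package.
WordCounter <- setRefClass(
  "WordCounter",
  fields = list(counts = "integer"),
  methods = list(
    add = function(key) {
      old <- if (key %in% names(counts)) counts[[key]] else 0L
      counts[[key]] <<- old + 1L   # mutate the object's state in place
      invisible(.self)
    }
  )
)

# Functional layer: users call a plain function and never see the object.
count_words <- function(words) {
  ctr <- WordCounter$new()
  for (w in words) ctr$add(w)
  ctr$counts
}

count_words(c("a", "b", "a"))   # named integer vector: a = 2, b = 1
```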
AmeliaMN · September 12, 2018, 8:49pm

Thanks to @MikeBadescu, @alexpghayes, and @davis for these answers! My high-level takeaway is that R6 is useful when you are manipulating very large datasets, to avoid the copy-on-modify that R usually does. Is there more to it than that?
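A small base R sketch of the difference. R does not copy an argument just because a function receives it; the copy happens only when the function modifies it, and the caller's object stays unchanged. Environments (which RC and R6 objects are built on) are never copied, so a modification is visible to the caller. The function names are hypothetical.

```r
# Copy-on-modify: the function changes its local copy; the caller's list is untouched.
set_state_list <- function(obj) { obj$state <- 2; obj }
lst <- list(state = 0)
invisible(set_state_list(lst))
lst$state    # 0

# Reference semantics: the environment itself is modified.
set_state_env <- function(obj) { obj$state <- 2; invisible(NULL) }
env <- new.env()
env$state <- 0
set_state_env(env)
env$state    # 2
```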

MikeBadescu · September 13, 2018, 12:42pm

I would add the case where the object has state. Without reference semantics, one could use the following construct:

obj <- list(state = 0, ....)   # all RC objects can be seen as lists (simplification)

f1 <- function(obj_, ...) {
  main_result = ....   # calculation
  obj_$state = 2

  list(res=main_result, obj=obj_)    # need to return the modified obj
}

lst <- f1(obj, ...)   # no side effect but messy
obj <- lst$obj        # doing this many times is not fun
res <- lst$res
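A runnable version of the pattern above, assuming a hypothetical calculation (the mean of `x`) and a `state` field that counts calls; both are placeholders for the `....` in the sketch.

```r
# Pure-function style: f1 returns both its result and the updated object.
f1 <- function(obj_, x) {
  main_result <- mean(x)                # placeholder calculation
  obj_$state <- obj_$state + 1          # update the local copy only
  list(res = main_result, obj = obj_)   # caller must unpack both pieces
}

obj <- list(state = 0)
lst <- f1(obj, c(1, 2, 3))
obj <- lst$obj    # reassign, or the update is lost
res <- lst$res    # 2
```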

RC/R6 make things easier to work with:

obj <- R6(state = 0, ....)   # not a proper R6 definition, just illustrating the concept

f1 <- function(obj_, ...) {
  main_result = ....
  obj_$state = 2    # obj_ is an R6 object, so this modifies `obj` in the global env ==> side effect!

  main_result
}

res <- f1(obj, ...)  # nicer to work with, obj already updated (but be careful about side effects)
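The same idea as a runnable sketch. It uses base R's `setRefClass()` so it needs no packages; the hypothetical `Stateful` class has a single numeric `state` field, and the calculation is again a placeholder. The commented lines show what the equivalent R6 definition would look like.

```r
library(methods)

# Hypothetical class with one field; RC behaves like R6 for this purpose.
Stateful <- setRefClass("Stateful", fields = list(state = "numeric"))
# With R6 instead (requires the R6 package):
# Stateful <- R6::R6Class("Stateful", public = list(state = 0))

f1 <- function(obj_, x) {
  main_result <- mean(x)           # placeholder calculation
  obj_$state <- obj_$state + 1     # modifies the caller's object: a side effect
  main_result
}

obj <- Stateful$new(state = 0)
res <- f1(obj, c(1, 2, 3))   # 2
obj$state                    # 1, updated without reassigning obj
```

Compared with the list version, `f1()` now returns only its result, but the caller's `obj` changes with no assignment at the call site, which is exactly the side effect to be careful about.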