How can a function change the value of a non-returned dataset without global assignment?

Hi Posit Community!

My mind is blown right now by some behavior that I did not expect and do not understand.

I am calling a function, and it somehow modifies a dataframe that it does not return. I hypothesized that the function might be using the `<<-` operator (which I am only vaguely aware of), so I duplicated the dataframe before passing it in. But both dataframes are changed by the call anyway, and I do not see a global assignment anywhere in the function code.

  1. How does this assignment happen? It violates my weak understanding of scoping and environments.
  2. Why did my attempt to duplicate the dataframe not preserve an unedited copy? That is, why are both tmp_charlson_orig and tmp_charlson changed by the call to comorbidity::score? Even if score changes tmp_charlson, why isn't tmp_charlson_orig preserved? Why isn't tmp_charlson_orig$canc still a vector of 1s?

Thank you so much!

``` r
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(comorbidity)

sample_people <- 5
sample_codes <- 1e4

icd9_df <- data.frame(
  id = sample(1:sample_people, size = sample_codes, replace = TRUE),
  code = comorbidity::sample_diag(n = sample_codes, version = "ICD9_2015") # default
)
tmp_charlson_orig <- comorbidity::comorbidity(
      x = icd9_df,
      id = "id",
      code = "code",
      map = "charlson_icd9_quan",
      assign0 = FALSE
    )
tmp_charlson_orig$canc
#> [1] 1 1 1 1 1

tmp_charlson <- tmp_charlson_orig
tmp_charlson_score <- comorbidity::score(
      tmp_charlson,
      assign0 = TRUE,
      weights = "quan"
    )

identical(tmp_charlson_orig, tmp_charlson)
#> [1] TRUE

tmp_charlson_orig$canc
#> [1] 0 0 0 0 0
```

Created on 2024-10-04 with reprex v2.0.2

The code for comorbidity::score is:

``` r
> comorbidity::score
function (x, weights = NULL, assign0) 
{
    if (!inherits(x = x, what = "comorbidity")) {
        stop("This function can only be used on an object of class 'comorbidity', which you can obtain by using the 'comorbidity()' function. See ?comorbidity for more details.", 
            call. = FALSE)
    }
    map <- attr(x, "map")
    arg_checks <- checkmate::makeAssertCollection()
    checkmate::assert_class(x, classes = "comorbidity", add = arg_checks)
    checkmate::assert_string(weights, null.ok = TRUE, add = arg_checks)
    if (!is.null(weights)) {
        weights <- stringi::stri_trans_tolower(weights)
    }
    checkmate::assert_choice(weights, choices = names(.weights[[map]]), 
        null.ok = TRUE, add = arg_checks)
    checkmate::assert_logical(assign0, add = arg_checks)
    if (!arg_checks$isEmpty()) 
        checkmate::reportAssertions(arg_checks)
    if (is.null(weights)) {
        ww <- rep(1, length(.maps[[map]]))
        names(ww) <- names(.maps[[map]])
    }
    else {
        ww <- .weights[[map]][[weights]]
    }
    ww <- matrix(data = ww, ncol = 1)
    x <- x[, names(.maps[[map]])]
    if (assign0) {
        data.table::setDT(x)
        x <- .assign0(x = x, map = map)
        data.table::setDF(x)
    }
    score <- as.matrix(x) %*% ww
    score <- drop(score)
    attr(score, "map") <- map
    attr(score, "weights") <- weights
    return(score)
}
<bytecode: 0x5598bc665938>
<environment: namespace:comorbidity>
```

and the code for comorbidity:::.assign0 is:

``` r
> comorbidity:::.assign0
function (x, map) 
{
    if (grepl("charlson", map)) {
        x[msld == 1, `:=`(mld, 0)]
        x[diabwc == 1, `:=`(diab, 0)]
        x[metacanc == 1, `:=`(canc, 0)]
    }
    else if (grepl("elixhauser", map)) {
        x[hypc == 1, `:=`(hypunc, 0)]
        x[diabc == 1, `:=`(diabunc, 0)]
        x[metacanc == 1, `:=`(solidtum, 0)]
    }
    return(x)
}
<bytecode: 0x5598b99e5070>
<environment: namespace:comorbidity>
```

The comorbidity package imports data.table, and the score() source above calls data.table::setDT() and := (through .assign0) when assign0 = TRUE. I'm guessing that comorbidity::comorbidity() returns a data.table, which, unlike a base R data.frame, is updated by reference instead of being copied when it is modified. So even though you assign tmp_charlson <- tmp_charlson_orig, both names point to the same object in memory, and a change made through one shows up in the other. You can use data.table::copy() to create a "deep" copy of the table.
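
To illustrate the by-reference behavior described above, here is a minimal sketch on a toy data.table. The names dt_orig, dt_alias and dt_copy are made up for illustration and do not come from the comorbidity package. It shows that plain `<-` only gives the same table a second name, while data.table::copy() produces an independent table.

```r
library(data.table)

# Toy table; all names here are illustrative only
dt_orig  <- data.table(id = 1:3, canc = c(1, 1, 1))
dt_alias <- dt_orig        # second name for the same object, no copy is made
dt_copy  <- copy(dt_orig)  # independent deep copy

# := updates the table in place
dt_alias[, canc := 0]

dt_orig$canc
#> [1] 0 0 0
dt_copy$canc
#> [1] 1 1 1

# address() reports where an object lives in memory
address(dt_orig) == address(dt_alias)
#> [1] TRUE
address(dt_orig) == address(dt_copy)
#> [1] FALSE
```

Only the copy() result keeps the original 1s, which is the behavior the question expected from a plain assignment.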

Edit:
You can check if it is indeed a data.table using class(tmp_charlson_orig).
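
As a sketch of that check, and of why the result matters: if class() reports a plain data.frame, the effect could still arise, because the score() source printed earlier in this thread calls data.table::setDT() on x after a column subset and then applies := through .assign0(). The helper zero_canc() below is a hypothetical stand-in for those steps, not the package's code, and df_orig / df_safe are toy data. The assumption is that base R column subsetting can reuse the original column vectors, so an in-place update may reach the caller's object; passing a data.table::copy() of the input avoids that either way.

```r
library(data.table)

df_orig <- data.frame(metacanc = c(1, 1, 1), canc = c(1, 1, 1))

# Check what kind of object you have
class(df_orig)
#> [1] "data.frame"
is.data.table(df_orig)
#> [1] FALSE

# Hypothetical stand-in for the setDT() / := / setDF() steps in score()
zero_canc <- function(x) {
  x <- x[, c("metacanc", "canc")]  # new data.frame; columns may still be shared
  setDT(x)                         # convert to data.table by reference
  x[metacanc == 1, canc := 0]      # in-place update of the column vector
  setDF(x)                         # convert back to data.frame by reference
  x
}

res <- zero_canc(df_orig)
res$canc
#> [1] 0 0 0
df_orig$canc  # inspect this: if the column was shared, it has changed too

# Defensive call: hand the function a deep copy instead
df_safe  <- data.frame(metacanc = c(1, 1, 1), canc = c(1, 1, 1))
res_safe <- zero_canc(copy(df_safe))
df_safe$canc
#> [1] 1 1 1
```

In the thread's terms, the defensive version would be tmp_charlson <- data.table::copy(tmp_charlson_orig) before calling comorbidity::score().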

Ah, thank you @eric-hunt! I had no idea about that property of data.table! Thank you!
