Problem with data.table "by" option.

I am having a problem trying to use the by option in a data.table. Either I am missing something blindingly obvious or I have a serious R problem and I have no idea how to check it. The data supplied here is the actual data in use, but I have also mocked up a couple of 3x3 data.tables and get the same errors.

I am trying to sum a column of integers (dollars) by another variable (status).
It works until I try to give the summed column a name.
Setup details are below.

Problem

library(data.table)

# Load data & convert to data.table ---------------------------------------

DT <- structure(list(iso = c("BRA", "CHN", "IRN", "ETH", "IND", "IDN", 
"ARE", "RUS", "ZAF", "EGY", "SAU", "BLR", "BOL", "CUB", "KAZ", 
"MYS", "THA", "UGA", "UZB"), cty = c("Brazil", "China", "Egypt", 
"Ethiopia", "India", "Indonesia", "Iran", "Russia", "South Africa", 
"UAE", "Saudi Arabia", "Belarus", "Bolivia", "Cuba", "Kazakhstan", 
"Malaysia", "Thailand", "Uganda", "Uzbekistan"), dollars = c(4735725L, 
38190085L, 1819807L, 434151L, 16192423L, 4662888L, 870439L, 6921249L, 
989308L, 2225198L, 2519571L, 301471L, 159854L, NA, 842049L, 1378901L, 
1771065L, 163713L, 431926L), status = c("Member", "Member", "Member", 
"Member", "Member", "Member", "Member", "Member", "Member", "Member", 
"Member", "Partner", "Partner", "Partner", "Partner", "Partner", 
"Partner", "Partner", "Partner")), class = "data.frame", row.names = c(NA, 
-19L))


setDT(DT) ; DT


# Works -------------------------------------------------------------------

DT[, sum(dollars, na.rm = TRUE)]

# Works 

DT[, sum(dollars, na.rm = TRUE), by = status]


# Crashes -----------------------------------------------------------------

DT[, sigma = sum(dollars, na.rm = TRUE), by = status]


# Desperate attempt: use a factor. Crashes --------------------------------
DT[, sigma = sum(dollars, na.rm = TRUE), by = as.factor(status)]

Error in `[.data.table`(DT, , sigma = sum(dollars, na.rm = TRUE), by = status) : 
  unused argument (sigma = sum(dollars, na.rm = TRUE))

Ubuntu 24.04

RStudio 2025.09.2+418 "Cucumberleaf Sunflower"

 sessionInfo()
R version 4.5.2 (2025-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Toronto
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.18.0

loaded via a namespace (and not attached):
[1] compiler_4.5.2    tools_4.5.2       rstudioapi_0.18.0

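Hi @jrkrideau,

In data.table, a bare name = value inside the brackets is read as an argument to [.data.table itself, which has no argument called sigma, hence the "unused argument" error. To name a computed column inside j (the second argument), wrap the expression in .() or use :=.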

So

# This creates a result table with status and sigma.
DT[, .(sigma = sum(dollars, na.rm = TRUE)), by = status]

or

# This modifies DT in place, adding a sigma column where each row gets the group sum
DT[, sigma := sum(dollars, na.rm = TRUE), by = status]
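
To make the difference between the two forms concrete, here is a minimal sketch on a made-up three-row table. The names toy, toy2 and by_status are hypothetical and not part of the original data; the column names simply mirror the ones in the question.

```r
library(data.table)

# Hypothetical stand-in for DT: two groups, one missing value
toy <- data.table(
  status  = c("Member", "Member", "Partner"),
  dollars = c(10L, 20L, NA_integer_)
)

# .() builds a new summary table with one row per group; toy is not modified
by_status <- toy[, .(sigma = sum(dollars, na.rm = TRUE)), by = status]

# := adds a column by reference, so work on a copy to leave toy untouched
toy2 <- copy(toy)
toy2[, sigma := sum(dollars, na.rm = TRUE), by = status]

print(by_status)  # 2 rows: one per status
print(toy2)       # 3 rows: each row carries its group's sum
```

Which one to use depends on whether you want a separate summary table (the .() form) or the group total repeated on every row of the existing table (the := form).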

Aha!
I tried

DT[, .(sigma = sum(dollars, na.rm = TRUE)), by = status]

but clearly messed up the syntax.

Thanks.
