How can I, as an admin, limit multithreading for RStudio Server (open source) users?

I am an R admin for some Linux servers running RStudio Server (the free, open-source version). By default, many R packages that parallelize through OpenMP assume they have every core on the system at their disposal. As a consequence, users running packages like xgboost are causing server crashes at my organization. I've been able to cap the number of cores a session uses at 4 for R scripts run outside the RStudio IDE by setting OMP_NUM_THREADS=4 in a profile.d script, but I can't figure out where to set OMP_NUM_THREADS so that RStudio Server (open source) users are affected. I tried adding an override.conf for the rstudio-server systemd unit with sudo systemctl edit rstudio-server, containing the text below, and then restarting the service. It had no effect: RStudio sessions still use all available cores.

[Service]
Environment="OMP_NUM_THREADS=4"

Any advice is much appreciated!

In case it helps, here is an example script that sends R "off the rails" when run on one of our servers:


library(xgboost) #for fitting the xgboost model
library(caret) #for general data preparation and model fitting

#load the data (duplicate a few times to simulate bigger dataset)
data = rbind(MASS::Boston)
for(i in 1:250){
  data = rbind(data, MASS::Boston)
}
data = rbind(data, data)
data = rbind(data, data)
data = rbind(data, data)

#view the structure of the data
str(data)

#make this example reproducible
set.seed(0)

#split into training (80%) and testing set (20%)
parts = createDataPartition(data$medv, p = .8, list = FALSE)
train = data[parts, ]
test = data[-parts, ]

#define predictor and response variables in training set (medv is the response)
train_x = data.matrix(train[, names(train) != "medv"])
train_y = train$medv

#define predictor and response variables in testing set
test_x = data.matrix(test[, names(test) != "medv"])
test_y = test$medv

#convert training and testing sets to xgb.DMatrix objects
xgb_train = xgb.DMatrix(data = train_x, label = train_y)
xgb_test = xgb.DMatrix(data = test_x, label = test_y)

#define watchlist
watchlist = list(train=xgb_train, test=xgb_test)

#fit XGBoost model and display training and testing data at each round
model = xgb.train(data = xgb_train, max.depth = 3, watchlist=watchlist, nrounds = 70)

#define final model
final = xgboost(data = xgb_train, max.depth = 3, nrounds = 56, verbose = 0)

#use model to make predictions on test data
pred_y = predict(final, xgb_test)

#measure prediction accuracy
mean((test_y - pred_y)^2) #mse
caret::MAE(test_y, pred_y) #mae
caret::RMSE(test_y, pred_y) #rmse
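Since OMP_NUM_THREADS can still be read from R even when the OpenMP runtime does not appear to honor it, one possible stopgap is to read it in R and pass it explicitly to a package's own thread-count argument. This is a sketch, not an enforcement mechanism: `omp_thread_cap()` and its fallback of 4 are hypothetical, and the commented usage assumes the 1.x-style xgboost interface used in the script above, where extra arguments such as `nthread` are passed through to the booster parameters.

```r
# Hypothetical helper: turn OMP_NUM_THREADS into an integer thread cap that can
# be passed explicitly to a package's own thread-count argument.
# Falls back to `default` when the variable is unset, empty, or not a positive integer.
omp_thread_cap <- function(default = 4L) {
  value <- suppressWarnings(as.integer(Sys.getenv("OMP_NUM_THREADS", unset = "")))
  if (is.na(value) || value < 1L) default else value
}

# Assumed usage with the 1.x-style xgboost calls from the script above:
# model = xgb.train(data = xgb_train, max.depth = 3, watchlist = watchlist,
#                   nrounds = 70, nthread = omp_thread_cap())
# final = xgboost(data = xgb_train, max.depth = 3, nrounds = 56, verbose = 0,
#                 nthread = omp_thread_cap())
```

An explicit argument does not depend on when the OpenMP runtime reads its environment, but it only helps for code that users actually change; it does not cap anything server-wide.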

Have you tried setting OMP_NUM_THREADS=4 in {R_HOME}/etc/Renviron.site?
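For reference, Renviron.site takes plain `NAME=value` lines (and `#` comment lines). The sketch below locates the site file for the running R installation and then uses base R's `readRenviron()` on a temporary stand-in file, so nothing on the system is modified, to show that such lines are applied to the already running R process, much as R does with Renviron.site during startup.

```r
# Location of the site-wide environment file for this R installation
# (R_HOME/etc/Renviron.site, unless R_ENVIRON_SITE points elsewhere).
site_file <- file.path(R.home("etc"), "Renviron.site")
print(site_file)

# Renviron files hold plain NAME=value lines. A temporary stand-in file is
# used here so that the real Renviron.site is not touched.
demo_file <- tempfile(fileext = ".Renviron")
writeLines(c("# cap OpenMP threads for R sessions", "OMP_NUM_THREADS=4"), demo_file)

# readRenviron() applies the file's settings to the *running* R process,
# after which Sys.getenv() reports the value.
readRenviron(demo_file)
print(Sys.getenv("OMP_NUM_THREADS"))
```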

Thanks for the suggestion! I have set OMP_NUM_THREADS=4 in Renviron.site. The setting shows up when I run Sys.getenv("OMP_NUM_THREADS") or system("env | grep OMP_NUM_THREADS", intern=T) in the RStudio session, but it has no effect on the underlying session: in top, %CPU still exceeds 400% when I run the script above. When I run the same script in batch mode on the same server, %CPU stays below 400%. That is because of the profile.d script I created, but the same result can be reproduced by running export OMP_NUM_THREADS=4 in a bash terminal before running the script in batch mode.
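The batch-mode comparison suggests the variable takes effect when it is already present as the R process starts. Under that assumption, one possible workaround from inside RStudio is to launch the heavy script in a child `Rscript` process whose environment contains OMP_NUM_THREADS from the start. `run_rscript_capped()` is a hypothetical helper built on base R's `system2()` and its `env` argument, on Unix-alikes.

```r
# Hypothetical helper: run Rscript in a child process whose environment already
# contains OMP_NUM_THREADS when the process starts, mirroring
# `export OMP_NUM_THREADS=4` followed by a batch run.
run_rscript_capped <- function(args, threads = 4L, stdout = TRUE) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # On Unix-alikes, system2() puts the NAME=value strings from `env` in front of
  # the command line, so the shell sets the variable for the child at launch.
  system2(rscript,
          args = shQuote(args),
          env = paste0("OMP_NUM_THREADS=", threads),
          stdout = stdout)
}

# Quick check: the child reports the value it was started with.
run_rscript_capped(c("-e", "writeLines(Sys.getenv('OMP_NUM_THREADS'))"))

# Running the xgboost example this way (the script path is a placeholder):
# run_rscript_capped("xgb_example.R", threads = 4L, stdout = "")
```

With `stdout = TRUE` the child's output comes back as a character vector; for a long-running script such as the xgboost example, `stdout = ""` (print to the console) may be more practical.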

Also, apologies: my original post called the environment variable OMP_NUM_CORES, but the correct setting is OMP_NUM_THREADS. I have edited this above. I can confirm that the name was only wrong in the post; it is not what is causing my problem.
