High CPU Usage When Running SVM with tidymodels and kernlab on Ubuntu

Hello,

I'm experiencing an issue where running a simple SVM model using the tidymodels and kernlab packages on Ubuntu causes unexpectedly high CPU usage, with the R process's CPU usage exceeding 100%. This suggests that the process is using multiple cores, which leads to system lag. Below is the code snippet causing the issue:

library(tidyverse)
library(tidymodels)
library(extrafont)
library(discrim)
library(plsmod)
library(rules)
library(bonsai)
library(themis)
library(ggtext)
library(stacks)
library(colino)
all_raw_data <- iris

run_chi_models <- list()
run_tune_bayes_models_res <- list()

n_run <- 1

i <- 1

set.seed(1234)

# Create the progress bar once, outside the loop
prog <- txtProgressBar(min = 0, max = n_run, style = 3)

for (i in 1:n_run) {
  setTxtProgressBar(prog, i)
  random_seed <- sample.int(10000, 1)
  set.seed(random_seed)

  # 70/30 train/test split, stratified by Species
  initial_splits <- initial_split(all_raw_data, prop = 0.7, strata = "Species", pool = 0.2)
  training_set <- training(initial_splits)
  testing_set <- testing(initial_splits)

  run_5k_10r_cv <- vfold_cv(training_set, v = 2, repeats = 2, strata = "Species", pool = 0.2)

  run_recipe1 <- training_set %>%
    recipe(Species ~ .)

  # RBF-kernel SVM with cost and sigma marked for tuning
  svm_rbf_model <- svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
    set_mode("classification") %>%
    set_engine("kernlab")

  run_chi_models[[i]] <- workflow_set(
    preproc = list(
      recipe1 = run_recipe1
    ),
    models = list(
      svm_rbf = svm_rbf_model
    ),
    cross = TRUE
  )

  # Other candidate metrics: sens, spec, ppv, npv, mcc, j_index,
  # bal_accuracy, detection_prevalence
  run_tune_bayes_models_res[[i]] <- run_chi_models[[i]] %>%
    workflow_map("tune_bayes", resamples = run_5k_10r_cv, initial = 50, iter = 200,
                 metrics = metric_set(accuracy, roc_auc, f_meas, recall, precision, kap),
                 verbose = TRUE,
                 control = control_bayes(save_workflow = FALSE, save_pred = FALSE))
}

I've tried limiting the number of cores via environment variables (Sys.setenv(OMP_NUM_THREADS = 1)) and via doParallel, without any luck. The svm_rbf() fit seems to automatically use all available CPU cores, which I do not understand. Is there any way to restrict kernlab to a single core, or is there a workaround to prevent such high CPU utilization?

Any insights or suggestions would be greatly appreciated!

Below is my sessionInfo.

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] colino_0.0.1 stacks_1.0.2 ggtext_0.1.2
[4] themis_1.0.2.9000 bonsai_0.3.1.9000 rules_1.0.2.9000
[7] plsmod_1.0.0.9000 discrim_1.0.1.9000 extrafont_0.19
[10] yardstick_1.3.1.9000 workflowsets_1.1.0.9000 workflows_1.1.4.9000
[13] tune_1.2.1.9000 rsample_1.2.1.9000 recipes_1.1.0.9000
[16] parsnip_1.2.1.9001 modeldata_1.4.0 infer_1.0.7.9000
[19] dials_1.3.0 scales_1.3.0 broom_1.0.6.9000
[22] tidymodels_1.2.0.9000 lubridate_1.9.2 forcats_1.0.0
[25] stringr_1.5.0 dplyr_1.1.4 purrr_1.0.2
[28] readr_2.1.4 tidyr_1.3.1 tibble_3.2.1
[31] ggplot2_3.5.1 tidyverse_2.0.0

loaded via a namespace (and not attached):
[1] matrixStats_1.0.0 DiceDesign_1.9 RColorBrewer_1.1-3
[4] tools_4.1.2 backports_1.4.1 utf8_1.2.3
[7] R6_2.5.1 rpart_4.1.19 colorspace_2.1-0
[10] nnet_7.3-19 withr_2.5.0 tidyselect_1.2.0
[13] gridExtra_2.3 compiler_4.1.2 extrafontdb_1.0
[16] cli_3.6.3 xml2_1.3.5 digest_0.6.33
[19] pkgconfig_2.0.3 parallelly_1.36.0 lhs_1.1.6
[22] rlang_1.1.4 rstudioapi_0.16.0 generics_0.1.3
[25] BiocParallel_1.28.3 magrittr_2.0.3 ROSE_0.0-4
[28] Matrix_1.6-0 Rcpp_1.0.11 munsell_0.5.0
[31] fansi_1.0.4 GPfit_1.0-8 lifecycle_1.0.3
[34] furrr_0.3.1.9000 stringi_1.7.12 MASS_7.3-60
[37] plyr_1.8.8 grid_4.1.2 parallel_4.1.2
[40] listenv_0.9.0 ggrepel_0.9.3 butcher_0.3.2
[43] lattice_0.21-8 splines_4.1.2 gridtext_0.1.5
[46] hms_1.1.3 pillar_1.9.0 igraph_2.0.3.9044
[49] corpcor_1.6.10 future.apply_1.11.0 reshape2_1.4.4
[52] codetools_0.2-19 mixOmics_6.18.1 glue_1.6.2
[55] data.table_1.14.8 vctrs_0.6.5 tzdb_0.4.0
[58] foreach_1.5.2 Rttf2pt1_1.3.12 gtable_0.3.3
[61] future_1.33.0 gower_1.0.1 prodlim_2023.03.31
[64] RSpectra_0.16-1 class_7.3-22 survival_3.5-5
[67] timeDate_4022.108 rARPACK_0.11-0 iterators_1.0.14
[70] hardhat_1.4.0 ellipse_0.4.5 lava_1.7.2.1
[73] timechange_0.2.0 globals_0.16.2 ipred_0.9-14

I don't know what's going on, but I'm pretty sure that kernlab does not do any parallel processing (internally or externally).

Can you make a smaller reprex with public data so that we can experiment?

Hi Max,

Thank you for your input. Yes, I've run the experiments using the publicly available iris dataset, so the results are reproducible and data-specific issues can be ruled out. Despite not using any explicit parallel processing commands, I observed high CPU usage that suggests parallel activity. This behavior might be related to how Ubuntu manages threading by default, or to some under-the-hood configuration in the kernlab package or the underlying system libraries. Any insights on how to manage or configure thread usage on Ubuntu to prevent this would be greatly appreciated.

all_raw_data <- iris

Sys.setenv(OMP_NUM_THREADS = 1) will have no effect, since by the time it runs the BLAS/LAPACK library is already loaded and its thread count can no longer be modified (except by using the RhpcBLASctl package).
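For example, a minimal sketch using RhpcBLASctl to lower the thread count from inside a running session (assuming the package is installed):

library(RhpcBLASctl)

blas_get_num_procs()     # threads the loaded BLAS is currently using
blas_set_num_threads(1)  # limit OpenBLAS to a single thread
omp_set_num_threads(1)   # limit OpenMP regions to a single thread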

You have to set OMP_NUM_THREADS before starting R (or an R session in RStudio) for it to take effect, ideally in a script in /etc/profile.d if you want it set system-wide, but you can also set it before running R or Rscript on the command line.
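You can then verify from inside R that the variable was picked up at startup (the script name below is just a placeholder):

# In the shell, before R starts:
#   export OMP_NUM_THREADS=1                # system-wide via /etc/profile.d, or
#   OMP_NUM_THREADS=1 Rscript my_script.R   # per invocation
Sys.getenv("OMP_NUM_THREADS")
#> [1] "1"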

Once you have "tamed" the multithreaded BLAS/LAPACK, your code should run with CPU usage not exceeding 100% on one CPU core.

You can also find more information on this topic in the article Performance issues with Posit Workbench Local Launcher and Open Source RStudio Server.

You can then return to tuning your code for doParallel/foreach; a minimal registration sketch follows.
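A sketch of registering a backend (the worker count of 4 is just a placeholder). At the tune version shown in your sessionInfo, tune_bayes() should pick up a registered foreach backend automatically, so workflow_map() itself does not need to change:

library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(4)    # placeholder worker count; adjust to your machine
registerDoParallel(cl)  # tune's internal foreach loops now run on the workers

# ... run the workflow_map("tune_bayes", ...) code from above ...

stopCluster(cl)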

At the end of the day you probably want to run your code as fast and efficiently as possible on the hardware available. To achieve that, you probably want to leverage all 32 cores on your machine, each running one process or thread, to ensure maximum performance (having multiple processes or threads fight over one core typically reduces performance, especially for CPU-intensive processes).

Under such a scenario you could play with a combination of doParallel workers and BLAS/LAPACK threads. For example, you could configure doParallel to start 8 workers (cl <- makeCluster(8); registerDoParallel(cl)) and set OMP_NUM_THREADS=4. This setup starts 8 R worker processes in parallel, which work through all your indices i, and each worker launches 4 BLAS/LAPACK threads, for a total of 8 * 4 = 32 threads. Depending on whether you rely more on the process-level parallelism provided by doParallel (increase the number of doParallel workers while reducing OMP_NUM_THREADS) or more on multithreaded BLAS/LAPACK (increase OMP_NUM_THREADS while reducing doParallel workers), you may get very different overall performance. Always make sure that OMP_NUM_THREADS times the number of doParallel workers evaluates to 32, i.e. 1 * 32, 2 * 16, 4 * 8, 8 * 4, 16 * 2 or 32 * 1.
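A minimal sketch of the 8 x 4 split described above, assuming OMP_NUM_THREADS=4 was exported before R started and with the loop body elided:

library(doParallel)  # also attaches foreach and parallel

# 8 R worker processes; each one's BLAS/LAPACK spawns 4 threads,
# giving 8 * 4 = 32 threads in total on a 32-core machine.
cl <- makeCluster(8)
registerDoParallel(cl)

run_tune_bayes_models_res <- foreach(i = 1:n_run,
                                     .packages = "tidymodels") %dopar% {
  # ... one iteration of the original tuning loop ...
}

stopCluster(cl)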

My gut feeling is that if you have significantly more indices to work through than you have CPU cores (i.e. n_run >> 32), multithreaded BLAS/LAPACK will not contribute much to the compute performance; keeping OMP_NUM_THREADS=1 and cl <- makeCluster(32) sounds most appropriate to me in this case.

Let me know if this is helpful - happy to discuss more.