Hello,
I'm experiencing an issue where running a simple SVM model using the tidymodels and kernlab packages on Ubuntu causes unexpectedly high CPU usage, with the R process's CPU usage exceeding 100%. This suggests that the process is using multiple cores, which makes the system lag. Below is the code snippet causing the issue:
library(tidyverse)
library(tidymodels)
library(extrafont)
library(discrim)
library(plsmod)
library(rules)
library(bonsai)
library(themis)
library(ggtext)
library(stacks)
library(colino)
all_raw_data <- iris
run_chi_models <- list()
run_tune_bayes_models_res <- list()
n_run <- 1
i <- 1
set.seed(1234)
for (i in 1:n_run) {
  prog <- txtProgressBar(min = 0, max = n_run, style = 3)
  setTxtProgressBar(prog, i)
  random_seed <- sample.int(10000, 1)
  set.seed(random_seed)
  initial_splits <- initial_split(all_raw_data, prop = 0.7, strata = "Species", pool = 0.2)
  training_set <- training(initial_splits)
  testing_set <- testing(initial_splits)
  run_5k_10r_cv <- vfold_cv(training_set, v = 2, repeats = 2, strata = "Species", pool = 0.2)
  run_recipe1 <- training_set %>%
    recipe(Species ~ .)
  svm_rbf_model <- svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
    set_mode("classification") %>%
    set_engine("kernlab")
  run_chi_models[[i]] <- workflow_set(
    preproc = list(
      recipe1 = run_recipe1
    ),
    models = list(
      svm_rbf = svm_rbf_model
    ),
    cross = TRUE)
  run_tune_bayes_models_res[[i]] <- run_chi_models[[i]] %>%
    workflow_map("tune_bayes", resamples = run_5k_10r_cv, initial = 50, iter = 200,
      # other candidate metrics: accuracy, kap, sens, spec, ppv, npv, mcc, j_index, bal_accuracy, detection_prevalence, precision, recall, f_meas
      metrics = metric_set(accuracy, roc_auc, f_meas, recall, precision, kap), verbose = TRUE,
      control = control_bayes(save_workflow = FALSE, save_pred = FALSE)
    )
}
I've tried limiting the number of cores via environment variables (Sys.setenv(OMP_NUM_THREADS = 1)) and doParallel, without any luck. The svm_rbf function seems to automatically use all available CPU cores, which I do not understand. Is there any way to restrict kernlab to use only one core, or is there a workaround to prevent such high CPU utilization?
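A minimal sketch of the kind of thread cap described above, for anyone trying to reproduce this. OMP_NUM_THREADS comes from the attempt mentioned in the post. OPENBLAS_NUM_THREADS and the optional RhpcBLASctl calls are assumptions added for illustration, because the session info below shows an OpenBLAS pthread build. Whether either one resolves the issue is untested. OpenBLAS reads its environment variables when the library is initialized, so setting them inside a running session may be too late; they may need to go in ~/.Renviron or the shell before R starts.

```r
# Cap the thread pools of the numerical back ends (sketch only, not a confirmed fix).
# OMP_NUM_THREADS is the variable already tried in the post.
# OPENBLAS_NUM_THREADS is an ASSUMPTION based on the OpenBLAS build in sessionInfo().
Sys.setenv(
  OMP_NUM_THREADS = "1",
  OPENBLAS_NUM_THREADS = "1"
)

# Read the values back to confirm they are visible to this R session.
thread_caps <- Sys.getenv(c("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS"))
print(thread_caps)

# Optional runtime cap. RhpcBLASctl is an ASSUMPTION: it is not among the
# packages listed in the post, so it is only used if it happens to be installed.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  RhpcBLASctl::blas_set_num_threads(1)
  RhpcBLASctl::omp_set_num_threads(1)
}
```

Running this at the very top of the script, before any library() call, gives the cap the best chance of taking effect.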
Any insights or suggestions would be greatly appreciated!
Below is my sessionInfo.
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] colino_0.0.1 stacks_1.0.2 ggtext_0.1.2
[4] themis_1.0.2.9000 bonsai_0.3.1.9000 rules_1.0.2.9000
[7] plsmod_1.0.0.9000 discrim_1.0.1.9000 extrafont_0.19
[10] yardstick_1.3.1.9000 workflowsets_1.1.0.9000 workflows_1.1.4.9000
[13] tune_1.2.1.9000 rsample_1.2.1.9000 recipes_1.1.0.9000
[16] parsnip_1.2.1.9001 modeldata_1.4.0 infer_1.0.7.9000
[19] dials_1.3.0 scales_1.3.0 broom_1.0.6.9000
[22] tidymodels_1.2.0.9000 lubridate_1.9.2 forcats_1.0.0
[25] stringr_1.5.0 dplyr_1.1.4 purrr_1.0.2
[28] readr_2.1.4 tidyr_1.3.1 tibble_3.2.1
[31] ggplot2_3.5.1 tidyverse_2.0.0

loaded via a namespace (and not attached):
[1] matrixStats_1.0.0 DiceDesign_1.9 RColorBrewer_1.1-3
[4] tools_4.1.2 backports_1.4.1 utf8_1.2.3
[7] R6_2.5.1 rpart_4.1.19 colorspace_2.1-0
[10] nnet_7.3-19 withr_2.5.0 tidyselect_1.2.0
[13] gridExtra_2.3 compiler_4.1.2 extrafontdb_1.0
[16] cli_3.6.3 xml2_1.3.5 digest_0.6.33
[19] pkgconfig_2.0.3 parallelly_1.36.0 lhs_1.1.6
[22] rlang_1.1.4 rstudioapi_0.16.0 generics_0.1.3
[25] BiocParallel_1.28.3 magrittr_2.0.3 ROSE_0.0-4
[28] Matrix_1.6-0 Rcpp_1.0.11 munsell_0.5.0
[31] fansi_1.0.4 GPfit_1.0-8 lifecycle_1.0.3
[34] furrr_0.3.1.9000 stringi_1.7.12 MASS_7.3-60
[37] plyr_1.8.8 grid_4.1.2 parallel_4.1.2
[40] listenv_0.9.0 ggrepel_0.9.3 butcher_0.3.2
[43] lattice_0.21-8 splines_4.1.2 gridtext_0.1.5
[46] hms_1.1.3 pillar_1.9.0 igraph_2.0.3.9044
[49] corpcor_1.6.10 future.apply_1.11.0 reshape2_1.4.4
[52] codetools_0.2-19 mixOmics_6.18.1 glue_1.6.2
[55] data.table_1.14.8 vctrs_0.6.5 tzdb_0.4.0
[58] foreach_1.5.2 Rttf2pt1_1.3.12 gtable_0.3.3
[61] future_1.33.0 gower_1.0.1 prodlim_2023.03.31
[64] RSpectra_0.16-1 class_7.3-22 survival_3.5-5
[67] timeDate_4022.108 rARPACK_0.11-0 iterators_1.0.14
[70] hardhat_1.4.0 ellipse_0.4.5 lava_1.7.2.1
[73] timechange_0.2.0 globals_0.16.2 ipred_0.9-14