foreach loop never finishes on Fedora Linux 39

Hi,

The code below runs the function "mhrcmx_dep_rep_int" (written in C++) from my R package "mhrcmxIntDepL" in parallel:

### paths
path = paste0(fs::path_home(),'/rcpedroso/CAI/'); path
path_rds = paste0(path,'rds/'); path_rds

### packages
require(Rcpp)
require(mhrcmxIntDepL)
require(doParallel)

### data
D = 2
J = 40
y_list_mat = readRDS(file=paste0(path,'y_list_CAI',D,'.rds'))

### parallel settings
ncores <- detectCores()-2; ncores
# create the cluster
# PSOCK alternative (fork clusters are not available on Windows):
# cl <- parallel::makeCluster(ncores, type="PSOCK")
cl <- parallel::makeForkCluster(nnodes = ncores)
print(cl)
doParallel::registerDoParallel(cl)
foreach::getDoParRegistered()
foreach::getDoParWorkers()
showConnections()

### mcmc settings
ns = 10
burn = 10
thin = 10

### grid of hyperparameters
pvec = c(5,10,15)
Avec = c(1,10,100)
pars = expand.grid(A=Avec, p=pvec)

### run and save
obj <- foreach(r = 1:nrow(pars), .packages = "mhrcmxIntDepL") %dopar% { # begin foreach

  ### run
  p = pars$p[r]
  A = pars$A[r]
  mod_dep = mhrcmx_dep_rep_int(Y=y_list_mat, D=D, p=p, n_theta=100,
                               A=A, ns=ns, thin=thin, burn=burn, H0=0, J=J, alpha_dir=1/J)

  ### save rds
  rds_name = paste0('CAI',D,'_J',J,'_p',p,'_A',A, '_',mod_dep$model,'.rds')
  saveRDS(mod_dep, file=paste0(path_rds,rds_name))
}

### close connections
parallel::stopCluster(cl)

The code works perfectly for all values of "p" and "A" on the following Windows machine:
Windows 11 Pro
11th Gen Intel(R) Core(TM) i7-11700KF @ 3.60GHz 3.60 GHz
16,0 GB

But it does not work for all values of "p" and "A" on the following Linux machine:
Fedora Linux 39
12th Gen Intel® Core™ i7-12700K × 20
64,0 GiB

In particular, on the Linux machine it works only for small values of the parameter "p". For example, it works perfectly when p=5, but when p=10 or p=15 the foreach loop never ends and no result is produced. It may be relevant that "p" is one of the dimensions of many objects (matrices and cubes) inside the function, so memory usage increases as "p" increases. But, again, the code works perfectly on the Windows machine for all values of "p" and "A".
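To make the memory hypothesis testable, here is a minimal diagnostic sketch I could adapt. It is not my real code. "fake_fit" is a hypothetical stand-in for "mhrcmx_dep_rep_int" that only allocates a cube whose size grows with "p". The choice of 2 workers and of a PSOCK cluster (instead of the fork cluster above) are assumptions made for the diagnosis only. With outfile = "" the workers' messages are redirected to the master's output, so the last task that started before a hang should be visible.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical stand-in for mhrcmx_dep_rep_int(): allocates a p x p x n_theta
# cube so that memory use grows with p, as described above.
fake_fit <- function(p, A, n_theta = 100) {
  cube <- array(A, dim = c(p, p, n_theta))
  list(model = "dep", bytes = as.numeric(object.size(cube)))
}

# Same grid of hyperparameters as in the original code
pars <- expand.grid(A = c(1, 10, 100), p = c(5, 10, 15))

# Assumption: 2 workers keeps total memory use low enough for the test.
# outfile = "" redirects worker messages to the master's output.
cl <- parallel::makeCluster(2, type = "PSOCK", outfile = "")
doParallel::registerDoParallel(cl)

status <- foreach(r = seq_len(nrow(pars)), .combine = rbind) %dopar% {
  p <- pars$p[r]
  A <- pars$A[r]
  message("task ", r, " started: p=", p, ", A=", A)
  # Catch errors so that one failing task is reported instead of lost
  res <- tryCatch(fake_fit(p = p, A = A), error = function(e) e)
  failed <- inherits(res, "error")
  data.frame(p = p, A = A, ok = !failed,
             bytes = if (failed) NA_real_ else res$bytes)
}

parallel::stopCluster(cl)
print(status)
```

If the real function completes for p=10 and p=15 with this setup, increasing the number of workers step by step, and then switching back to the fork cluster, might show which of the two changes brings the hang back.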

On both the Windows and Linux machines, the code works perfectly without parallelization, that is, with an ordinary for loop:

for (r in 1:nrow(pars)) { ... }

Does anyone have any idea what the problem might be?

Thanks a lot!