Hi guys, I am currently using the premium plan for RStudio Cloud, which provides 4 CPUs. When I use the detectCores() function from the parallel package, it returns 16. For the parallel setup, should I set the number of cores to 4 or 16? Thanks in advance!
Always use the number of cores provided, not the number detected, so in this case 4. You can use more than 4 cores, but you are unlikely to see additional performance benefits.
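For example, rather than feeding detectCores() into your cluster setup, you could hard-code the allocation (a minimal sketch, assuming the 4-CPU premium plan and base R's parallel package):

library(parallel)

# Use the 4 CPUs the plan actually provides, not the 16 the host reports
cl <- makeCluster(4)
res <- parLapply(cl, 1:8, function(i) i^2)
stopCluster(cl)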
The reason is that parallel::detectCores() will detect all cores physically present on the server, irrespective of any constraint defined. parallelly::availableCores() does a much better job of detecting the number of usable cores, especially when it comes to cgroups and HPC schedulers. In the case of Docker/Kubernetes, however, it is as "useful" as parallel::detectCores().
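To illustrate, a quick sketch of the two calls (the values are only examples; what you actually see depends on your environment):

parallel::detectCores()       # all cores physically present on the host, e.g. 16
parallelly::availableCores()  # honours cgroups/HPC limits where it can detect them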
Hi, author of parallelly here. I'd like to figure out how to make availableCores() reflect the number of CPUs that you get in RStudio Cloud. That is, how can one infer the number of CPUs available from within an RStudio Cloud instance?
... In the case of Docker/Kubernetes, however, it is as "useful" as parallel::detectCores().
availableCores() queries nproc, and I would expect that to be reflected here too if you're running Docker. For example,
$ docker run --cpuset-cpus=0 --rm -ti rocker/r-base nproc
1
$ docker run --cpuset-cpus=0,3 --rm -ti rocker/r-base nproc
2
$ docker run --cpuset-cpus=0,3,6,7 --rm -ti rocker/r-base nproc
4
but that's not the case in RStudio Cloud; there nproc reflects whatever the host has set, e.g. I also see 16 on my free nCPU=1 account.
Digging deeper, and contrary to the above Docker examples, in RStudio Cloud /sys/fs/cgroup/cpuset/cpuset.cpus returns 0-15, suggesting all CPUs are available.
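(For reference, a rough sketch of how one could read and count that cpuset from R; the path is the standard cgroup v1 cpuset mount:)

# Count CPUs in a cpuset spec such as "0-15" or "0,3,6,7" (sketch)
spec  <- readLines("/sys/fs/cgroup/cpuset/cpuset.cpus")
parts <- strsplit(spec, ",")[[1]]
sum(vapply(parts, function(p) {
  r <- as.integer(strsplit(p, "-")[[1]])
  if (length(r) == 2L) r[2] - r[1] + 1L else 1L
}, integer(1)))
#> 16, i.e. no cpuset restriction here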
Is RStudio Cloud throttling with docker run --cpus=<n> ...? If so, continuing, for my nCPU=1 free account I get:
/cloud/project$ cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
100000
For someone with, say, nCPU = 4, will they get the following?
/cloud/project$ cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
400000
If that is the case, then I think
n=$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
nCPU=$((n / 100000))
could be one way to infer the "amount" of CPUs available. Is that reasonable?
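In R, the same idea would look roughly like this (just a sketch; dividing by cpu.cfs_period_us rather than a hard-coded 100000 would be slightly more robust):

quota  <- as.numeric(readLines("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"))   # e.g. 400000; -1 means no quota
period <- as.numeric(readLines("/sys/fs/cgroup/cpu/cpu.cfs_period_us"))  # typically 100000
if (quota > 0) quota / period else NA  # number of CPUs implied by the quota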
That sounds like the right approach. I can definitely confirm that with 4 CPUs allocated I get nCPU=4 using your method. I will update again once I have confirmation on the CPU throttling.
I now have confirmation that the CPU limitation on RStudio Cloud is indeed a cgroup setting, and hence the solution of using cpu.cfs_quota_us is applicable.
This was one of the things that surprised me as well. After investigating a bit, the issue comes down to Kubernetes (which is what drives RStudio Cloud sessions under the hood). At this point, Kubernetes has removed pretty much all vestiges of Docker itself, so things present in Docker, like docker run --cpus= or cpuset, aren't necessarily available.
In a quick dive with some of our more Kubernetes-fluent engineers, we found that the Kubernetes API does not seem to expose such functionality, though we haven't completely dug through the edges to see if there is some knob somewhere to tune.
The approach you outlined above would indeed be a great step, and it is the native way Kubernetes handles resource management. Hence, if you update the logic, it should work not only in RStudio Cloud but in any Kubernetes environment!
Thanks for confirming. I've implemented support for cgroups CPU quota and CPU affinity (was already supported via 'nproc') in parallelly (>= 1.30.0-9005). Install as:
remotes::install_github("HenrikBengtsson/parallelly", ref="develop")
With my nCPU = 1 free account, I get:
> parallelly::availableCores(which = "all")
system cgroups.cpuset cgroups.cpuquota nproc
16 16 1 16
and what we're really after:
> parallelly::availableCores()
[1] 1
Please try it and see whether it works for other nCPU settings.
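Once availableCores() reports the right number, it can be plugged straight into the cluster setup, e.g. (sketch):

cl <- parallel::makeCluster(parallelly::availableCores())
parallel::stopCluster(cl)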
Works like a charm!
> parallelly::availableCores(which = "all")
system cgroups.cpuset cgroups.cpuquota nproc
16 16 4 16
> parallelly::availableCores()
cgroups.cpuquota
4
It also works on another Kubernetes-based RStudio environment, and it still works as expected on a SLURM-based HPC cluster as well.
Thanks so much for the quick fix - and I really should have reached out to you in the first place.
Thanks for confirming. I'll try to submit this updated version to CRAN sooner rather than later.