Use of torch with GPU - reticulate

Hi, everyone!

I was trying PyTorch with a GPU in R. The problem is this: first, I tried it directly in Python, and the following code works:

import torch
dtype = torch.float
#device = torch.device("cpu")
device = torch.device("cuda:0") # Uncomment this to run on GPU
torch.randn(4, 4, device=device, dtype=dtype)

However, I get errors when I run the same code in R with reticulate (the error messages are in the figures attached to this post).

But I found something even more interesting. When I did a reprex with this code, it simply worked:

library(reticulate)

torch <- import('torch')
np <- import('numpy')

device = torch$device("cuda:0") 

torch$randn(c(4L), c(4L), device = device, dtype = torch$float)
#> tensor([[-0.5082, -0.0566,  0.3794, -0.6489],
#>         [ 1.4504,  0.4627,  1.5426, -0.0883],
#>         [ 0.4226, -0.7120, -0.7997,  0.1578],
#>         [ 0.1594,  1.5402,  0.9680, -1.0612]], device='cuda:0')

torch$randn(c(4L), c(4L), device = device, dtype = torch$float)
#> tensor([[-0.2950,  1.3037,  0.6723, -1.9531],
#>         [-0.7189,  0.4939, -0.0259, -0.0818],
#>         [ 0.8678, -0.3601, -0.3294, -1.7991],
#>         [ 0.8432,  0.1208,  0.7282,  1.3152]], device='cuda:0')

Created on 2018-10-20 by the reprex package (v0.2.1)

Any help with this? Thanks!

I'm using RStudio 1.2.97 and:

sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252   
#> [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
#> [5] LC_TIME=Portuguese_Brazil.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.5.1  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
#>  [5] tools_3.5.1     htmltools_0.3.6 yaml_2.2.0      Rcpp_0.12.18   
#>  [9] stringi_1.2.4   rmarkdown_1.10  knitr_1.20      stringr_1.3.1  
#> [13] digest_0.6.16   evaluate_0.11

Created on 2018-10-20 by the reprex package (v0.2.1)

Weird! If no one else takes a stab at this, maybe remind me next week and I'll try (next week is super-overloaded for me)! Just post a reply here in one week and I'll get a notification, so I don't forget. I'm not sure I'll be able to reproduce your issue, though - in my experience, GPU + CUDA issues are quite a bit harder to reproduce than usual coding issues, as they strongly depend on the combination of environment and library versions. As a matter of fact, people often rely not just on a reproducible example to debug them, but on a reproducible environment - i.e., they use Docker to make sure the environment is exactly the same.

On that note, could you let us know which versions of PyTorch & CUDA you're using? Also, which GPU are you running on? I don't have access to Windows or to gaming GPUs (GeForce), only Linux and datacenter GPUs (Tesla).
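If it helps, all of that can be printed straight from the reticulate session - a quick sketch, assuming torch imports without error:

library(reticulate)

torch <- import('torch')

torch$`__version__`             # PyTorch version
torch$version$cuda              # CUDA version PyTorch was built against
torch$cuda$is_available()       # TRUE if a CUDA device is visible
torch$cuda$get_device_name(0L)  # name of the first visible GPU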

PS: surely you know this already, but I guess it would be much easier for you to just use the tensorflow and keras R packages, rather than having to access PyTorch through reticulate.

Thank you for your answer, Andrea!

I'm using PyTorch 0.4.1 and CUDA 9.0. My GPU is a GeForce 940MX (very "powerful", I know, hehehe).

The interesting part is that the code only returns an error in an R script. I also tried it in an R Markdown document with reticulate and everything worked.

As you said, I'm aware of tensorflow. I was comparing the speed of inverting a matrix between PyTorch and TensorFlow; this is the reason I'm trying PyTorch in R.
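Roughly, the comparison I have in mind looks like this - just a sketch, with an arbitrary matrix size and my own choice of synchronize calls for the timing (the TensorFlow side would be analogous):

library(reticulate)

torch <- import('torch')
device <- torch$device("cuda:0")

n <- 2000L
x <- torch$randn(n, n, device = device, dtype = torch$float)

# time the inversion on the GPU; synchronize so the timing waits
# for the CUDA kernel to actually finish
torch$cuda$synchronize()
system.time({
  inv <- torch$inverse(x)
  torch$cuda$synchronize()
})

# base R on the CPU, for comparison
m <- matrix(rnorm(n * n), n, n)
system.time(solve(m))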

Thanks again!

1 Like

Do you have an HP Omen laptop? Windows 10, 940 MX... anyway, I may have an older laptop like this at home, or I could find someone willing to let me access hers... Ping me in a week and I'll see what I can do. This is definitely an environment issue - the fact that the code runs in the clean, new environments that both knit and reprex create when they run code, while it doesn't run in your global environment, is proof of that. Did you tamper with your global environment in some way? Any funny stuff in .Rprofile or .Renviron? Did you try running your code immediately after restarting R?
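A quick way to check whether those startup files even exist - just a sketch, adjust the paths if you keep them somewhere non-standard:

# user-level startup files in the home directory
file.exists(path.expand(c("~/.Rprofile", "~/.Renviron")))

# alternative locations picked up through environment variables, if set
Sys.getenv(c("R_PROFILE_USER", "R_ENVIRON_USER"))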

It would be great if some user who is both an environment expert and a GPU user could chime in, but I guess that’s a pretty small subset.

1 Like

Ok, Andrea! Thanks for your help. And don't worry. I will investigate more here and try to find an explanation for this. I didn't change my .Rprofile or .Renviron.

My laptop (Dell Inspiron 7460) is not that old (in Brazil, at least :joy:). It is less than 2 years old. But I completely agree with you that it is probably an environment issue.

Thank you! Again, I will do my part and try to find an explanation for this.

2 Likes

No worries! I'd like to be of more help, but this week I'm pretty stuck. However, I think we're getting closer:

  • lib & R versions: R version 3.5.1, Windows 10 x64, PyTorch 0.4.1 and CUDA 9.0
  • the code works in a "clean" environment (knit or reprex)

Can you also check whether the code throws the same error if you run it first thing after a clean restart? The Windows hotkey for restarting R from RStudio should be CTRL+SHIFT+F10.

Hi, Andrea. The first figure of the first post shows the results after a clean start.

I executed torch$randn() twice because the first error message is different from the subsequent error messages that I get when I run the function again.

I know this is a very specific problem. So, don't worry!

Thanks!

1 Like

If I try directly from the terminal, I get no error. It seems the problem is related to RStudio.

I will create a virtual machine with Ubuntu and RStudio Server and check if I get any errors.
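In the meantime, I'll compare what the two sessions see; running something like this (just a sketch) both from the terminal and inside RStudio should show any difference:

library(reticulate)

py_config()          # which Python (and conda/virtualenv) this session is using
py_discover_config() # every Python installation reticulate can find
Sys.getenv("PATH")   # PATH often differs between RStudio and a plain terminal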

1 Like

I don't have an explanation, but I executed the code successfully in RStudio 1.2.877 (with a freshly installed pytorch-0.4.1)...

1 Like

Hi, @zkajdan. In your test, were you able to move tensors between CPU and GPU?

I will try updating RStudio and reticulate tonight and see what I get.

Thanks!

Yes, this works fine for me in RStudio:

library(reticulate)

torch <- import('torch')
np <- import('numpy')

device = torch$device("cuda:0")   # first CUDA device

torch$randn(c(4L), c(4L), device = device, dtype = torch$float)
x <- torch$randn(c(4L), c(4L), device = device, dtype = torch$float)
x$cpu()    # returns a copy of the tensor on the CPU
x$cuda()   # returns a copy back on the GPU
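And if you need the values back in R, the usual pattern (assuming the default convert = TRUE in import()) is:

x$cpu()$numpy()   # copy the tensor to host memory and convert it to an R matrix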
1 Like