rsconnect deploy Plumber API with reticulate

nwheatley · December 22, 2020, 9:22pm

I am trying to deploy a Plumber API that calls Python functions to RStudio Connect via the rsconnect::deployApp() function in RStudio Server Pro 1.2.5.
I get errors indicating that the Python files are not found during deployment.

reticulate 1.18
python 3.7.7
R 4.0.2
plumber 1.0.0

RStudio project structure:

```
~/test_reticulate                              (project dir)
~/test_reticulate/PlumberAPI/plumber.R
~/test_reticulate/python/two_dataframes.py
```

**My code to deploy to RStudio Connect:**

```
rsconnect::deployApp(
  appDir = "~/test_reticulate",
  appPrimaryDoc = "PlumberAPI/plumber.R",
  python = '/path/to/env/python',
  account = "me",
  server = "connect-<deletedtext>.com",
  appName = "test_reticulate_1",
  appTitle = "test_reticulate_1",
  contentCategory = "api",
  launch.browser = function(url) {
    message("Deployment completed: ", url)
  },
  logLevel = "verbose"
)
```

PlumberAPI/plumber.R

```
library('plumber')
library('reticulate')

# reticulate by default creates the virtualenv in the home directory, not the project directory. Seems to work.
reticulate::use_virtualenv(virtualenv = "~/.virtualenvs/myvenv")

# Tried several different ways to import Python. Works in RStudio, seems to fail after deployApp().
import_from_path('two_dataframes', path="./python")
reticulate::source_python('python/two_dataframes.py')

#* @get /first_df
function(){
  list_two_dfs <- get_list_dfs()
  # return the first data frame
  list_two_dfs[[1]]
}
```

python/two_dataframes.py

```
import pandas_dataframe as pd_df
get_random_df = pd_df.get_random_df

def get_list_dfs():
  df_list = [get_random_df(), get_random_df()]
  return df_list
```

In the deploy logs:

```
[Connect] Completed packrat build against R version: '4.0.2'
[Connect] Bundle requested Python version 3.7.7; using /opt/python/3.7.7/bin/python3.7 which has version 3.7.7
[Connect] 2020/12/22 20:50:23.748944548 Running on host: kteusorprdcn
[Connect] 2020/12/22 20:50:23.748960471 Environment will be built with Python "3.7.7 (default, May  7 2020, 21:25:33)  [GCC 7.3.0]" at /opt/python/3.7.7/bin/python3.7
[Connect] 2020/12/22 20:50:23.749202730 Running as user: rstudio-connect
[Connect] 2020/12/22 20:50:23.795201586 Using cached environment: 560zcqgpAFJQS9whHqK1vg
GET /__api__/tasks/ksUEWl4XDqSbC8hj?first_status=139 9ms
GET /__api__/tasks/ksUEWl4XDqSbC8hj?first_status=139 10ms
[Connect] 2020/12/22 20:50:25.176787877 Packages in the environment: appdirs==1.4.4, certifi==2020.4.5.1, cffi==1.14.0, chardet==3.0.4, colorful==0.5.4, conda==4.8.3, conda-package-handling==1.7.0, cryptography==2.9.2, distlib==0.3.1, filelock==3.0.12, idna==2.9, importlib-metadata==3.3.0, joblib==1.0.0, numpy==1.19.4, pandas==1.1.5, prettyprinter==0.18.0, pycosat==0.6.3, pycparser==2.20, Pygments==2.7.3, pyOpenSSL==19.1.0, PySocks==1.7.1, python-dateutil==2.8.1, pytz==2020.4, requests==2.23.0, ruamel-yaml==0.15.87, scikit-learn==0.24.0, scipy==1.5.4, six==1.15.0, threadpoolctl==2.1.0, tqdm==4.46.0, typing-extensions==3.7.4.3, urllib3==1.25.8, virtualenv==20.2.2, zipp==3.4.0, 
[Connect] 2020/12/22 20:50:25.180609687 Creating lockfile: python/requirements.txt.lock
GET /__api__/tasks/ksUEWl4XDqSbC8hj?first_status=141 10ms
[Connect] Completed Python build against Python version: '3.7.7'
[Connect] Launching Shiny application...
GET /__api__/applications/118/config 14ms
Deployment completed: https://connect-<deletedtext>.com/connect/#/apps/118
----- Deployment log finished at  2020-12-22 20:50:27  -----
Warning messages:
1: invalid uid value replaced by that for user 'nobody' 
2: invalid gid value replaced by that for user 'nobody'
```

When I go to https://connect-<deletedtext>.com/connect/#/apps/118, there is always some sort of error saying that the Python files are not found or do not exist:

```
2020/12/22 20:50:34.972173750 Using Packrat dir /opt/rstudio-connect/mnt/app/packrat/lib/x86_64-pc-linux-gnu/4.0.2
2020/12/22 20:50:38.729973447 Error in value[[3L]](cond) : 
2020/12/22 20:50:38.729983385   Unable to open file 'python/two_dataframes.py' (does it exist?)
2020/12/22 20:50:38.730020440 Calls: local ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
2020/12/22 20:50:38.730026329 Execution halted
```

I think I need to pass either appFiles or some other parameter to the deployApp() function, but the docs don't say what type the argument should be (a string? a list? a vector?).

https://rdrr.io/cran/rsconnect/man/deployApp.html

```
rsconnect::deployApp(
  appFiles = c("PlumberAPI/plumber.R", "python/two_dataframes.py", "python/pandas_dataframe.py"),
  appDir = "~/test_reticulate",
  appPrimaryDoc = "PlumberAPI/plumber.R",
  python = '/path/to/env/python',
  account = "me",
  server = "connect-<deletedtext>.com",
  appName = "test_reticulate_1",
  appTitle = "test_reticulate_1",
  contentCategory = "api",
  launch.browser = function(url) {
    message("Deployment completed: ", url)
  },
  logLevel = "verbose"
)
```
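For reference, appFiles takes a character vector of paths relative to appDir, i.e. c("a", "b") on the R side, and files outside appDir are not bundled. The Python sketch below only illustrates that shape: it walks a project directory and prints the relative paths as an R vector literal that could be pasted into the call. The helper names relative_app_files and as_r_vector are hypothetical and do not call rsconnect.

```python
import os
from pathlib import Path


def relative_app_files(app_dir, suffixes=(".R", ".py")):
    """List files under app_dir as POSIX paths relative to app_dir.

    Hypothetical helper: it only mimics the shape of rsconnect's appFiles
    argument (paths relative to appDir); it does not call rsconnect.
    """
    root = Path(app_dir).expanduser().resolve()
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):  # str.endswith accepts a tuple
                found.append((Path(dirpath) / name).relative_to(root).as_posix())
    return sorted(found)


def as_r_vector(paths):
    """Format paths as an R character vector literal, e.g. c("a", "b")."""
    return "c(" + ", ".join(f'"{p}"' for p in paths) + ")"


if __name__ == "__main__":
    # Paste the printed vector into deployApp(appFiles = ...)
    print(as_r_vector(relative_app_files("~/test_reticulate")))
```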

Any suggestions would be greatly appreciated. Reticulate and plumber both work well when I click Run API in RStudio Server Pro.

meztez · December 22, 2020, 10:35pm

In plumber.R, I would try using paths relative to the location of the plumber.R file itself.

```
library('plumber')
library('reticulate')

# reticulate by default creates the virtualenv in the home directory, not the project directory. Seems to work.
reticulate::use_virtualenv(virtualenv = "~/.virtualenvs/myvenv")

# Paths relative to PlumberAPI/, where plumber.R lives
import_from_path('two_dataframes', path="../python")
reticulate::source_python('../python/two_dataframes.py')

#* @get /first_df
function(){
  list_two_dfs <- get_list_dfs()
  # return the first data frame
  list_two_dfs[[1]]
}
```

My assumption is that your API is run from its own directory, as that is what plumber usually does.
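The reasoning behind this suggestion is that a relative path is resolved against the process's current working directory, not against the file containing the call. A small Python sketch of that rule, using the original layout (PlumberAPI/ and python/ side by side): "python/two_dataframes.py" only exists from the project root, while "../python/two_dataframes.py" only exists from PlumberAPI/. The function names are illustrative only.

```python
from pathlib import Path


def resolve_from(working_dir, relative_path):
    """Resolve relative_path against working_dir, as file APIs do with the cwd."""
    return (Path(working_dir) / relative_path).resolve()


def first_existing(working_dirs, relative_path):
    """Return the first candidate working dir from which relative_path is a file."""
    for wd in working_dirs:
        if resolve_from(wd, relative_path).is_file():
            return Path(wd)
    return None


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Recreate the original layout: PlumberAPI/ and python/ side by side
        root = Path(tmp)
        (root / "PlumberAPI").mkdir()
        (root / "python").mkdir()
        (root / "python" / "two_dataframes.py").write_text("")
        for rel in ("python/two_dataframes.py", "../python/two_dataframes.py"):
            wd = first_existing([root, root / "PlumberAPI"], rel)
            print(f"{rel!r} works from: {wd.name if wd else 'neither'}")
```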

nwheatley · December 23, 2020, 12:00am

Thank you, meztez.

I tried the following:

reticulate::import_from_path('two_dataframes', path="../python")
and
reticulate::source_python('../python/two_dataframes.py')
and even
reticulate::use_virtualenv(virtualenv = "../../.virtualenvs/myvenv") (which was invalid)

but I wasn't able to properly import python files.

Your idea, however, has made me realize that I should try importing the modules and my environment from directories other than the ones that work in RStudio. I have a few more variations I can try!

meztez · December 23, 2020, 1:10am

You can use a dummy plumber router with one endpoint that returns the result of dir("..", recursive = TRUE) and another that returns getwd().

It should give you an idea of what you are working with.
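Since reticulate runs Python inside the same R process, the same check can also be done from the Python side, which additionally shows sys.path, the list of directories Python searches when resolving import statements. This is a sketch only; describe_runtime is a hypothetical helper that could be loaded with reticulate::source_python() and its result returned from a debugging endpoint.

```python
import itertools
import os
import sys


def _walk_relative(root):
    """Yield directories and files below root as paths relative to root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk descends in a stable order
        for name in sorted(dirnames + filenames):
            yield os.path.relpath(os.path.join(dirpath, name), root)


def describe_runtime(max_entries=10):
    """Hypothetical debugging helper: cwd, a short recursive listing, sys.path."""
    cwd = os.getcwd()
    return {
        "cwd": cwd,
        # islice stops the walk early instead of listing the whole tree
        "listing": list(itertools.islice(_walk_relative(cwd), max_entries)),
        "sys_path": list(sys.path),
    }
```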

Blair09M · December 23, 2020, 5:05pm

I have a couple of suggestions here:

When deploying applications to RStudio Connect, best practice is to keep all necessary files within the application's directory (in this case, the directory of the Plumber API). This means moving the Python files so that you have a directory structure similar to the following:
```
test_reticulate
└── PlumberAPI
    ├── plumber.R
    └── python
        └── two_dataframes.py
```
Then, you would simply refer to the Python script from within your Plumber API via reticulate::source_python("python/two_dataframes.py") since the working directory for a Plumber process should be the directory containing the API itself. You could use a command similar to the following to deploy the API:
```
rsconnect::deployAPI(
  api = "~/test_reticulate/PlumberAPI",
  appFiles = c("plumber.R", "python/two_dataframes.py"),
  ...
)
```
Best practice for using reticulated Python with RStudio Connect is to use the RETICULATE_PYTHON environment variable to specify the Python environment / version that's being used. For more information see this support article and this vignette for reticulate.
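To confirm which interpreter the deployed content actually ended up using, a small report like the one below could be sourced through reticulate and logged or returned from an endpoint; the version can then be compared with the "Bundle requested Python version ..." line in the deploy log. It only reads standard sys attributes and the RETICULATE_PYTHON variable; python_report is an illustrative name, and the virtualenv check assumes a venv-style environment.

```python
import os
import sys


def python_report():
    """Summarise the interpreter this code runs in (illustrative helper)."""
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return {
        # Compare with the Python version Connect reports in the deploy log
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Installation or environment directory currently in use
        "prefix": sys.prefix,
        # For venv-style environments, sys.prefix differs from sys.base_prefix
        "in_virtualenv": sys.prefix != base_prefix,
        # None if RETICULATE_PYTHON is not set in this process
        "reticulate_python": os.environ.get("RETICULATE_PYTHON"),
    }
```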

I hope that helps and provides some clarity.

nwheatley · December 23, 2020, 8:32pm

Hello Blair09M,

Thank you for your suggestions. I restructured my file system as you suggested. I found that I can run Python files, and even import Python modules between Python files, only if those Python files are in the same directory as plumber.R.
Tested with Python 3.7.7, R 3.6.3 and R 4.0.2. To minimize complexity, there are no Python dependencies and no Python virtual environments.

```
plumber_reticulate_test
-- deploy.R
-- plumber.R
-- say_hello.py
-- say_yay.py
- python_folder
  -- __init__.py
  -- say_goodbye.py
```

The Plumber API calls say_hello.py, which imports and invokes say_yay.py successfully. However, as soon as I uncomment import python_folder.say_goodbye in say_hello.py (without even invoking say_goodbye()), I get the error No module named 'python.say_goodbye':

```
2020/12/23 20:06:05.135543695 Using Packrat dir /opt/rstudio-connect/mnt/app/packrat/lib/x86_64-pc-linux-gnu/3.6.3
2020/12/23 20:06:08.034322291 Error in value[[3L]](cond) : 
2020/12/23 20:06:08.034334204   Error on line #5: 'reticulate::use_python("/opt/python/3.7.7/bin/python3", required = TRUE)' - Error in py_run_file_impl(file, local, convert): ModuleNotFoundError: No module named 'python.say_goodbye'
2020/12/23 20:06:08.034369172 
2020/12/23 20:06:08.034374198 Detailed traceback: 
2020/12/23 20:06:08.034381357   File "<string>", line 1, in <module>
2020/12/23 20:06:08.034382496 
2020/12/23 20:06:08.034389452 Calls: local ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
2020/12/23 20:06:08.034390615 Execution halted
```
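The wording of this error carries a clue: Python reports No module named 'python.say_goodbye' rather than No module named 'python', which is what happens when the top-level package name does resolve but the submodule cannot be found inside it. One way to see where a name resolves is importlib.util.find_spec, which locates a module without running it (for a dotted name it does import the parent package first). The wrapper below is a sketch with an illustrative name; it could be run through reticulate, for example from a file loaded with reticulate::source_python().

```python
import importlib.util


def locate(module_name):
    """Report where Python would load module_name from (illustrative helper).

    Returns a (kind, detail) tuple. Caution: for a dotted name, find_spec
    imports the parent package first, which runs its __init__.py.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError as exc:
        # The parent package is missing (or is not a package)
        return ("missing parent", str(exc))
    if spec is None:
        # The parent resolved, but it has no such submodule
        return ("not found", None)
    if spec.origin in (None, "namespace"):
        # Directory without __init__.py; the origin value depends on the Python version
        return ("namespace package", list(spec.submodule_search_locations or []))
    return ("found", spec.origin)


if __name__ == "__main__":
    for name in ("python", "python.say_goodbye"):
        print(name, "->", locate(name))
```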

plumber.R

```
library(reticulate)
library(plumber)

reticulate::use_python("/opt/python/3.7.7/bin/python3", required = TRUE)
reticulate::source_python("say_hello.py")

#* @param name Your name
#* @get /say_hello
function(name="Nicole"){
  say_hello(name)
}
```

say_hello.py

```
# import python.say_goodbye as sg
import say_yay as sy

def say_hello(name):
  return f"Hello {name} {sy.say_yay()}"
```

say_yay.py

```
def say_yay():
  return "Yaayy!"
```

say_goodbye.py

```
def say_goodbye(name):
  return f"Goodbye {name}..."
```

deploy.R

```
library(rsconnect)
rsconnect::deployApp(
  appDir = "~/plumber_reticulate_test",
  appPrimaryDoc = "./plumber.R",
  appFiles = c('plumber.R', 'say_hello.py', 'say_yay.py', 'python/say_goodbye.py'),  # appFiles param doesn't seem to change much
  account = "me",
  server = "connect-<deletedtext>.com",
  appName = "test_plum1",
  appTitle = "test_plum2",
  contentCategory = "api",
  launch.browser = function(url) {
    message("Deployment completed: ", url)
  },
  logLevel = "verbose",
  appId = 123
)
```

nwheatley · December 23, 2020, 8:41pm

Hello meztez,

Apologies, I am very new to plumber and haven't used R in a few years (hence using reticulate). I don't understand the test of using a dummy plumber router with dir("..", recursive = TRUE).

However, I simplified my test case. I found that I can run Python files, and even import Python modules between Python files, only if those Python files are in the same directory as plumber.R.

The longer detailed description of the issue is presented in my response to Blair09M in this thread.

Thanks again

meztez · December 23, 2020, 9:18pm

@nwheatley

This plumber endpoint lists the contents of the folder it is executed from, including all subdirectories.

Somewhere in plumber.R:

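```
#* @get /directorylisting
#* @serializer tsv
function(max = 10L) {
  max <- as.integer(max)
  # all files and folders below the working directory, including hidden ones
  fname <- dir(recursive = TRUE, include.dirs = TRUE, all.files = TRUE)
  if (length(fname) > max) {
    fname <- fname[seq_len(max)]
  }
  ret <- file.info(fname)
  ret <- cbind("fname" = row.names(ret), ret)
  # return the data frame explicitly so plumber serializes it
  ret
}
```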

You should be able to see where your files end up on the server.

nwheatley · January 15, 2021, 8:09pm

Hello, I figured out my error.

My issue lay in the fact that my "python_folder" subfolder was actually named 'python'. I learned you should not name your folders 'python'!

I changed the folder to "py_scripts" and it all worked.

Thank you for your help! Too bad I wrote 'python_folder' (to be clear to the readers here that it is a folder) instead of the actual name, otherwise perhaps someone would have been able to catch it!

-Nicole

meztez · January 15, 2021, 8:59pm

Well, I might be dumb, but I don't see how naming your folder python could have an impact on importing code. That is weird and might need to be investigated.
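Nobody in the thread confirmed why the name mattered, so the following is a hypothesis rather than a diagnosis. The first deploy log shows Connect creating python/requirements.txt.lock inside the bundle, which hints that Connect uses a python/ path of its own. If the name python resolves to some other package before the app's folder, the submodule lookup fails with the same ModuleNotFoundError text seen in the Connect log. The self-contained sketch below reproduces only that generic Python mechanism with two temporary directories; it does not inspect RStudio Connect, and make_package and try_import are illustrative helpers.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def make_package(root, name, modules):
    """Create root/name/__init__.py plus one empty .py file per module name."""
    pkg = Path(root) / name
    pkg.mkdir(parents=True, exist_ok=True)
    (pkg / "__init__.py").write_text("")
    for mod in modules:
        (pkg / f"{mod}.py").write_text("")


def try_import(dotted_name, search_dirs):
    """Attempt `import dotted_name` in a fresh interpreter whose sys.path
    starts with search_dirs (in order). Returns (succeeded, last stderr line)."""
    code = f"import sys; sys.path[:0] = sys.argv[1:]; import {dotted_name}"
    result = subprocess.run(
        [sys.executable, "-c", code, *(str(d) for d in search_dirs)],
        capture_output=True,
        text=True,
    )
    lines = result.stderr.strip().splitlines()
    return result.returncode == 0, (lines[-1] if lines else "")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        earlier, app = Path(tmp, "earlier"), Path(tmp, "app")
        # Stand-in for some other package called 'python' found first
        make_package(earlier, "python", [])
        # The app's own folders: one with the clashing name, one renamed
        make_package(app, "python", ["say_goodbye"])
        make_package(app, "py_scripts", ["say_goodbye"])
        print(try_import("python.say_goodbye", [earlier, app]))
        print(try_import("py_scripts.say_goodbye", [earlier, app]))
```

Because Python stops at the first regular package it finds on sys.path, the first import fails even though app/python/say_goodbye.py exists, while the renamed py_scripts package imports normally.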

nwheatley · January 15, 2021, 9:16pm

Yes, it is weird. However, as I tackle getting a virtual environment to work - guess what? I find documentation instructing us to call the venv 'python' and put it in the project directory. So maybe RStudio Connect is expecting a 'python' folder to be a virtual environment?

OR - maybe naming the venv 'python' is the reason my venv isn't working?!

-Nicole

system · January 22, 2021, 9:16pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
