Offering a custom R-markdown service inside another project

pchtsp · February 29, 2024, 4:58pm

Hello community,

I have an architectural question (or rather several) about R Markdown, R and Quarto. Apologies in advance for the length of this post.

Context

I work on an open source project (cornflow) that handles the asynchronous execution of optimization tasks, among other things. It stores a problem's input data, results, etc. using formal (but somewhat abstract) definitions modelled as classes: Instance, Solution, etc. These are customized for each optimization problem we build.

The optimization problems we handle can be very different from one another (e.g., vehicle routing problems, a sudoku, graph coloring, task scheduling, maximum flow in a network, etc.). But by structuring the input data and the results (via a jsonschema), we can build pluggable functionality that shares a common interface. Some examples: solution methods (i.e., engines / solvers that generate a solution), case storage and comparison, checks and validations, scheduling and queuing of solver runs, a REST API, unit tests, user permissions, etc. I now want to tackle one piece of functionality that has so far eluded us: a user interface.

What I want

I want to have a catalogue of automated templates that take a "solved instance" as input (e.g., in JSON format) and produce a standard, good-looking report ready to be consumed by a user. Each problem can have more than one report. Ideally, a user would be able to request a report via the REST API that we already have.

More details

I'm a big fan of parametrized R Markdown, and I've used it in the past to successfully show, communicate and share the results of complex optimization problems with colleagues. I want to build pluggable automated reports that understand the input data and solution structure of a given problem and generate a good-looking, powerful, self-contained document (HTML, PDF, etc.). I imagine a REST API endpoint where the client asks "please generate the report for the solved case with id=1543" and the API returns the compiled document somehow.

Everything we have server-side is currently built in python.
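
Since the server side is Python, one way the "generate the report for case 1543" request could reach a parametrized R Markdown template is through a subprocess call to Rscript. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, and it assumes the template declares a `case_file` parameter in its YAML header and reads the solved instance from that JSON file. `rmarkdown::render()` and its `input`, `output_file`, `output_dir` and `params` arguments are the real R API.

```python
import json
import subprocess
from pathlib import Path


def build_render_command(
    template: Path, case_json: Path, output_dir: Path, output_name: str
) -> list[str]:
    """Build an Rscript call that renders a parametrized R Markdown template.

    Assumption: the template's YAML header declares `params: case_file`.
    """
    # json.dumps produces double-quoted strings, which are also valid R
    # string literals for ordinary paths (forward slashes via as_posix()).
    r_expr = (
        "rmarkdown::render("
        f"input = {json.dumps(template.as_posix())}, "
        f"output_file = {json.dumps(output_name)}, "
        f"output_dir = {json.dumps(output_dir.as_posix())}, "
        f"params = list(case_file = {json.dumps(case_json.as_posix())})"
        ")"
    )
    return ["Rscript", "-e", r_expr]


def render_report(
    template: Path,
    case_json: Path,
    output_dir: Path,
    output_name: str,
    timeout_s: int = 300,
) -> Path:
    """Run the render and return the path of the compiled document."""
    cmd = build_render_command(template, case_json, output_dir, output_name)
    # check=True raises CalledProcessError if R exits with an error;
    # timeout guards against reports that never finish compiling.
    subprocess.run(cmd, check=True, timeout=timeout_s)
    return output_dir / output_name
```

A Quarto-based variant would look much the same from the Python side, since `quarto render report.qmd -P case_file:<path>` also accepts parameters on the command line; only the command list would change.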

Example implementation

Taken from this tree. tsp is a problem, vrp is another problem.

I've added a vrp/reports directory below, which is where I envision the R Markdown templates living. These reports assume we have a data structure compliant with schemas/input.json and schemas/output.json, or a vrp.core.experiment.Experiment Python object if the report is built with Python. Both are equivalent.

Likewise, we would have a tsp/reports directory somewhere inside tsp with the reports for the tsp problem and compliant with its schemas (tsp/schemas/input.json, ...).

├── tsp
│   ├── (...)
└── vrp
    ├── core
    │   ├── experiment.py
    │   ├── instance.py
    │   └── solution.py
    ├── reports
    │   ├── report1.Rmd
    │   └── report2.Rmd
    ├── data
    │   ├── input_test_1.json
    │   ├── input_test_1_small.json
    │   ├── input_test_2.json
    │   └── output_test_1.json
    ├── README.rst
    ├── schemas
    │   ├── input.json
    │   └── output.json
    └── solvers
        ├── modelClosestNeighbor.py
        ├── modelMIP.py
        ├── model_ortools.py
        └── model.py
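
With this layout, the report catalogue could be discovered from the directory structure rather than registered by hand. A minimal sketch, assuming every problem lives in its own directory and keeps its templates in a `reports` subdirectory as in the tree above; the `discover_reports` function and the inclusion of `.qmd` files are illustrative assumptions, not part of cornflow:

```python
from pathlib import Path

# Template extensions treated as reports (assumption: Rmd or Quarto files).
TEMPLATE_SUFFIXES = {".rmd", ".qmd"}


def discover_reports(root: Path) -> dict[str, list[str]]:
    """Map each problem directory under `root` to the report names it ships."""
    catalogue: dict[str, list[str]] = {}
    for problem_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        reports_dir = problem_dir / "reports"
        if not reports_dir.is_dir():
            continue  # this problem has no reports directory yet
        names = sorted(
            f.stem
            for f in reports_dir.iterdir()
            if f.is_file() and f.suffix.lower() in TEMPLATE_SUFFIXES
        )
        if names:
            catalogue[problem_dir.name] = names
    return catalogue
```

For the tree above, this would list `report1` and `report2` under `vrp`, and nothing for a problem that does not have a `reports` directory.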

Some questions

Should we aim to use R Markdown, knowing that it would add an R dependency on the server side? Or should we go with Quarto + Python? We can always call R from Python (via the command line, or through a bridge such as rpy2; reticulate works in the other direction, calling Python from R).
If we go with R, is it possible to replicate in R the well-structured codebase we have in Python, to help in producing the R Markdown files? In Python we have modules, classes, type hints, etc. (see vrp/core/experiment.py above). In R I've always ended up creating several scripts, each with several stateless functions. It works, but it has always felt a bit dirty.
If we go for Quarto + Python, is the functionality available in Python as good as in R? I'm in love with ggplot, leaflet, knitr, tidyverse and magrittr, and I'm not at all convinced about using pandas, matplotlib, etc. Maybe plotly?
Is it better to offer a report "on demand" via our REST API? Or is it better to generate the document, store it on the server and let the user download it? Some documents can be really fast to compile; others may not be.
How far can we go with an automated HTML report? How close can we get to a static web page with links and a menu? I've used a collapsible TOC in HTML, which already helps a lot with navigation. I've looked at the bookdown package and its single-HTML option seems promising. Are there examples of extremely rich single-file reports that can be sent via email or chat?
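
On the on-demand versus stored question, a middle ground would be to render on first request and serve the stored file afterwards. The sketch below illustrates that idea only; the function names, the file naming scheme and the `render` callback are hypothetical. The cache key includes a hash of the case data, so that a re-solved case gets a fresh report.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def report_cache_path(
    cache_dir: Path, case_id: int, report: str, case_data: dict
) -> Path:
    """Derive a stable file name from the case id, report name and case content."""
    payload = json.dumps(case_data, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:16]
    return cache_dir / f"case{case_id}_{report}_{digest}.html"


def get_or_render(
    cache_dir: Path,
    case_id: int,
    report: str,
    case_data: dict,
    render: Callable[[dict, Path], None],
) -> Path:
    """Return the stored report, compiling it only if it is not there yet.

    `render` is any callable that writes the compiled document to the
    given path (for example, a wrapper around an Rscript or quarto call).
    """
    target = report_cache_path(cache_dir, case_id, report, case_data)
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        render(case_data, target)
    return target
```

Fast reports could go through this path synchronously inside the request. Slow ones would probably still need to be queued and polled, in the same spirit as the asynchronous solver runs, with the endpoint returning the stored file once it exists.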

Thanks!

Franco
