Faster loading of R Markdown html document

netraam · November 9, 2023, 3:41pm

Dear Posit Community members,

I am creating big HTML documents using R Markdown. These documents contain 200 to 300 charts generated using ggplotly. Creating/knitting these files works well. However, the files can be 40-50 MB in size and are very slow to load in a web browser: it can take several minutes to open one of them.

I tried speeding up the loading of these HTML documents with the lazyrmd package, but it didn't work. Also, the lazyrmd package seems to be discontinued.

Are there alternative ways to speed up the loading of big html documents with many ggplotly charts that were generated using R Markdown?

Thank you!

cderv · November 16, 2023, 3:58pm

Are you creating a self-contained document? If so, maybe you shouldn't, so that files are loaded from disk instead of being embedded in the HTML file, which makes it very big.
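A minimal sketch of turning embedding off from R, assuming rmarkdown and pandoc are installed. The file `report.Rmd` and its one-line body are hypothetical, written to a temporary directory only so the example runs on its own; `output_options` overrides the `self_contained` setting of `html_document` for this one render. (In Quarto, the equivalent setting is `embed-resources`, which is off unless you set it to `true`.)

```r
library(rmarkdown)

# Hypothetical report, written to a temp dir so the sketch is self-contained
rmd <- file.path(tempdir(), "report.Rmd")
writeLines(c(
  "---",
  "title: \"Demo report\"",
  "output: html_document",
  "---",
  "",
  "Plots would go here."
), rmd)

# Override the format's default (self_contained = TRUE) for this render only:
# JS/CSS dependencies are written next to the HTML instead of being embedded
out <- render(
  rmd,
  output_options = list(self_contained = FALSE),
  quiet = TRUE
)

# The HTML now refers to its dependencies by relative path, so the HTML file
# and its supporting folder (typically report_files/) must be shared together
out
```

The trade-off is the one discussed further down the thread: a lighter HTML file, but a folder to ship instead of a single file.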

You could also consider using a website to spread your graphics across several pages. That way, each plot is only loaded when its page is viewed.

You can also consider lazy loading of your images.
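For static images (plain ggplot2 figures rather than ggplotly widgets), one way to ask the browser to lazy-load them is knitr's `out.extra` chunk option, which for HTML output adds extra attributes to each figure's image element. This is a sketch for the setup chunk of an `.Rmd`; `loading="lazy"` is the standard HTML attribute, and browser support may vary. If the document is self-contained, the image bytes are already inlined in the page, so this mostly helps when figures are kept as separate files.

```r
# Setup chunk of the .Rmd
knitr::opts_chunk$set(
  # Extra attribute added to every figure's <img> tag in HTML output,
  # asking the browser to defer loading images that are off screen
  out.extra = 'loading="lazy"'
)
```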

Overall, this is a web problem, not one specific to R Markdown.

Maybe reducing the size of your images can help: if you only target web viewers and not a print version, the DPI can often be reduced. You could also use services like TinyPNG and tools like OptiPNG to compress your files with little or no loss in quality.

From R, see:

xfun::optipng()
xfun::tinify()
webshot2::shrink()
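A rough sketch of the DPI and compression ideas above for knitr-generated PNG figures (again, static plots only, not plotly widgets). Lowering `dpi` and setting `fig.retina = 1` shrinks each PNG, and knitr's built-in `hook_optipng` runs OptiPNG on each figure after it is created. This assumes the `optipng` command-line tool is installed and on the PATH; otherwise the hook only warns and leaves the figures untouched. The values `72` and `-o7` are illustrative choices.

```r
# Setup chunk: smaller, compressed PNG figures for web-only viewing
knitr::opts_chunk$set(
  dpi = 72,         # enough for on-screen viewing (illustrative value)
  fig.retina = 1,   # no 2x oversampling for high-DPI screens
  optipng = "-o7"   # extra command-line arguments passed to optipng
)

# Register knitr's hook so the `optipng` chunk option takes effect
knitr::knit_hooks$set(optipng = knitr::hook_optipng)
```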

If you work only with interactive graphics (like plotly), then you should really consider splitting the web page into several pages, so that each plot loads only when needed. Iframes can also be lazy loaded, so you could save each plot in its own file and include it through an iframe with lazy loading.
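A minimal sketch of the iframe idea, assuming the main report is rendered with `self_contained: false` (so it is a folder rather than a single file) and that the report's HTML ends up in the chunk's working directory, which is the R Markdown default. `lazy_plot_iframe()` is a hypothetical helper: it saves each widget to its own HTML file with `htmlwidgets::saveWidget()`, sharing one `lib/` folder of dependencies so plotly.js is copied once instead of embedded in every file, and returns an `<iframe>` with the standard `loading="lazy"` attribute so the browser can defer plots that are off screen. How browsers handle lazy iframes, especially on `file://` URLs, may vary.

```r
library(plotly)
library(htmlwidgets)
library(htmltools)

# Hypothetical helper: save one widget to plots/<name>.html and return a
# lazily loaded iframe pointing at it (path relative to the report's HTML)
lazy_plot_iframe <- function(widget, name, dir = "plots", height = "450px") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  file <- file.path(dir, paste0(name, ".html"))
  # selfcontained = FALSE plus a shared libdir: dependencies are written
  # once to plots/lib/ instead of being embedded in every plot file
  htmlwidgets::saveWidget(widget, file, selfcontained = FALSE, libdir = "lib")
  tags$iframe(
    src = file,
    loading = "lazy",   # let the browser defer off-screen iframes
    width = "100%",
    height = height,
    style = "border: none;"
  )
}

# Usage inside a chunk (a ggplotly() object can be passed the same way)
p <- plot_ly(mtcars, x = ~wt, y = ~mpg, type = "scatter", mode = "markers")
iframe <- lazy_plot_iframe(p, "mpg-vs-wt")
iframe
```

Keeping `selfcontained = FALSE` per plot matters here: with hundreds of plots, embedding plotly.js in each file would multiply its size by the number of charts.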

Hope this gives you some hints on what you could do.

netraam · November 17, 2023, 7:39am

Thank you for the suggestions!

Yes, I am creating one self-contained document. The reason is that a single self-contained document can be sent by e-mail or saved on a network drive, with a guarantee that anybody can still open it on any network years later (even if the Posit Connect server is no longer available to that person). Therefore, creating a website with multiple pages wouldn't be an optimal solution in this case.

You can also consider lazy loading of your image

Do you know how I could implement such lazy loading of my images in R Markdown/Quarto?

statquant · December 27, 2023, 6:45am

My 2 cents is that this is very much a problem for users. We use plotly at work, and we stopped using R Markdown/Quarto rendering because the HTML files were getting just too big and loading too slowly.

Whether or not this is a web issue as opposed to an Rmd/Qmd one, I think it should be addressed in the documentation, maybe as a best-practices section.
For what it's worth, I have friends who went through the same journey and dropped Rmd HTML rendering as well.

You seem to suggest we could lazy-load graphs. I haven't seen anything like this except GitHub - hafen/lazyrmd: Render R Markdown outputs lazily, which does not seem to be working.

Thank you for your suggestions!

cderv · December 28, 2023, 11:14am

By creating one big HTML file, you are causing the slow loading yourself, because your browser needs to read and load a huge amount of data (200-300 dynamic charts on a single page is a lot).

Not using self_contained could reduce the HTML size, but you would need to send the resource directory along with the content. That is not great for e-mail, but it is still fine for disk storage.

You are creating some friction for yourself because you don't want to host the content somewhere. But a shared disk is hosting: you could open the file directly from it, or run a local web server (servr::httpd(), for example) to show the website.

There is no magic solution when you create heavy content like this; splitting it into several web pages is often a good solution (nobody will look at 200-300 plots anyway).

Also, you are using static content here. Dynamic content (a Shiny app, for example) helps solve this issue too, because it separates the views.
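To illustrate the separation: in a Shiny app, outputs that sit in a hidden tab are suspended by default (their `suspendWhenHidden` output option defaults to `TRUE`), so each plot is only computed and sent to the browser when its tab is opened. This is a minimal sketch with a hypothetical set of `mtcars` columns; unlike a static HTML file, it needs a running R process (or shinylive, mentioned later in the thread) to serve it.

```r
library(shiny)
library(plotly)

# Hypothetical variables: one tab (and one plot) per variable
vars <- c("wt", "hp", "disp", "qsec")

ui <- fluidPage(
  do.call(tabsetPanel, lapply(vars, function(v) {
    tabPanel(v, plotlyOutput(paste0("plot_", v)))
  }))
)

server <- function(input, output, session) {
  lapply(vars, function(v) {
    # Outputs in hidden tabs are suspended by default, so this render
    # function only runs once its tab is actually shown
    output[[paste0("plot_", v)]] <- renderPlotly({
      plot_ly(x = mtcars[[v]], y = mtcars$mpg,
              type = "scatter", mode = "markers")
    })
  })
}

app <- shinyApp(ui, server)
# shiny::runApp(app)  # launch locally
```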

guarantee that it can still be opened by anybody on any network after multiple years (even if the Posit Connect server won't be available to that person (anymore))

I don't understand that part. No matter where you host (a shared disk drive, a server, a service, ...), this is guaranteed to work as long as the storage hosting the content is available. This does not seem related to self-contained documents, or to producing a single document rather than a website.

cderv · December 28, 2023, 11:25am

@statquant Thanks a lot for the feedback. I am sorry you had to stop using R Markdown or Quarto.

Are you using self-contained documents too?

I'm asking because HTML files getting too big and loading too slowly is usually caused by embedding. By default, HTML output is split into several files precisely to help with this, so that the browser can load resources efficiently. When things are embedded, the browser also needs to decode what has been encoded, and with complex and heavy content like plotly, this takes time.

If that is the case, why is deploying the HTML documents not an option? Why is embed-resources: true the preferred way?

To me, zipping a folder with all the resources is as easy as creating a single file. The advantage is that the HTML will be lighter to load. It also allows you to create websites that split the load across several visualisation pages. Nowadays, it also opens the door to Shiny applications via shinylive (Shiny running in the browser), which can simplify visualisations and show only part of the required content on screen.

Anyhow, a document with a lot of interactive content on the same page is heavy by definition, all the more so when everything is embedded. Lazy loading won't change that, and it won't work with encoded, embedded content.

Usually, the best practice is to deploy HTML reports and documents, including organizing them as websites. Web applications are also a good option for visualisation websites (with lots of interactive graphs), especially now that Shiny in the browser is available.

Happy to continue the discussion to understand what the pain points are and how we can help, including with documentation examples and best practices.

For Quarto, feel free to jump into Quarto Discussions (quarto-dev/quarto-cli · Discussions · GitHub) so that we can track this with the dev team and other Quarto users.

Thank you

statquant · January 1, 2024, 6:14pm

Hello @cderv,
Happy new year.
Yes, we were using embedding via self_contained: true.
We did this because we usually attach HTML pages in Confluence (as a convenience for access), hence they needed to be self-contained.
At this point I should say that I used "large files" and "slow to load" interchangeably, but I actually don't care about size. The only variables of interest were (and are):

fast-loading pages for viewers
a seamless workflow for researchers (i.e. one render command)
I don't know how to deploy a website, but you seem to say it's as easy as using servr::httpd; we use servr all the time, so I can try.
I still think lazy loading should be the default, and that it would make a huge difference; I don't understand why you say it would not.
We typically show plots in tabs (using a JS tabs library), so I don't see why we should pay all this loading cost upfront.

Many thanks for your time

netraam · January 3, 2024, 11:35pm

Thank you statquant for your comments. We have exactly the same situation as you described.

It seems that lazyrmd does not work in the self-contained html scenario, because of the following statement on GitHub - hafen/lazyrmd: Render R Markdown outputs lazily:

Note that this approach is not possible with standalone html output - we must store the plots separately. Otherwise, we would defeat the purpose of why this package was created.

cderv · January 15, 2024, 5:00pm

servr::httpd() runs a local web server from an R session, so you can view a website or any other content through a web server (on localhost and a port).

It would not change the size of the self-contained file, IMO. That is what I meant. And I'm not sure lazy loading works with embedded content.

It is not as easy as that, at least to me. A tab implementation may only be a way to organize the DOM; it does not change the fact that all the content is loaded at once when the page loads, and only then is the tab layout created. There could be a JS library that handles this, though.

Anyhow, great customization idea. Feature requests and contributions are welcome!

system · February 29, 2024, 5:01pm

This topic was automatically closed 45 days after the last reply. New replies are no longer allowed.
