Unclear about where to put resources (e.g. datasets, csv) for blog posts

zoowalk · January 4, 2021, 9:54pm

Hello,

I am moving from blogdown to distill and am quite enthusiastic about the new package version. One thing I am not entirely clear on, though, is how best to deal with the (larger) files/datasets that are analyzed in a blog post but don't have to appear on the published site (I am only interested in the results).

So far I have placed them in a separate folder (blog_data) in the root directory. Posts located in the folder _posts refer to the files contained in this directory. First question - is the folder 'static' originally intended to serve this purpose?

When rendering the blog site, the entire (!) folder blog_data is copied - as indicated in the documentation - into the _site folder. Hence, the pretty large data files now end up there as well.

Subsequently, I commit the entire project folder to GitHub and then deploy to Netlify. The consequence, though, was that after a few updates to the blog my free bandwidth quota at Netlify was already used up. This was mainly due to the large datasets contained in the _site folder. The blog posts themselves are by no means excessive in size.

Another approach seems to be to put the required datasets etc. in the folder /_posts/blog1/blog1_files/ and to indicate in the YAML of the blog post file that the dataset in question (e.g. a .csv) should be excluded from being copied to the _site folder. However, in general, I would prefer to have all datasets in one separate folder, since different blog posts can refer to the same dataset.
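
A minimal sketch of what such a per-post exclusion might look like in the post's YAML front matter. The `resources` / `exclude` keys are given here as an assumption about how distill handles post resources, and the title and the `*.csv` pattern are placeholders, so check the current distill documentation before relying on it.

```yaml
---
title: "Blog post 1"   # placeholder title
resources:
  exclude:
    - "*.csv"          # assumed pattern: keep CSV files in the post folder out of _site
---
```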

Is there a recommended way to approach this issue? To be clear, the datasets do not have to be available on the blog's site. Many thanks.

Final question - the search function on my blog is pretty slow/freezes briefly. Can this be due to having the larger datasets in the _site folder? Many thanks again.

A test/draft of the site is here
https://test-drive-werk-statt.netlify.app/

UPDATE: Since the question is already closed and I can't add an answer anymore: I simply renamed the folder containing the data files to a name starting with an underscore ("_"). As a consequence, it is not copied into the _site folder when rendering/building the blog. Maybe that's helpful to some.
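
An alternative that keeps the original folder name might be the `exclude` field in `_site.yml`, which R Markdown's site generator uses to skip files when copying resources to the output directory. This is only a sketch: whether distill honors the field in the same way is an assumption to verify, and the `name` and `title` values are placeholders.

```yaml
name: "my-blog"          # placeholder
title: "My Blog"         # placeholder
output_dir: "_site"
exclude: ["blog_data"]   # data folder from the question; assumed to be skipped on render
```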

technocrat · January 4, 2021, 10:50pm

An NaCl grain offering.

If I were trying this (I haven't gotten far in Distill and don't have many large objects), I'd put an .Rds in a GitHub gist and link directly to its raw URL, so that rendering doesn't have to redo the whole folder. If the GitHub object size limit gets stretched, there's always Amazon S3. I've done this in Hugo with csv files and like it. Of course, I wouldn't want to do this with objects that change frequently.

system · January 25, 2021, 10:50pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.