Create blogdown posts in .Rmd format without sharing the dataset

gianluca · January 6, 2022, 7:01pm

The question I'm posting is probably quite trivial/obvious, but as I'm new to creating a website and sharing material online, I thought it was best to raise my doubts here.

First of all, I thank the developers of the blogdown package for their great job and for making it possible to easily create a website using R and organize multiple Rmarkdown files in one place.

I know that with blogdown it's possible to create posts in Rmarkdown format (.Rmd or .Rmarkdown) and I would like to use this feature to share some of my data analysis projects in R, showing the outputs generated by the code (summary, tables, graphs, model estimates etc.) based on the dataset used for the analysis.

The only problem is that for some of these projects I have used datasets that I cannot share online for privacy reasons or that have been obtained by accessing databases with username and password.

Therefore, as long as I create a post in .Rmd or .Rmarkdown format on my own PC, I can import my data by writing the path of the directories in which they are saved (or my credentials) and use them to run the code, thus avoiding copying the datasets into the content/project/name-project folders. But I don't know what happens when I have to deploy my site online. Hence my doubts:

If I deploy my site online and do not copy my datasets into the project folders, the .Rmd files will have no reference to this data and therefore the R code used for the analysis will not be able to generate the outputs (?);
Suppose my site is hosted on a public GitHub repository. In this case anyone can see the contents of content/project/name-project/index.Rmd and therefore also the username and password I have used to access the data.

Would it be more appropriate that in these cases I create posts simply by copying and pasting the outputs without actually executing any code? Or is there any workaround I could take?

Thanks in advance, I feel like I'm missing something.

andresrcs · January 6, 2022, 7:44pm

The project's content and the rendered output are two different things. You only need to publish the rendered HTML output; there is no need to make your datasets publicly available if you don't want to. It all depends on how you are deploying your site. If you are pushing changes to a git repository, you could restrict what gets into version control using your .gitignore file.
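
As a minimal sketch of the .gitignore idea: the helper below appends ignore patterns from R without duplicating entries that are already there. The helper name `add_gitignore_entries()` and the patterns `data/` and `.Renviron` are assumptions for illustration, not something blogdown provides or paths taken from this thread.

```r
# Append ignore patterns to a .gitignore file, skipping ones already present.
# `add_gitignore_entries()` is a hypothetical helper, not part of blogdown.
add_gitignore_entries <- function(entries, path = ".gitignore") {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  to_add <- setdiff(entries, existing)
  if (length(to_add) > 0) {
    writeLines(c(existing, to_add), path)
  }
  invisible(to_add)
}

# Demo on a temporary file so that no real project is touched.
demo_file <- file.path(tempdir(), "demo.gitignore")
writeLines(c(".Rproj.user", ".Rhistory"), demo_file)

# "data/" and ".Renviron" are example patterns; ".Rhistory" is already listed.
add_gitignore_entries(c("data/", ".Renviron", ".Rhistory"), demo_file)
readLines(demo_file)
```

Note that git only ignores files it is not already tracking, so a dataset that was committed earlier stays in the repository history even after its pattern is added.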

gianluca · January 9, 2022, 8:11pm

Thanks for your answer, and I apologize for replying so late.
I'm not sure I understand everything correctly because, as I said, I have not yet deployed the site on any hosting service (GitHub and Netlify are the ones I know). I want to do this once I have finished putting my projects in order, publishing everything at once.

As you say, once I create a blogdown post/project with the .Rmd extension, I import my data, which is located in a folder outside the one where my site is located

```{r, echo = FALSE}
library(readxl)
# import the dataset from a folder outside the site directory
data <- read_excel("~/folder/sub-folder/dataset.xlsx")
```

and I insert my contents (text, code for analysis etc.), the rendered output (the single html page of the website) will remain as it is even when I have deployed the site regardless of where my data is. Am I right?

What do you mean by

"it depends on how you are deploying your site [...] you could restrict what gets into version control using your .gitignore file"

Currently this is what I have in my .gitignore file

.Rproj.user
.Rhistory
.RData
.Ruserdata

How can I edit this file? Sorry but I'm not very familiar with git.

For example, if I need to import data from a site that I access with a username and password (such as the Human Mortality Database)

```{r, echo = FALSE}
library(demography)
# import mx from HMD
gb_data <- hmd.mx("GBR_NP", "my-username", "my-password", "Great Britain")
```

and my site is hosted on a public GitHub repository, is there a way to prevent anyone going to content/project/name-project/index.Rmd from seeing this information?

andresrcs · January 9, 2022, 10:08pm

There are several ways of deploying a site, so I can't generalize, but yes: the html file doesn't change because it gets rendered locally, not on the server (I'm not sure this applies to all deployment options).

If the Rmd is under version control and you commit that file to a public GitHub repository, then anyone can see its content. This is exactly why it is considered bad practice to hard-code your credentials into your code. An alternative is to use a credentials manager such as the keyring package, or simply to define your credentials as environment variables in your startup files (Renviron.site or .Renviron). Since others don't have access to those environment variables, there is no way they can see the content.

For example, this connection string is publicly available in my blog's GitHub repository and there is no way you can see my credentials.

connection_string <- glue::glue(
  "Driver={{PostgreSQL ANSI}};\\
  Uid={Sys.getenv('MY_UID')};\\
  Pwd={Sys.getenv('MY_PWD')};\\
  Server={Sys.getenv('MY_REMOTE')};\\
  Port=5432;\\
  Database=internet;"
)
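
A rough sketch of how the same pattern might look for the HMD case. The variable names `HMD_USER` and `HMD_PASS` and their values are made up for illustration, and a temporary file stands in for the real `~/.Renviron`, which R reads at startup.

```r
# Stand-in for ~/.Renviron, written to a temporary file for this demo only.
# HMD_USER and HMD_PASS are hypothetical variable names with dummy values.
renviron_demo <- tempfile(fileext = ".Renviron")
writeLines(c("HMD_USER=demo-user", "HMD_PASS=demo-pass"), renviron_demo)

# R loads the real startup file by itself; readRenviron() loads one on demand.
readRenviron(renviron_demo)

# The .Rmd only ever mentions the variable names, never the values.
hmd_user <- Sys.getenv("HMD_USER")
hmd_pass <- Sys.getenv("HMD_PASS")

# The call in the post would then look like this (not run here, needs network):
# gb_data <- demography::hmd.mx("GBR_NP", hmd_user, hmd_pass, "Great Britain")
```

The committed .Rmd then contains nothing secret, provided the file that defines the variables is itself kept out of the repository.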

gianluca · January 10, 2022, 2:23pm

Perfect thanks a lot.

I did some research on the web about the keyring package and the environmental variables (the .Renviron files) you mentioned and I think that's what I was looking for (if I had done some more thorough research before maybe I wouldn't have needed to open this thread).

Specifically, I found these useful posts on r-bloggers and “Databases using R” in which they also suggest the possibility of using rstudioapi which seemed more intuitive to use and suitable for my case.

What do you think? For example, referring to the code I wrote in my previous post

I could instead import my data to Rmarkdown like so:

```{r, echo = FALSE}
library(demography)
# import mx from HMD
gb_data <- hmd.mx("GBR_NP",
                username = rstudioapi::askForPassword("Enter your username"),
                password = rstudioapi::askForPassword("Enter your password"),
                "Great Britain")
```

and then enter my credentials in the popup boxes. Or is there a risk that my credentials will be discovered somewhere?

To avoid going off-topic and asking too many questions in this thread: if you think it would be useful to me, is there any documentation or tutorial I could look at? I would really appreciate it.

Thanks again.

andresrcs · January 10, 2022, 2:32pm

This is an equally safe option, but the downside is that it prevents you from executing your code programmatically and implementing any kind of automation. Also, it might not work properly with the "live update" functionality of blogdown.
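
One possible way to soften that trade-off, sketched under assumptions: look for an environment variable first and fall back to a prompt only in an interactive RStudio session. `get_secret()` is a hypothetical helper and `HMD_PASS` a made-up variable name; only `Sys.getenv()`, `rstudioapi::isAvailable()` and `rstudioapi::askForPassword()` are existing functions.

```r
# Return a secret from an environment variable when it is set; otherwise
# prompt in an interactive RStudio session; otherwise stop with a clear error.
# `get_secret()` is a hypothetical helper written for this sketch.
get_secret <- function(var, prompt) {
  value <- Sys.getenv(var)
  if (nzchar(value)) {
    return(value)
  }
  can_prompt <- interactive() &&
    requireNamespace("rstudioapi", quietly = TRUE) &&
    rstudioapi::isAvailable()
  if (can_prompt) {
    return(rstudioapi::askForPassword(prompt))
  }
  stop("Environment variable '", var, "' is not set and no prompt is available.")
}

# Demo with a dummy value so that no dialog box is needed.
Sys.setenv(HMD_PASS = "demo-pass")
secret <- get_secret("HMD_PASS", "Enter your password")
```

With the variable defined, rendering runs unattended; without it, an automated run fails with a readable message instead of waiting for a dialog box.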

gianluca · January 10, 2022, 4:51pm

You're right, good observation. I haven't tested whether this method creates problems with blogdown, but it can be annoying to enter credentials every time the output is rendered.

I think even using the keyring package I might encounter a similar problem ...

library(keyring)
library(demography)

keyring::keyring_create("hmdcred") # dialog box
keyring::key_set("id", keyring = "hmdcred") # dialog box
keyring::key_set("pw", keyring = "hmdcred") # dialog box

gb_data <- hmd.mx("GBR_NP",
                keyring::key_get("id", keyring = "hmdcred"),
                keyring::key_get("pw", keyring = "hmdcred"),
                "Great Britain")

So maybe you mean that it would be better to set the credentials outside of the .Rmd file (if it is possible to do so)? Could you be more specific (explanation for dummies) about how I could automate the process?

In a previous post you mentioned the code you use to set your credentials for the github repo

andresrcs:

connection_string <- glue::glue(
  "Driver={{PostgreSQL ANSI}};\\
  Uid={Sys.getenv('MY_UID')};\\
  Pwd={Sys.getenv('MY_PWD')};\\
  Server={Sys.getenv('MY_REMOTE')};\\
  Port=5432;\\
  Database=internet;"
)

If I may ask, what does this code do, and what steps do you follow, so that I can apply them in my case (and is it possible to implement it with keyring)?

andresrcs · January 10, 2022, 6:41pm

Not really. You only need interactive dialog boxes when you define your keys the first time. The "keyring" is persistent, so you don't need to redefine the keys in each session; you only need to unlock it, and even that can be automated in your local environment (where it needs to be used to render your html output).

The Sys.getenv() simply retrieves the value of the specified environmental variable, for example if I define the MY_UID variable in an Renviron file as

MY_UID=andres

If I run glue("Uid={Sys.getenv('MY_UID')}") the result would be "Uid=andres"

But if another person runs the same code without the same Renviron file available, they would get an error or a different result.
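
To make the "different result" case concrete, here is a small base R illustration. It uses `paste0()` in place of `glue()` to avoid an extra dependency, and `require_env()` is a hypothetical helper added for this sketch. `Sys.getenv()` itself returns an empty string for an undefined variable, so any error would typically come later, for example from a failed login.

```r
# Without the variable defined, Sys.getenv() returns an empty string.
Sys.unsetenv("MY_UID")
unset_result <- paste0("Uid=", Sys.getenv("MY_UID"))

# With the variable defined (normally via an Renviron file, here via Sys.setenv()).
Sys.setenv(MY_UID = "andres")
set_result <- paste0("Uid=", Sys.getenv("MY_UID"))

# Hypothetical helper that turns a missing variable into an explicit error.
require_env <- function(name) {
  value <- Sys.getenv(name)
  if (!nzchar(value)) {
    stop("Environment variable '", name, "' is not defined.")
  }
  value
}

uid <- require_env("MY_UID")
```

Failing early like this is optional, but it makes a missing Renviron file easier to diagnose than an empty user name passed on to a database or web service.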

gianluca · January 10, 2022, 10:16pm

Thanks again, it's much clearer now. The last two things:

If I define the keyring and keys inside the .Rmd file, every time I add changes and save the file, blogdown re-knits the code (if I understand correctly) and then the dialog box should appear again, even if the key has already been set .. am I wrong?

andresrcs:

The Sys.getenv() simply retrieves the value of the specified environmental variable, for example if I define the MY_UID variable in an Renviron file as
MY_UID=andres
If I run glue("Uid={Sys.getenv('MY_UID')}") the result would be "Uid=andres"

But if another person runs the same code without the same Renviron file available, they would get an error or a different result.

Perfect! I found how to create the .Renviron file and how to define the environment variables, but I didn't understand where I should save this file. I don't know whether I should leave it on my PC (for example in the Documents folder) or whether it's necessary to create it somewhere inside the folder that contains the site. In the latter case, I don't know how to avoid making it public once I have deployed my site.

andresrcs · January 10, 2022, 11:50pm

You are supposed to define the keyring interactively, not in the Rmd file. In your code you only need to retrieve keys, not define them.
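
A sketch of that split, under stated assumptions: the service name "hmd" and the helper `hmd_credentials()` are hypothetical, and the default system keyring is used instead of a custom one, so the behaviour depends on the credential store available on your operating system.

```r
# Step 1: run ONCE in the R console, not in the .Rmd (opens a password prompt).
# "hmd" is a hypothetical service name; replace "my-username" with your own.
# keyring::key_set(service = "hmd", username = "my-username")

# Step 2: in the .Rmd, retrieval only. No dialog box, no secrets in the file.
# `hmd_credentials()` is a hypothetical helper written for this sketch.
hmd_credentials <- function(service = "hmd") {
  entry <- keyring::key_list(service)
  if (nrow(entry) == 0) {
    stop("No key stored for service '", service, "'; run key_set() first.")
  }
  username <- entry$username[1]
  list(
    username = username,
    password = keyring::key_get(service, username = username)
  )
}

# Usage inside the post (not run here, needs a stored key and network access):
# creds <- hmd_credentials()
# gb_data <- demography::hmd.mx("GBR_NP", creds$username, creds$password,
#                               "Great Britain")
```

Because the user name is stored next to the password, the .Rmd contains neither of them.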

Startup files can have several locations depending on the desired scope. This article explains the locations and the corresponding scope.

gianluca · January 11, 2022, 5:11pm

All very clear now. I also read the article you linked. Thank you so much for your help and for your patience.

I don't know if that's correct but I marked your second post as "solution" because actually, now that I understand what you meant, it was the most complete answer to my initial question.
