Your thoughts on interstitial comments «like this» to make our R and Rmarkdown code more understandable

Dear Posit Community,
As programming languages, R and Rmarkdown enable us to

  1. express instructions for computers to execute
  2. express these instructions in ways that we humans can understand.

When it comes to writing understandable code, the ball's in our court as programmers, but programming language syntax and semantics determine what's possible, e.g., compare

  • APL: {(+⌿⍵)÷≢⍵}
  • R: mean(w, na.rm=TRUE)

Integrated Development Environments (IDEs) like RStudio can also play a powerful role, and I'm particularly interested here in the way they can allow us to write one thing (e.g., $e^{i \pi}+1=0$, [Posit Forum](https://forum.posit.co/)) which displays in another way.

This makes me think about interstitial comments, i.e., comments in the interstices, the intervening gaps between other things. Languages like C have them, e.g.,

int i /* index over rows */ = N - 1 /* last row */;

...though I haven't seen much code that tries to take advantage of this style.

R, however, has only # comments to end of line
...which limits the kinds of comments we can make.
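
The nearest R equivalent of the C line above relies on the parser continuing an incomplete expression onto the next line, so that each fragment can carry its own end-of-line comment. A rough sketch, with `N` set to an arbitrary placeholder value:

```r
N <- 10L  # number of rows (placeholder value for this sketch)

i <-      # index over rows
  N - 1L  # last row (0-based, as in the C example)
```

This works, but it forces a line break wherever a comment is wanted, which is exactly the limitation described here.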

Now, as someone who has experienced the frustration of trying to change code that was actually within a multiline comment (a long time ago, in a vi editor far, far away), I suggest the following with some hesitation, but also at a time where IDEs are doing more and more to enable us to communicate better in code.

What do you think about interstitial comments «like this» to make our R and Rmarkdown code more understandable?

Imagine using these comments to insert additional text to improve readability:

saveRDS(New_ClassList, «to the local» ClassList_file)

Or to provide alternate display text for code, so that the following

saveRDS(New_ClassList, «to local file»⸨ClassList_file⸩)

could be displayed as

saveRDS(New_ClassList, «to local file»)

while being interpreted by R as

saveRDS(New_ClassList, ClassList_file)
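
To make the idea concrete without any change to R itself, the "interpreted by R as" step can be pictured as a text preprocessing pass. The sketch below is only an illustration: `strip_interstitial()` is a hypothetical helper, not something R or RStudio provides, and it assumes UTF-8 source, no nested delimiters, and no delimiters inside string literals.

```r
# Hypothetical helper: turn annotated code into plain R code.
strip_interstitial <- function(code) {
  # Drop each «comment», along with any whitespace that follows it
  code <- gsub("«[^»]*»\\s*", "", code)
  # Keep the real code wrapped in ⸨...⸩, discarding only the brackets
  code <- gsub("⸨([^⸩]*)⸩", "\\1", code)
  code
}

annotated <- "saveRDS(New_ClassList, «to local file»⸨ClassList_file⸩)"
plain <- strip_interstitial(annotated)

# The result can be parsed as ordinary R (parsed only, not evaluated)
parsed <- parse(text = plain)
```

An IDE would do the reverse for display, hiding the `⸨...⸩` part and showing only the `«...»` text.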

I'm curious to know what you think about this kind of literate programming idea, its potential pros and cons, and whether such a thing could ever happen in the R universe.
Cheers,
David

I would go with the classic "comments should explain the why, not the how" (discussed in many places, e.g. here). So for your example, I would actually suggest splitting it into two lines and having the comment become a variable name:

to_local_path <- ClassList_file
saveRDS(New_ClassList, to_local_path)

I don't really understand what this code is supposed to mean, which makes it hard to write a clear variable name, but that is the general idea: if you have a comment that explains what you're doing, make it the name of a variable or a function. That will avoid later on changing the code and not the comment, and ending up with a comment that doesn't match the code.
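
The function variant of the same idea might look like the sketch below. `save_to_local_file()` is a made-up name, and the data and path are placeholders invented so that the example runs:

```r
# Hypothetical wrapper: the name carries what the comment would have said
save_to_local_file <- function(object, path) {
  saveRDS(object, file = path)
  invisible(path)
}

New_ClassList  <- data.frame(student = c("A", "B"))  # placeholder data
ClassList_file <- tempfile(fileext = ".rds")         # placeholder path

save_to_local_file(New_ClassList, ClassList_file)
```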

Looking at your example

saveRDS(New_ClassList, «to the local» ClassList_file)

this also reminds me a lot of object-oriented languages that use classes. Here «to the local» might be a class which enforces things for ClassList_file, and, if its name is chosen well, allows the reader to better understand what ClassList_file represents.


The main point of literate programming in my eyes is when there is a lot of explanation of the "why", e.g. you want to explain all these equations and illustrate with plots, and you just interleave the code as a way to support the explanation. But the code is usually not the important part when first reading: just like in a math textbook, you might skip the proofs on the first read, but it's important they are there and support the main text.

Typically, most R package vignettes (DESeq2 is a good example IMO) can spend most of their time explaining why you need to run this or that step, and just happen to give you the code in passing (but the fact that the explanation was created by running the code guarantees that they match).

So, back to your question, in a way having additional forms of comments is missing what I think is the main point: the goal is to tie the comments and the code, so that a wrong comment makes the code fail immediately, and forces you to address it. Inline comments make it worse, by having more ways for the comments to diverge from the code, without anyone noticing.
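
As a small illustration of tying a comment to the code, the claim that the file is local can be written as a check that fails as soon as it stops being true. This is only a sketch: treating "local" as "does not start with a URL scheme" is an assumed definition, and the data and path are placeholders.

```r
ClassList_file <- tempfile(fileext = ".rds")         # placeholder path
New_ClassList  <- data.frame(student = c("A", "B"))  # placeholder data

# The "comment" is now the error message of an executable check
# (named arguments to stopifnot() need R >= 4.0.0)
stopifnot(
  "ClassList_file must be a local path, not a URL" =
    !grepl("^[a-z]+://", ClassList_file)
)

saveRDS(New_ClassList, ClassList_file)
```

If someone later points `ClassList_file` at a URL, the script stops at the check instead of leaving a stale comment behind.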

Thanks for your thoughtful response @AlexisW. Choosing variable names so that they clarify what functions or other code is doing is my current strategy. Sometimes this works well. Sometimes I just can't find the right way to name a variable so that its usage is clear and readable in different contexts. Maybe that's further evidence of how hard it is to name things.

I think you are right that interstitial comments would provide more ways to create the impression that code is doing one thing when the execution of the code actually does something else. Introducing a new element into a programming language definitely creates new opportunities for poor usage.

On a more pedestrian note, I think sometimes it would be handy to be able to comment things out in the middle of a line of R code. I'd be interested to know why S and R were designed to only implement comments to end of line...

Yep, naming things is hard!

If I might hazard a guess: my impression is that "interstitial" comments like the ones you describe are rare in general. The C block comments /* ... */ are mostly meant as a way to make multi-line comments, rather than less-than-one-line comments. For "proof", I would cite the Wikipedia article on comments, which shows many examples of both block and line comments in many languages; I don't think a single one of them is an interstitial comment. As far as I can tell, it's not a common practice in any language, even the ones (like C) that make it possible.

So, for the design of S/R, the question is rather why we don't have a convenient multi-line comment symbol. My guess is that it's because S was initially developed more as an interactive language, so that large comment blocks are not something one needs to type every day (and the few people who do would be the kind who use text editors that can automate it).