The title might not reflect my issue very well, as it involves programming design/logic more generally.
I'm developing a package to handle authors and affiliations. The package is primarily designed to be used with Quarto and will allow users to inject author data from a dataset into a YAML header following Quarto's author/affiliations schema. I'd also like to make the package accessible to non-Quarto/R Markdown users (or users who don't use a journal template with their qmd documents) by generating author lists and affiliations as character strings.
I'm struggling a lot with the logic of the code that generates the author list. By author list, I mean a list of authors with annotations, e.g. René Descartes1,2*†, Blaise Pascal3, Antoine Lavoisier1,4‡.
I'm building the package around a few R6 classes. My approach to producing author lists is to have a method that takes a `format` argument as a character string, which is then parsed to inject actual data from a dataset. The `format` argument consists of keys defining each annotation (`a` for affiliation, `c` for correspondence and `n` for note), the superscript marker `^` and the separator `,`. E.g., if I reuse the example above, `"^ac^"` would produce `René Descartes^1,2*^`, whereas `"^c,a^n"` would produce `René Descartes^*,1,2^†`.
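To make the intended mapping concrete, here is a minimal base R sketch that renders the suffix for a single author. `render_suffix()` is a hypothetical helper written for this post, not part of the package, and it assumes that a separator belongs to the key that follows it and is dropped when that key has no value.

```r
# Hypothetical helper (illustration only): render the annotation suffix
# for ONE author. `values` is a named list with one string per key.
render_suffix <- function(format, values) {
  tokens <- strsplit(format, "", fixed = TRUE)[[1]]
  out <- ""
  sep <- ""
  for (tok in tokens) {
    if (tok == ",") {
      # Remember the separator; it is only emitted with the next key.
      sep <- ","
    } else if (tok %in% names(values)) {
      val <- values[[tok]]
      # Assumption: an empty value drops both the key and its separator.
      if (nzchar(val)) out <- paste0(out, sep, val)
      sep <- ""
    } else {
      # Any other character (e.g. "^") is copied through unchanged.
      out <- paste0(out, tok)
      sep <- ""
    }
  }
  out
}

descartes <- list(a = "1,2", c = "*", n = "†")
pascal    <- list(a = "3",   c = "",  n = "")

render_suffix("^ac^", descartes)    # "^1,2*^"
render_suffix("^c,a^n", descartes)  # "^*,1,2^†"
render_suffix("^a,c^n", pascal)     # "^3^"
```

This is only meant to pin down the expected output; it is not vectorised and would leave an empty `^^` behind when every key inside the superscript is empty.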
I made a simplified reproducible example of the part I'm struggling with. The `example` dataset is the type of dataset generated by that particular method prior to building the author list with the default settings of the class instance.
library(tidyverse)
library(rlang)
library(glue)
example <- structure(list(
  id = 1:3,
  literal_name = c("René Descartes", "Blaise Pascal", "Antoine Lavoisier"),
  corresponding = c(TRUE, FALSE, FALSE),
  affiliation_id = c("1,2", "3", "1,4"),
  note_id = c("†", "", "‡")
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))
This is where the author list is built:
aut <- mutate(example, .authors = !!make_author_str(format = "^a,c^n"))
pull(aut) %>%
  glue_collapse(", ", last = " and ") %>%
  cat()
# René Descartes^1,2,\*^†, Blaise Pascal^3,\*^ and Antoine Lavoisier^1,4,\*^‡
Below are the required helper functions:
make_author_str <- function(format) {
  expr({
    env <- environment()
    dict <- list(
      c = .data[["corresponding"]],
      a = .data[["affiliation_id"]],
      n = .data[["note_id"]]
    )
    fmt <- parse_format(!!format)
    assign_to_keys(dict, seps = fmt$seps, env = env)
    pattern <- str_replace_all(fmt$format, "([acn])", "{\\1}")
    suffixes <- glue(pattern)
    paste0(.data[["literal_name"]], suffixes)
  })
}
# build the a, c and n variables prior to parsing from the dict object
# and assign their respective annotations/symbols with separator
assign_to_keys <- function(dict, seps, env) {
  iwalk(dict, ~ {
    symbols <- if (.y == "c") "\\*" else .x
    value <- if_else(
      is_true(.x) | !is.null(.x) | .x != "",
      paste0(seps[[.y]], symbols),
      ""
    )
    assign(.y, value, envir = env)
  })
}
clean_format <- function(x) {
  gsub("([a-z^,])\\K\\1+|,+", "", x, perl = TRUE)
}
extract_keys <- function(x) {
  x <- strsplit(x, split = "")
  x <- unlist(x)
  x[x %in% letters]
}
extract_key_sep <- function(format, key) {
  out <- str_extract(format, paste0("(?!^)(?<=[a-z^]),(?=", key, ")"))
  if (is.na(out)) "" else out
}
# returns key separators and a cleaned 'format' string (without comma)
parse_format <- function(format) {
  keys <- extract_keys(format)
  seps <- map_chr(keys, ~ extract_key_sep(format, .x))
  list(
    seps = set_names(seps, keys),
    format = clean_format(format)
  )
}
The above works (minus the correspondence, not sure why, but it works in the class) when the `corresponding`, `affiliation_id` and `note_id` columns exist in the dataset, but it fails if any of them is missing (either from `format` or from `example`).
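A possible lead on the correspondence symptom, as a small base R check: the condition in `assign_to_keys()` contains `!is.null(.x)`, which is a single `TRUE` for any existing column, so the whole `|` chain recycles to `TRUE` for every row. In the sketch below, `isTRUE()` stands in for `rlang::is_true()` and `ifelse()` for `dplyr::if_else()`; the replacement condition is an assumption about the intended behaviour, not a confirmed fix.

```r
corresponding <- c(TRUE, FALSE, FALSE)

# Same shape as the condition used in assign_to_keys().
# isTRUE() is a base R stand-in for rlang::is_true().
cond_original <- isTRUE(corresponding) | !is.null(corresponding) | corresponding != ""
cond_original
# TRUE TRUE TRUE -> every author gets the correspondence symbol

# Assumed intent: flag a row only when the logical value is TRUE
# (NA is treated as "not corresponding").
cond_logical <- corresponding %in% TRUE
marks <- ifelse(cond_logical, "*", "")
marks
# "*" "" ""
```

For the character columns (`affiliation_id`, `note_id`), a plain `x != ""` test would presumably be enough.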
The `dict` object, as currently written, is too constraining. A better approach might be to build the `dict` object dynamically, like so:
cols <- list(c = "corresponding", a = "affiliation_id", n = "note_id")
dict <- cols[cols %in% names(example)]
But then I can't retrieve the data using the `.data` pronoun inside `assign_to_keys()`.
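For comparison, here is a minimal base R sketch of the dynamic lookup that sidesteps the `.data` pronoun by passing the data frame explicitly and indexing it with `[[`. `collect_annotations()` and `key_cols` are hypothetical names used only for illustration, and the `"*"` symbol for correspondence is an assumption carried over from the example above.

```r
example <- data.frame(
  literal_name = c("René Descartes", "Blaise Pascal", "Antoine Lavoisier"),
  corresponding = c(TRUE, FALSE, FALSE),
  affiliation_id = c("1,2", "3", "1,4"),
  note_id = c("†", "", "‡")
)

# Key -> column name mapping (same idea as `cols` above).
key_cols <- c(c = "corresponding", a = "affiliation_id", n = "note_id")

# Hypothetical helper: returns one character vector per key,
# keeping only the keys whose column exists in `data`.
collect_annotations <- function(data, key_cols) {
  present <- key_cols[key_cols %in% names(data)]
  lapply(present, function(col) {
    x <- data[[col]]
    if (is.logical(x)) {
      # Logical flags become a symbol or an empty string.
      ifelse(x %in% TRUE, "*", "")
    } else {
      x <- as.character(x)
      x[is.na(x)] <- ""
      x
    }
  })
}

ann <- collect_annotations(example, key_cols)
names(ann)  # "c" "a" "n"
ann$c       # "*" "" ""

# Dropping a column simply drops its key instead of raising an error.
no_notes <- example[setdiff(names(example), "note_id")]
names(collect_annotations(no_notes, key_cols))  # "c" "a"
```

The resulting list could then be combined with the parsed format; keys that appear in `format` but have no matching column would still need to be handled, for example by treating them as empty strings.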
Note that a lot of the complexity here comes from dealing with key separators in `format`, which is done in `parse_format()` and `assign_to_keys()`.
Any insights on how I could make it more flexible? I have changed this part quite a bit since the first draft and I might have followed the wrong logic in the process. So if you see a better logic or a simpler way to do it, please share it.