Converting a tibble column to an array of dictionaries in YAML

I'm trying to generate the following YAML structure from tabular data:

- name: Josiah Carberry
  roles: 
    - investigation: lead 
    - data curation: supporting

I'm struggling with the structure of the roles key. It's basically an array of dictionaries which would translate to a list of tibbles/data frames in R.

My issue is that I can't figure out how to store such lists of tibbles in a way that will produce the same output as in the example above.

My closest attempt is:

library(tibble)

tibble(
  id = paste0("id", 1:3),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal")),
    list(tibble(writing = "supporting"), tibble(supervision = "lead"))
  )
) |> 
  jsonlite::toJSON() |> 
  jsonlite::parse_json() |> 
  yaml::as.yaml() |> 
  cat()

Which produces:

- id: id1
  roles:
  - - writing: lead
  - - supervision: supporting
- id: id2
  roles:
  - - writing: equal
- id: id3
  roles:
  - - writing: supporting
  - - supervision: lead

As you can see, I have one extra dash before each role because of the outer list I use to store the roles data.

Any idea how I could generate the following:

- id: id1
  roles:
    - writing: lead
    - supervision: supporting
...

The calls to jsonlite confuse me. I think all you're trying to do is change the outer tibble from column-major to row-major, which can be done with purrr::transpose():

library(tibble)

tibble(
  id = paste0("id", 1:3),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal")),
    list(tibble(writing = "supporting"), tibble(supervision = "lead"))
  )
) |>
  purrr::transpose() |>
  yaml::as.yaml() |>
  cat()
#> - id: id1
#>   roles:
#>   - writing: lead
#>   - supervision: supporting
#> - id: id2
#>   roles:
#>   - writing: equal
#> - id: id3
#>   roles:
#>   - writing: supporting
#>   - supervision: lead

Created on 2023-11-06 with reprex v2.0.2
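
To make the column-major to row-major idea concrete, here is a minimal sketch on a plain named list. The `cols` and `rows` names and the toy values are made up for illustration; a tibble behaves the same way because it is a list of columns underneath.

```r
library(purrr)

# Column-major: one element per column, each holding the values for all rows
cols <- list(id = c("id1", "id2"), score = c(10, 20))

# Row-major: one element per row, each holding one value per column
rows <- transpose(cols)

# Inspect the nesting: a list of 2 rows, each with `id` and `score`
str(rows)
```

yaml::as.yaml() writes an unnamed list as a sequence and a named list as a mapping, which is why the row-major form gives one "- id: ..." entry per row.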


Thanks, that looks promising. I probably simplified the example a bit too much: your solution doesn't work for more complex cases that combine dictionaries, arrays, and nested arrays of dictionaries. E.g.:

library(tibble)

tibble(
  id = paste0("id", 1:3),
  name = tibble(
    given = c("John", "David", "Tom"),
    family = c("Smith", "Brown", "Williams"),
  ),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal")),
    list(tibble(writing = "supporting"), tibble(supervision = "lead"))
  )
) |> 
  purrr::transpose() |>
  yaml::as.yaml(indent.mapping.sequence = TRUE) |> 
  cat()
#> Warning: Element 2 must be length 3, not 2
#> - id: id1
#>   name:
#>     - John
#>     - David
#>     - Tom
#>   roles:
#>     - writing: lead
#>     - supervision: supporting
#> - id: id2
#>   name:
#>     - Smith
#>     - Brown
#>     - Williams
#>   roles:
#>     - writing: equal
#> - id: id3
#>   name: ~
#>   roles:
#>     - writing: supporting
#>     - supervision: lead

Created on 2023-11-07 by the reprex package (v2.0.1)

The desired output is:

- id: id1
  name:
    given: John
    family: Smith
  roles:
    - writing: lead
    - supervision: supporting
- id: id2
  name:
    given: David
    family: Brown
  roles:
    - writing: equal
- id: id3
  name:
    given: Tom
    family: Williams
  roles:
    - writing: supporting
    - supervision: lead

I can tweak the data above to get closer to what I want, but at the moment I can't get it to work with all the different data types I need.

The conversion to JSON allows me to handle that variety of data as well as missing values.
This is originally for a package I'm developing. You can see a real life example of the typical YAML generated by the package here.
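
As a small illustration of the missing-value point, here is a sketch assuming jsonlite's default settings for data frames, where NA fields are left out of each record. The `affiliation` column and its values are hypothetical and not taken from the package.

```r
library(tibble)
library(jsonlite)

# Hypothetical data: the second row has no affiliation
df <- tibble(
  id = c("id1", "id2"),
  affiliation = c("Uni A", NA)
)

# Round trip through JSON: each row becomes one record (a named list)
parsed <- parse_json(toJSON(df))

# With the defaults, the NA field is absent from the second record
names(parsed[[1]])
names(parsed[[2]])
```

A key that is absent from a record is then simply not written to the YAML, instead of being emitted as an explicit null.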

library(tidyverse)
library(jsonlite)
library(yaml) 
library(slider)

(info <- tibble(
  id = paste0("id", 1:3),
  name = tibble(
    given = c("John", "David", "Tom"),
    family = c("Smith", "Brown", "Williams"),
  ),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal")),
    list(tibble(writing = "supporting"), tibble(supervision = "lead"))
  )
))

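# Convert one row (a 1-row tibble) into a plain nested list
row_translate <- function(row) {
  i1 <- as.list(row)
  # name: 1-row tibble -> named list (a YAML mapping)
  i1$name <- as.list(i1$name)
  # roles: list of 1-row tibbles -> list of named lists (a YAML sequence of mappings)
  i1$roles <- i1$roles[[1]] |> map(as.list)
  i1
}

# slide() walks the data frame row by row
slide(info, row_translate) |> as.yaml() |> cat()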
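
The same row-by-row idea can be sketched without slider or the tidyverse, using only base R iteration plus tibble and yaml. `row_to_list()` is a hypothetical helper name, and the data is a shortened two-row version of the example above.

```r
library(tibble)
library(yaml)

info <- tibble(
  id = paste0("id", 1:2),
  name = tibble(given = c("John", "David"), family = c("Smith", "Brown")),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal"))
  )
)

# Hypothetical helper: build the nested list for row i
row_to_list <- function(i) {
  list(
    id = info$id[[i]],
    # 1-row slice of the nested tibble -> named list (a YAML mapping)
    name = as.list(info$name[i, ]),
    # list of 1-row tibbles -> list of named lists (a sequence of mappings)
    roles = lapply(info$roles[[i]], as.list)
  )
}

rows <- lapply(seq_len(nrow(info)), row_to_list)
out <- as.yaml(rows, indent.mapping.sequence = TRUE)
cat(out)
```

Spelling out each column in the helper is more verbose than a generic transpose, but it makes it explicit which columns become mappings and which become sequences.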

From my understanding, the problem here is that you also want to transpose the "name" sub-tibble, which you can do with purrr::modify_in():

library(tibble)

tibble(
  id = paste0("id", 1:3),
  name = tibble(
    given = c("John", "David", "Tom"),
    family = c("Smith", "Brown", "Williams"),
  ),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal")),
    list(tibble(writing = "supporting"), tibble(supervision = "lead"))
  )
) |>
  purrr::modify_in(.where = "name", .f = purrr::transpose) |>
  purrr::transpose() |>
  yaml::as.yaml(indent.mapping.sequence = TRUE) |> 
  cat()
#> - id: id1
#>   name:
#>     given: John
#>     family: Smith
#>   roles:
#>     - writing: lead
#>     - supervision: supporting
#> - id: id2
#>   name:
#>     given: David
#>     family: Brown
#>   roles:
#>     - writing: equal
#> - id: id3
#>   name:
#>     given: Tom
#>     family: Williams
#>   roles:
#>     - writing: supporting
#>     - supervision: lead

Created on 2023-11-07 with reprex v2.0.2
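
To see what each of the two transposes contributes, the pipeline can be split into steps. This is a sketch on a shortened two-row tibble without the roles column; `df`, `step1` and `rows` are names made up for illustration.

```r
library(tibble)
library(purrr)

df <- tibble(
  id = c("id1", "id2"),
  name = tibble(given = c("John", "David"), family = c("Smith", "Brown"))
)

# Step 1: transpose only the nested `name` tibble, so the column becomes
# a list with one list(given, family) per row
step1 <- modify_in(df, .where = "name", .f = transpose)
str(step1$name)

# Step 2: transpose the outer tibble to get one named list per row
rows <- transpose(step1)
str(rows[[1]])
```

After step 1 every column has one element per row, which is the shape the outer transpose needs in order to produce one mapping per person.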
