Compose a data frame from a string formatted with markdown syntax (or from pandoc's JSON AST)

atusy · December 19, 2019, 2:09pm

I want to convert markdown text with inline formatting into a data frame whose column Str holds the text and whose other columns indicate how each piece of text is decorated (e.g. Emph, Strong).
I guess one approach is to convert the markdown text to a JSON AST with pandoc and then turn that into a data frame, but I haven't come up with a good function for the second step.
Any ideas?

Example:

# md
# I am a ***big** man*.

# JSON
# {"blocks":[{"t":"Para","c":[{"t":"Str","c":"I"},{"t":"Space"},{"t":"Str","c":"am"},{"t":"Space"},{"t":"Str","c":"a"},{"t":"Space"},{"t":"Emph","c":[{"t":"Strong","c":[{"t":"Str","c":"big"}]},{"t":"Space"},{"t":"Str","c":"man"}]},{"t":"Str","c":"."}]}],"pandoc-api-version":[1,17,5,4],"meta":{}}

# What I want
tibble::tribble(
  ~ Emph, ~ Strong, ~ Str,
  FALSE,    FALSE,  "I",
  FALSE,    FALSE,  " ",
  FALSE,    FALSE,  "am",
  FALSE,    FALSE,  " ",
  FALSE,    FALSE,  "a",
  FALSE,    FALSE,  " ",
  TRUE,     TRUE,   "big",
  TRUE,     FALSE,  " ",
  TRUE,     FALSE,  "man",
  FALSE,    FALSE,  "."
)
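One way to do the second step is a recursive walk over the inline elements of each paragraph: every Str or Space element becomes one row, and entering an Emph or Strong node flips the corresponding flag for all of its children. The sketch below assumes the jsonlite package is installed and that the JSON has the `blocks` / `t` / `c` layout shown in the example (pandoc API 1.17, as produced by something like `pandoc -f markdown -t json`). The helpers `empty_rows()`, `flatten_inlines()` and `md_json_to_df()` are hypothetical names made up for this sketch, not functions from any package, and only Str, Space, SoftBreak, Emph and Strong are handled; other inline types are silently dropped.

```r
library(jsonlite)

# Hypothetical helper: empty result with the expected columns.
empty_rows <- function() {
  data.frame(Emph = logical(), Strong = logical(), Str = character(),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: turn a list of pandoc inline elements into one row per
# Str/Space piece, recording whether it sits inside Emph and/or Strong.
flatten_inlines <- function(inlines, emph = FALSE, strong = FALSE) {
  rows <- list()
  for (el in inlines) {
    if (el$t == "Str") {
      rows[[length(rows) + 1]] <- data.frame(Emph = emph, Strong = strong,
                                             Str = el$c, stringsAsFactors = FALSE)
    } else if (el$t %in% c("Space", "SoftBreak")) {
      # pandoc stores spaces as separate elements; represent them as " "
      rows[[length(rows) + 1]] <- data.frame(Emph = emph, Strong = strong,
                                             Str = " ", stringsAsFactors = FALSE)
    } else if (el$t == "Emph") {
      # recurse into the children, marking them as emphasised
      rows[[length(rows) + 1]] <- flatten_inlines(el$c, emph = TRUE, strong = strong)
    } else if (el$t == "Strong") {
      # recurse into the children, marking them as strong
      rows[[length(rows) + 1]] <- flatten_inlines(el$c, emph = emph, strong = TRUE)
    }
    # any other inline type (Code, Link, Strikeout, ...) is skipped in this sketch
  }
  if (length(rows) == 0) return(empty_rows())
  do.call(rbind, rows)
}

# Hypothetical helper: parse pandoc JSON and flatten every Para/Plain block.
md_json_to_df <- function(json) {
  ast <- fromJSON(json, simplifyVector = FALSE)
  blocks <- Filter(function(b) b$t %in% c("Para", "Plain"), ast$blocks)
  if (length(blocks) == 0) return(empty_rows())
  out <- do.call(rbind, lapply(blocks, function(b) flatten_inlines(b$c)))
  rownames(out) <- NULL
  out
}

# JSON for the example sentence "I am a ***big** man*."
json <- '{"blocks":[{"t":"Para","c":[{"t":"Str","c":"I"},{"t":"Space"},{"t":"Str","c":"am"},{"t":"Space"},{"t":"Str","c":"a"},{"t":"Space"},{"t":"Emph","c":[{"t":"Strong","c":[{"t":"Str","c":"big"}]},{"t":"Space"},{"t":"Str","c":"man"}]},{"t":"Str","c":"."}]}],"pandoc-api-version":[1,17,5,4],"meta":{}}'

res <- md_json_to_df(json)
res
```

For this input the result has the same ten rows and Emph/Strong/Str columns as the tribble above; `tibble::as_tibble(res)` converts it if a tibble is preferred. Further decorations (e.g. Strikeout) could be supported by adding another flag argument and column in the same way.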

system · January 9, 2020, 2:09pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.