I want to convert Markdown text with inline formatting into a data frame in which the column Str
holds each piece of text and the other columns (e.g. Emph, Strong) indicate which decorations apply to that text.
One approach I can think of is to convert the Markdown to a JSON AST with pandoc and then convert that AST to a data frame, but I haven't come up with a nice function for the second step.
Any ideas?
Example:
# md
# I am a ***big** man*.
# JSON
# {"blocks":[{"t":"Para","c":[{"t":"Str","c":"I"},{"t":"Space"},{"t":"Str","c":"am"},{"t":"Space"},{"t":"Str","c":"a"},{"t":"Space"},{"t":"Emph","c":[{"t":"Strong","c":[{"t":"Str","c":"big"}]},{"t":"Space"},{"t":"Str","c":"man"}]},{"t":"Str","c":"."}]}],"pandoc-api-version":[1,17,5,4],"meta":{}}
# What I want
tibble::tribble(
~ Emph, ~ Strong, ~ Str,
FALSE, FALSE, "I",
FALSE, FALSE, " ",
FALSE, FALSE, "am",
FALSE, FALSE, " ",
FALSE, FALSE, "a",
FALSE, FALSE, " ",
TRUE, TRUE, "big",
TRUE, FALSE, " ",
TRUE, FALSE, "man",
FALSE, FALSE, "."
)
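
To make the second step concrete, here is a rough sketch of the kind of recursive walk I have in mind, not a finished solution. It assumes the jsonlite package is installed, hard-codes the JSON for the example sentence instead of calling pandoc, and only handles the Str, Space, Emph and Strong node types in a single Para block. The names walk_inlines and make_row are hypothetical helpers I made up for illustration.

```r
library(jsonlite)

# Hypothetical helper: one output row for a piece of text and its decorations.
make_row <- function(text, emph, strong) {
  data.frame(Emph = emph, Strong = strong, Str = text, stringsAsFactors = FALSE)
}

# Hypothetical helper: recursively walk a list of pandoc inline nodes,
# carrying the currently active decorations down into nested nodes.
walk_inlines <- function(nodes, emph = FALSE, strong = FALSE) {
  rows <- lapply(nodes, function(node) {
    switch(
      node$t,
      Str    = make_row(node$c, emph, strong),
      Space  = make_row(" ", emph, strong),
      Emph   = walk_inlines(node$c, emph = TRUE, strong = strong),
      Strong = walk_inlines(node$c, emph = emph, strong = TRUE),
      NULL  # any other node type is silently skipped in this sketch
    )
  })
  rows <- Filter(Negate(is.null), rows)
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# JSON AST for "I am a ***big** man*." (hard-coded here; in practice it
# would come from running pandoc with JSON output on the Markdown).
json <- paste0(
  '{"blocks":[{"t":"Para","c":[',
  '{"t":"Str","c":"I"},{"t":"Space"},{"t":"Str","c":"am"},{"t":"Space"},',
  '{"t":"Str","c":"a"},{"t":"Space"},',
  '{"t":"Emph","c":[{"t":"Strong","c":[{"t":"Str","c":"big"}]},',
  '{"t":"Space"},{"t":"Str","c":"man"}]},',
  '{"t":"Str","c":"."}]}],',
  '"pandoc-api-version":[1,17,5,4],"meta":{}}'
)

# simplifyVector = FALSE keeps every node as a plain nested list.
ast <- fromJSON(json, simplifyVector = FALSE)

# Inline nodes of the first (and only) paragraph.
df <- walk_inlines(ast$blocks[[1]]$c)
print(df)
```

This only covers a single paragraph and two kinds of decoration, so other inline types (Code, Link, and so on) and multiple blocks would still need their own handling.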