Rendering Unicode in knitr to pdf

I have written an .Rmd file that includes a Unicode character (\u2264, <=) in some figure headings. When I run the code within the .Rmd document, the character renders correctly. But when I compile the document using the Knit button, it is rendered incorrectly as "=". The following TestCase demonstrates this:


---
title: "TestCase"
author: "Hunsicker LG"
date: "7/13/2020"
output: pdf_document
---

R Markdown

This is an R Markdown test document.

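library(knitr)
library(survminer)
library(survival)

# Simulated survival data: follow-up time, event indicator, binary covariate
df1 <- data.frame(time = rnorm(100, 100, 20),
                  event1 = sample(0:1, 100, replace = TRUE),
                  covar1 = sample(0:1, 100, replace = TRUE))
model1 <- survfit(Surv(time, event1) ~ covar1, data = df1)

# The plot title contains the Unicode character U+2264 (less-than-or-equal)
ggsurvplot(fit = model1, risk.table = FALSE, xlab = 'Years',
           pval = FALSE, title = "Junk \u2264 Stuff")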

When run within the Rmd document in the RStudio editor, the plot renders correctly.
[Image: TestCasefromButton]
But when the document is produced with the Knit button, the "<=" character (\u2264) prints incorrectly as '='.
I suspect that this is some sort of font issue. The LaTeX is evidently being rendered correctly, but the default LaTeX-to-PDF program apparently doesn't interpret the Unicode correctly. I haven't been able to get the document to print correctly. What's the required magic?

Addendum:
One of the nice things about RStudio is that most of the time things just work. One doesn't have to understand how the system does things. But when things don't work, one then has to figure out what is actually happening. After some noodling around, it seems to me that when one pushes the "Knit" button, the following sequence happens:

  • the "Knit" button invokes the render function, which sends the .Rmd file to knitr, which
  • executes the R code, and then sends the file to markdown, which
  • formats the combined files using pandoc to some sort of "document-p-code," and then
  • passes the document to tinytex/TinyTeX to compile the .tex document, which then
  • in the case of a pdf_document, uses either pdfLaTeX or XeLaTeX to render the doc to PDF.
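
For anyone who wants to look at the first stage in isolation, here is a minimal sketch that is not part of my original test case. It assumes knitr is installed and that R can open the png device; the file names, the chunk label `demo`, and the `fig-` prefix are made up for illustration. It runs only the knitr step on a tiny document and prints the intermediate Markdown, including the link to whatever figure file was produced.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks, built here to keep this listing readable

# Hypothetical throwaway document with a single plotting chunk labelled "demo".
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "A tiny test document.",
  "",
  paste0(fence, "{r demo, dev='png'}"),
  "plot(1:10, main = 'Junk \\u2264 Stuff')",
  fence
), rmd)

# Write figures under tempdir() instead of the working directory.
fig_prefix <- file.path(normalizePath(tempdir(), winslash = "/"), "fig-")
opts_chunk$set(fig.path = fig_prefix)

# Stage 1 only: knitr executes the chunk and writes Markdown (no pandoc, no LaTeX).
md <- knit(rmd, output = tempfile(fileext = ".md"), quiet = TRUE)
cat(readLines(md), sep = "\n")
```

Running `rmarkdown::render()` on the same file with `clean = FALSE` should go through the later stages as well and leave the intermediate files in place.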

Now when I run my TestCase above, if I look at the output within the Rmd file in the RStudio Source window, the \u2264 Unicode <= character gets rendered correctly. Similarly, if I render the file as an MS Word .docx file, it gets rendered correctly. But when I render the file as a PDF document, it gets mistranslated as "=". For what it's worth, I am using xelatex rather than the default pdflatex, as I am told that xelatex handles Unicode better than pdflatex, but this didn't fix my problem.
So it seems likely that the problem is at the tinytex/TinyTex stage, and probably results from the default font file lacking the \u2264 character.
My somewhat refined questions, then, are:

  • Do I understand the above sequence correctly?
  • If so, how do I determine what the default Latex font(s) are?
  • What font(s) would have the \u2264 character (and more generally, math characters)?
  • And finally, how would I install a font that DOES have the \u2264 character without messing up the rest of the formatting? Can I do this by including something in the .Rmd file header?

Thanks in advance for the help.
Larry Hunsicker

You are using the special character in a graphic, so I don't think this is purely a LaTeX issue.
Try changing the default graphics device. It is pdf by default, and that may cause the issue with the Unicode character. I don't really know how to configure the pdf device to work with Unicode characters and fonts that support them.

For now, you can try a png device, for example:

output:
  pdf_document: 
    dev: png

When I run your example it works for me. Can you please confirm?

With the latest knitr version, you can also try the new ragg package with its ragg_png device:

output:
  pdf_document: 
    dev: ragg_png
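
If you would rather keep vector figures, another option that may be worth trying is a Cairo-based PDF device. I have not tested it on your example, so treat this as a sketch. It assumes your R build has Cairo support, and it sets the device from a setup chunk instead of the YAML header.

```r
# Put this in a setup chunk near the top of the .Rmd.
library(knitr)

if (capabilities("cairo")) {
  # cairo_pdf generally copes with non-ASCII text better than the default pdf device
  opts_chunk$set(dev = "cairo_pdf")
} else {
  # fall back to a bitmap device when Cairo is not available
  opts_chunk$set(dev = "png", dpi = 300)
}

opts_chunk$get("dev")  # shows which device later chunks will use
```

Options set this way apply to the chunks that follow the setup chunk.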

Amazing! First of all, both your dev: png and (once I had installed ragg) dev: ragg_png work perfectly. So though you haven't quite answered my questions, you have solved my problem.
Many thanks for that!

But you have raised a new question for me. I am aware that knitr/rmarkdown deals with graphics inserted in a document by rendering the graphics first, then rendering the document inserting the already rendered graphics. If one does the knit using render(XXX.Rmd, clean = FALSE), then all the intermediate files (including the graph files) are saved, and the latex code clearly just inserts the already rendered graphics.
But I never knew about the dev: parameter to the pdf_document line in the markdown header, or specifically that the default graphics device for pdf_document was the pdf device. Apparently the pdf device (at least the Ubuntu pdf device) also uses LaTeX code, as when I ran my toy example on my Ubuntu box, it failed with a clear-cut LaTeX error (something from inputenc about the Unicode character not being set up). Where do I read about the dev: options? In knitr, in rmarkdown, or somewhere else?
Again, many thanks for solving my problem!
Larry

You can find information in several places:

And surely many more resources.
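
One quick way to check the device from R itself, as a small sketch assuming the rmarkdown package is installed: the `dev` argument is documented in `?rmarkdown::pdf_document`, and the format object it returns carries the chunk options that are handed to knitr. The variable names below are just for illustration.

```r
library(rmarkdown)

fmt_default <- pdf_document()             # what `output: pdf_document` asks for
fmt_png     <- pdf_document(dev = "png")  # same as adding `dev: png` in the YAML header

# The graphics device each format passes to knitr as a chunk option
fmt_default$knitr$opts_chunk$dev
fmt_png$knitr$opts_chunk$dev
```

The same device can also be set per chunk with the knitr `dev` chunk option, which is where the names such as png and ragg_png come from.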


A new YAML problem with dev: png. I don't understand what I have done wrong. The dev: png line worked fine in my toy example, but I am getting a YAML error when I add that one line to the actual document that I am trying to knit.
Yaml header in the original code that works:

---
title:    | 
     | CIT-07 and CIT-06 Long Term Outcomes:
     | Analysis Packet (Version 3)
  
author: "L. G. Hunsicker, MD"
date: '`r format(Sys.time(), "%d %B, %Y")`'
output: pdf_document
---

Header after adding a colon to "document" and the one new line:

---
title:    | 
     | CIT-07 and CIT-06 Long Term Outcomes:
     | Analysis Packet (Version 3)
  
author: "L. G. Hunsicker, MD"
date: '`r format(Sys.time(), "%d %B, %Y")`'
output: pdf_document:
  dev: png
---

Error message returned (after loading all of the required packages):

Error in yaml::yaml.load(..., eval.expr = TRUE) : 
  Scanner error: mapping values are not allowed in this context at line 7, column 21
Calls: <Anonymous> ... parse_yaml_front_matter -> yaml_load -> <Anonymous>

Execution halted
This is particularly confusing given that line 7 (the date line) is not changed. I would normally work on this myself before asking another question. But since you are obviously online now, perhaps you can tell me what happened and what I have to do. (I was worried that I was keeping you up in the middle of the night. But I see that you are in Paris. I am currently in the UK.) Larry

Problem solved after a review of the YAML documentation. Evidently the form:

output: pdf_document

is accepted by YAML, but if one wants to add dev: png, one has to use:

output:
  pdf_document:
    dev: png

Odd. But in any case, the problem is solved, and my real document now prints correctly.
Thanks again. Larry


I wasn't quick enough. Glad you found the answer!
YAML is very sensitive to formatting.
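
You can also see the difference by feeding the variants to the yaml package directly, since that is the parser named in the error message. This is only a sketch, with the header reduced to the `output` key. It also suggests why the message points at line 7, column 21: the line count appears to start after the opening `---`, which makes line 7 the `output:` line, and column 21 is the second colon on it.

```r
library(yaml)

# Short form: the value of `output` is a plain string.
short <- yaml.load("output: pdf_document")
str(short)

# Nested form: `pdf_document` becomes a key that holds its own options.
nested <- yaml.load("output:\n  pdf_document:\n    dev: png")
str(nested)

# Two colons on one line: the parser rejects a second mapping on that line.
bad <- tryCatch(
  yaml.load("output: pdf_document:\n  dev: png"),
  error = function(e) conditionMessage(e)
)
bad
```

So as soon as a format needs options, it has to move onto its own indented line under `output:`.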
