There was an R blog post announcing UTF-8 support on Windows 10, starting with R 4.0.
It says:
In the experimental build of R, UTF-8 is the native encoding, so RGui will not use any \u, \U escapes when sending text to R and R will not embed any UTF-8 strings, because the native encoding is already UTF-8.
Since I had just stumbled on one more UTF-8 related problem, I decided to upgrade to R 4.0.2 and the new toolchain, hoping the problems would go away.
However, after installing, the default locale on my system (Windows 10) is:
Sys.getlocale()
# [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
And attempts to set the locale, either in session or from .Rprofile, via calls like
Sys.setlocale(category = "LC_CTYPE", locale = "en_US.UTF8")
Sys.setlocale(category = "LC_CTYPE", locale = "English_United States.utf8")
Sys.setlocale(category = "LC_COLLATE", locale = "English_United States.utf8")
Sys.setlocale(category = "LC_COLLATE", locale = "en_US.UTF8")
result in
Warning message:
In Sys.setlocale(category = "LC_CTYPE", locale = "en_US.UTF8") :
OS reports request to set locale to "en_US.UTF8" cannot be honored
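For anyone who wants to reproduce this, here is a minimal sketch that tries the candidate locale names in one go and then checks whether the session considers itself UTF-8. It uses only base R; the helper name try_locale is hypothetical (mine, not from the blog post), and it relies on Sys.setlocale() returning an empty string when the request cannot be honored.

```r
# Candidate locale names taken from the attempts above.
candidates <- c("en_US.UTF8", "English_United States.utf8")

# Hypothetical helper: returns TRUE if the OS accepts the locale name.
try_locale <- function(loc, category = "LC_CTYPE") {
  old <- Sys.getlocale(category)
  # Restore the previous setting when the function exits.
  on.exit(suppressWarnings(Sys.setlocale(category, old)), add = TRUE)
  # Sys.setlocale() returns "" (and warns) when the request is refused.
  res <- suppressWarnings(Sys.setlocale(category, loc))
  nzchar(res)
}

accepted <- vapply(candidates, try_locale, logical(1))
print(accepted)

# l10n_info() reports whether the current native encoding is UTF-8.
print(l10n_info()[["UTF-8"]])
```

On my machine I would expect every entry to come back FALSE, matching the warning above, but the output will depend on the R build and the Windows version.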
The problem arises in both RStudio and RGui.
Maybe I was searching in the wrong places, but aside from the blog, the only other reference to UTF-8 on Windows and R 4.0 I found is this Stack Overflow question: "UTF-8 support in R on Windows", where a user has the same problem as me and the only suggestion is to use specific locale categories (i.e. LC_CTYPE) instead of LC_ALL, which unfortunately makes no difference for me. All the other resources I could find refer to older R versions.
Did I misunderstand the blog post, and is native UTF-8 support yet to come in a future version? Or am I missing some step needed to make UTF-8 work for me?
Thanks for any hints.