TL;DR:
RStudio cannot be used on Windows if you have multibyte characters in file paths. Questions about this exact same bug have gone unanswered for more than a year or have been nonchalantly brushed aside.
see exact same error
PROBLEM:
Non-ASCII Latin characters (e.g. æøå) do not get interpreted correctly by RStudio and core functions in R.
This is unique to RStudio on Windows. I do not know if it is related to R itself. Rgui has no problem printing æøå out of the box when installed. I should test R on the command line.
**sessionInfo()**
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.2.2 tools_4.2.2
I am actually on Windows 11, but hey ...
Mind you, the problems also persist under other locale settings.
Consider a completely fresh install of RStudio 22.07.2 Build 576 with R 4.2.2 (2022-10-31 ucrt):
normalizePath("~")
[1] "C:\\Users\\userpath\\OneDrive - organisation name with�\\Dokumenter"
Warning message:
In normalizePath(path.expand(path), winslash, mustWork) : path[1]="C:/Users/userpath/OneDrive - organisation name with�/Dokumenter": The system cannot find the path specified
normalizePath("C:/Users/userpath/OneDrive - organisation name withø/Dokumenter")
Error in normalizePath("C:/Users/userpath/OneDrive - organisation name with�/Dokumenter") :
file name conversion problem -- name too long?
path.expand("~")
[1] "C:/Users/userpath/OneDrive - organisation name with\xf8/Dokumenter"
### or, depending on the locale settings:
[1] "C:/Users/userpath/OneDrive - organisation name with�/Dokumenter"
print("æøå")
[1] "���"
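To narrow down where the garbling happens, it can help to look at the bytes rather than at what the console prints. The following is only a diagnostic sketch: it builds the string from Unicode escapes so the script file's encoding does not matter, and it shows that the `\xf8` in the `path.expand()` output above is ø as a single Latin-1 byte, whereas UTF-8 uses two bytes. The variable names are my own.

```r
# Build the test string from Unicode escapes (independent of file encoding).
x <- "\u00e6\u00f8\u00e5"

# UTF-8 bytes for "æøå": two bytes per character.
utf8_bytes <- charToRaw(enc2utf8(x))
print(utf8_bytes)  # bytes c3 a6 c3 b8 c3 a5

# The "\xf8" seen in the path is ø as one Latin-1 byte.
latin1_o <- rawToChar(as.raw(0xf8))
Encoding(latin1_o) <- "latin1"          # declare how the byte should be read
latin1_as_utf8 <- charToRaw(enc2utf8(latin1_o))
print(latin1_as_utf8)  # bytes c3 b8

# What the session itself believes about its character set.
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info())
```

If the bytes are right but the printed text is wrong, the problem is more likely in how the console or the locale interprets them than in the string itself.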
REAL problems arise:
df <- read.csv2("C:/Users/userpath/OneDrive - organisation name withø/Dokumenter/csvtest.csv")
Error in file(file, "rt") :
invalid input 'C:/Users/userpath/OneDrive - organisation name withr�/csvtest.csv' in 'utf8towcs'
But thankfully, the above works when the locale is set to .UTF8. Not so if you try to use ~, e.g. to make things a bit more compact:
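For reference, a minimal sketch of how the locale switch could be done defensively, for example from `.Rprofile`. The helper name `try_utf8_locale` and the list of candidate locale names are assumptions of mine; `.UTF8` is the spelling used above, and the other two are only fallbacks for non-Windows systems. Note that the R documentation for `Sys.setlocale` warns that changing the character set in a running session may not work reliably, so this is a best-effort attempt, not a guaranteed fix.

```r
# Try each candidate in turn; return the locale string that was accepted,
# or NA if none of them could be set.
try_utf8_locale <- function(candidates = c(".UTF8", "en_US.UTF-8", "C.UTF-8")) {
  for (loc in candidates) {
    # Sys.setlocale() returns "" (with a warning) when the request fails.
    res <- suppressWarnings(Sys.setlocale("LC_CTYPE", loc))
    if (is.character(res) && nzchar(res)) {
      return(res)
    }
  }
  NA_character_
}

accepted <- try_utf8_locale()
print(accepted)
print(l10n_info()[["UTF-8"]])  # TRUE if the session now reports a UTF-8 locale
```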
df <- read.csv2("~/csvtest.csv")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'C:/Users/userpath/OneDrive - organisation name withr�/Dokumenter/csvtest.csv': Illegal byte sequence
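Since the explicit path works under a UTF-8 locale while `~` does not, one stopgap is to avoid `~` and build paths from an explicitly given base directory. This is a sketch only: `read_csv2_from` is a hypothetical helper of mine, and it assumes the base directory string itself is correct UTF-8 (for example written with a `\u00f8` escape instead of a literal ø).

```r
# Read a semicolon-separated CSV from an explicitly given base directory,
# failing early with a readable message if the path cannot be found.
read_csv2_from <- function(base_dir, file_name, ...) {
  path <- file.path(enc2utf8(base_dir), file_name)
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  read.csv2(path, ...)
}

# Example call (commented out, since the directory only exists on my machine):
# base_dir <- "C:/Users/userpath/OneDrive - organisation name with\u00f8/Dokumenter"
# df <- read_csv2_from(base_dir, "csvtest.csv")
```

This does nothing about the underlying bug; it only keeps the broken `~` expansion out of the picture.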
Setting R_HOME directly in Windows to a path without funny characters helps. Then I can place .Rprofile in a "safe" place, so that I can set the locale safely. It still does not help with resolving directories outside ~ that contain multibyte characters.
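A small sketch for checking which of the relevant environment variables contain non-ASCII bytes. As far as I can tell from the R for Windows FAQ, `~` is resolved from `R_USER` first and then `HOME`, while `R_HOME` is the R installation directory, so it may be one of the former that actually matters here; treat that as an assumption. The helper `is_ascii` is hypothetical.

```r
# TRUE when every byte of a string is 7-bit ASCII; NA for missing values.
is_ascii <- function(x) {
  vapply(
    x,
    function(s) if (is.na(s)) NA else all(as.integer(charToRaw(s)) < 128L),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Variables that may be involved in locating the home directory on Windows,
# plus R_HOME for comparison. Unset variables show up as NA.
vars <- Sys.getenv(c("R_USER", "HOME", "USERPROFILE", "R_HOME"), unset = NA)

report <- data.frame(
  variable = names(vars),
  value = unname(vars),
  ascii_only = is_ascii(vars),
  stringsAsFactors = FALSE
)
print(report)

# What "~" currently expands to in this session.
print(path.expand("~"))
```

Any row with `ascii_only` equal to `FALSE` is a candidate for being pointed at a plain-ASCII directory.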
Basically, Norwegian is my language, and language is an effin big part of the REALITY of a large part of the human population.
I am a bit sad and desperate in my tone because I find again and again that the common answer to questions about why encoding gets messy is: "Do not use those characters" or "Encode them to ASCII". I also know that naming folders with funny o-s or other exciting characters is an unsafe habit - but 1) we all have to deal with other people, and with reality, and 2) they are allowed according to ISO.
This last year, support for printing my language-specific characters has deteriorated significantly. There have always been problems - e.g. having to run two different processes to get parameters with æøå correctly processed in markdown, depending on whether it is rendered interactively or through a script. Now, however, I cannot continue using RStudio on Windows.
That sucks, as I have no alternative to Windows at work. R under WSL2 is a pain in the behind.
So either this gets fixed, or I have to ditch R altogether - or fall back to VS Code for the most necessary scripting.
- edited to be more to the point