Hello,
Thank you for your answer. Maybe this is more specific:
When I do the same as you using readLines, I get a correct result:
> gl <- readLines("C:/Vuilbak/greek_letters.txt",encoding = "UTF-8")
> cat(gl)
Α α, Β β, Γ γ, Δ δ, Ε ε, Ζ ζ, Η η, Θ θ, Ι ι, Κ κ, Λ λ, Μ μ, Ν ν, Ξ ξ
But when I use read.table without specifying an encoding, I get "wrong" characters, because the file is not read as Unicode.
When I add fileEncoding, I get errors in R (see below).
> read.table(file="C:/Vuilbak/greek_letters.txt",sep=",",header=FALSE)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
1 Α α Î’ β Γ γ Δ δ Ε ε Ζ ζ Η η Θ θ Ι ι Κ κ Λ λ Îœ μ Î\u009d ν Ξ ξ
> read.table(file="C:/Vuilbak/greek_letters.txt",sep=",",header=FALSE,fileEncoding="UTF-8")
Error in read.table(file = "C:/Vuilbak/greek_letters.txt", sep = ",", :
no lines available in input
In addition: Warning message:
In read.table(file = "C:/Vuilbak/greek_letters.txt", sep = ",", :
invalid input found on input connection 'C:/Vuilbak/greek_letters.txt'
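My understanding of the two failures above is as follows. This is an assumption on my side, I have not checked it in the R sources. Without an encoding, the two UTF-8 bytes of each Greek letter are displayed as two single-byte characters. With fileEncoding="UTF-8", R tries to re-encode the file into the native encoding of the session. That was apparently code page 1252, which has no Greek letters, so the conversion fails. The sketch below reproduces the first effect from the bytes alone, without the file. The variable names are only illustrative.

```r
# Sketch: reproduce the garbled output from the bytes alone (no file needed).
alpha <- "\u03b1"            # Greek small letter alpha, stored as UTF-8
bytes <- charToRaw(alpha)    # its two UTF-8 bytes: ce b1

# Treat the same two bytes as single-byte (Latin-1 / code page 1252) text,
# which is roughly what happens when no encoding is declared.
misread <- rawToChar(bytes)
Encoding(misread) <- "latin1"
garbled <- enc2utf8(misread) # now two characters: I-circumflex and plus-minus

# fileEncoding = "UTF-8" asks for a conversion into the native encoding instead.
# A single-byte Western code page has no Greek letters, so this conversion is
# expected to fail (iconv() normally signals that by returning NA).
converted <- iconv(alpha, from = "UTF-8", to = "latin1")

print(bytes)
print(utf8ToInt(garbled))    # code points of the two garbled characters
```

The two garbled characters are the same "Î±" pair that shows up in the first read.table output.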
To find a solution, I reinstalled the newest version of R, to make sure the problem was not linked to the encoding issues described in the Stack Overflow item you mentioned. This did not solve the problem. I then looked further on the internet and finally found the line that solved it:
"Sys.setlocale(locale = 'en_BE.UTF-8')"
> Sys.setlocale(locale = 'en_BE.UTF-8')
[1] "LC_COLLATE=en_BE.UTF-8;LC_CTYPE=en_BE.UTF-8;LC_MONETARY=en_BE.UTF-8;LC_NUMERIC=C;LC_TIME=en_BE.UTF-8"
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=en_BE.UTF-8 LC_CTYPE=en_BE.UTF-8 LC_MONETARY=en_BE.UTF-8 LC_NUMERIC=C LC_TIME=en_BE.UTF-8
system code page: 1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] …
loaded via a namespace (and not attached):
[1] …
> gl <- readLines("C:/Vuilbak/greek_letters.txt",encoding = "UTF-8")
> cat(gl)
Α α, Β β, Γ γ, Δ δ, Ε ε, Ζ ζ, Η η, Θ θ, Ι ι, Κ κ, Λ λ, Μ μ, Ν ν, Ξ ξ
> read.table(file="C:/Vuilbak/greek_letters.txt",sep=",",header=FALSE,fileEncoding="UTF-8")
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
1 Α α Β β Γ γ Δ δ Ε ε Ζ ζ Η η Θ θ Ι ι Κ κ Λ λ Μ μ Ν ν Ξ ξ
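Since the result seems to depend on the locale of the session, a small check before reading might help others who run into the same problem. This is only a sketch under my own assumptions: the temporary file is a stand-in for my real file, and I assume that l10n_info() reports correctly whether the session uses UTF-8.

```r
# Sketch: only use fileEncoding = "UTF-8" when the session itself is UTF-8.
# 'path' is a temporary stand-in for the real file.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeBin(charToRaw("\u0391 \u03b1, \u0392 \u03b2\n"), con)  # raw UTF-8 bytes
close(con)

is_utf8_session <- isTRUE(l10n_info()[["UTF-8"]])

if (is_utf8_session) {
  # Same call as above; no conversion to a smaller character set is needed.
  df <- read.table(file = path, sep = ",", header = FALSE,
                   fileEncoding = "UTF-8")
  print(df)
} else {
  message("Session is not UTF-8; consider Sys.setlocale() with a UTF-8 locale first.")
}
```

Which locale names Sys.setlocale() accepts depends on the operating system, so 'en_BE.UTF-8' may need to be adapted elsewhere.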
Thanks again for your reply!
Laura