Native language issue

Hayk · May 6, 2020, 7:03pm

Hi,

I have a tibble with dates in wide format. I transposed this tibble so that I have a column of dates as a variable and a column of values with melt function (reshape package) as well as with pivot_longer function (tidyr package). In both cases the character values which were written in my native language (Armenian) messed up and now there are weird symbols in their place.

nirgrahamuk · May 7, 2020, 8:56am

How do you come to have the tibble in question?
You constructed it by typing it into your code? If so confirm that the script encoding is UTF-8.
If you imported from a file such as a CSV. Confirm that the CSV is in UTF-8

Hayk · May 7, 2020, 5:53pm

I have read it from an excel file. okay and how can I confirm script encoding or check it?

nirgrahamuk · May 7, 2020, 6:09pm

what R command did you use ?

Hayk · May 7, 2020, 6:11pm

R command? R studio? Do you mean that?

nirgrahamuk · May 7, 2020, 10:18pm

What code did you write

Hayk · May 8, 2020, 11:58am

Ah yes, sorry. here is my code.

library(readxl)

library(reshape2)

setwd("D:/M&L_Risks/Bank_Performance_Analysis")

excel_sheets("Working_X.xlsx")

df_db <- read_excel("Working_X.xlsx", sheet = "Accounts_DB")
col <- colnames(df_db[,1:18])

m_db <- iconv(df_db,from = "Latin-1", to = "UTF-8")

m_db <- melt(df_db, col, na.rm = TRUE, value.name = "Value" )

Hayk · May 8, 2020, 12:01pm

And by the way the code written as
iconv(df_db,from = "Latin-1", to = "UTF-8") doesn't work.
It generates an error - unsupported conversion from Latin-1 to UTF-8

Hayk · May 10, 2020, 4:30pm

Here is the topic I have created on this issue.

nirgrahamuk · May 11, 2020, 8:41am

iconv() wants to convert character vectors, it wont work for dataframes like you are trying to use it.

Were you trying iconv because you know your excel file is in Latin-1 encoding ? or was that a guess on your part ?
If you have excel perhaps you can open the file and resave it with UTF-8 encoding ?

Perhaps you can raise an issue, or add to an existing issue on readxl github.

github.com/tidyverse/readxl

Can't read xls with non UTF-8 encoding

opened 06:54AM - 02 Apr 19 UTC

closed 03:28AM - 19 Mar 22 UTC

MeoWoo6

bug xls 👵 libxls

I want to read xls files in loop but the **read_xls** seems can only read the **…first** xls file, but when I open the second or other xls file in EXCEL2016 window(PS:NO file broken or other infomation occured), then close it **without** any changes. Rerun the code below , then the read_xls can read the second xls file. I am so confused. I have put the data here [Data](https://drive.google.com/open?id=1q5gcHeeeXNv_rZqOfdr9zZ-J88rj_yUw) My code show as below ``` r library(readxl) library(reprex) setwd("C:\\Users\\len\\Desktop\\Data") files<-list.files(recursive = T) files<-files[grepl(".XLS",files)] files #> [1] "20190326.seq/33.0000.XLS" "20190327.seq/01.0000.XLS" #> [3] "20190328.seq/04.0000.XLS" "20190329.seq/15.0000.XLS" #> [5] "20190330.seq/09.0000.XLS" "20190331.seq/03.0000.XLS" packageVersion("readxl") #> [1] '1.3.1' for(i in 1:length(files)) { #dat<-read_xls(files[i],sheet = 1,na = "n.a.",skip = 0,col_names = F)[1:52,] dat<-read_xls(files[i]) message(paste("Read",files[i])) print(head(dat)) } #> New names: #> * Area -> Area...3 #> * Area -> Area...4 #> * Area -> Area...5 #> * Area -> Area...6 #> * Area -> Area...7 #> * ... and 13 more problems #> Read 20190326.seq/33.0000.XLS #> # A tibble: 6 x 20 #> No. Time Area...3 Area...4 Area...5 Area...6 Area...7 Area...8 #> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> #> 1 <NA> <NA> "\xa6\x~ "\xa6\x~ "\xa6\x~ "\xa6\x~ "\xa6\x~ "\xa6\x~ #> 2 <NA> <NA> AOC Cl NO2 SO4 NO3 NH4 #> 3 <NA> <NA> ECD_1 ECD_1 ECD_1 ECD_1 ECD_1 ECD_2 #> 4 1 4355~ n.a. 0.11860~ 0.02600~ 0.36162~ 0.36177~ 0.54646~ #> 5 2 4355~ n.a. 0.13981~ 0.03278~ 0.36993~ 0.38382~ 0.56742~ #> 6 3 4355~ n.a. n.a. n.a. 0.35741~ 0.34220~ 0.56133~ #> # ... with 12 more variables: Area...9 <chr>, Area...10 <chr>, #> # Area...11 <chr>, Area...12 <chr>, Area...13 <chr>, Area...14 <chr>, #> # Area...15 <chr>, Area...16 <chr>, Area...17 <chr>, Area...18 <chr>, #> # Area...19 <chr>, Area...20 <chr> #> Read 20190327.seq/01.0000.XLS #> # A tibble: 0 x 0 #> Read 20190328.seq/04.0000.XLS #> # A tibble: 0 x 0 #> Read 20190329.seq/15.0000.XLS #> # A tibble: 0 x 0 #> Read 20190330.seq/09.0000.XLS #> # A tibble: 0 x 0 #> Read 20190331.seq/03.0000.XLS #> # A tibble: 0 x 0 ``` rerun the code after I open the second XLS file ``` r library(readxl) library(reprex) setwd("C:\\Users\\len\\Desktop\\Data") files<-list.files(recursive = T) files<-files[grepl(".XLS",files)] files #> [1] "20190326.seq/33.0000.XLS" "20190327.seq/01.0000.XLS" #> [3] "20190328.seq/04.0000.XLS" "20190329.seq/15.0000.XLS" #> [5] "20190330.seq/09.0000.XLS" "20190331.seq/03.0000.XLS" packageVersion("readxl") #> [1] '1.3.1' for(i in 1:length(files)) { #dat<-read_xls(files[i],sheet = 1,na = "n.a.",skip = 0,col_names = F)[1:52,] dat<-read_xls(files[i]) message(paste("Read",files[i])) print(head(dat)) } #> New names: #> * Area -> Area...3 #> * Area -> Area...4 #> * Area -> Area...5 #> * Area -> Area...6 #> * Area -> Area...7 #> * ... and 13 more problems #> Read 20190326.seq/33.0000.XLS #> # A tibble: 6 x 20 #> No. Time Area...3 Area...4 Area...5 Area...6 Area...7 Area...8 #> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> #> 1 <NA> <NA> "\xa6\x~ "\xa6\x~ "\xa6\x~ "\xa6\x~ "\xa6\x~ "\xa6\x~ #> 2 <NA> <NA> AOC Cl NO2 SO4 NO3 NH4 #> 3 <NA> <NA> ECD_1 ECD_1 ECD_1 ECD_1 ECD_1 ECD_2 #> 4 1 4355~ n.a. 0.11860~ 0.02600~ 0.36162~ 0.36177~ 0.54646~ #> 5 2 4355~ n.a. 0.13981~ 0.03278~ 0.36993~ 0.38382~ 0.56742~ #> 6 3 4355~ n.a. n.a. n.a. 0.35741~ 0.34220~ 0.56133~ #> # ... with 12 more variables: Area...9 <chr>, Area...10 <chr>, #> # Area...11 <chr>, Area...12 <chr>, Area...13 <chr>, Area...14 <chr>, #> # Area...15 <chr>, Area...16 <chr>, Area...17 <chr>, Area...18 <chr>, #> # Area...19 <chr>, Area...20 <chr> #> New names: #> * Area -> Area...3 #> * Area -> Area...4 #> * Area -> Area...5 #> * Area -> Area...6 #> * Area -> Area...7 #> * ... and 13 more problems #> Read 20190327.seq/01.0000.XLS #> # A tibble: 6 x 20 #> No. Time Area...3 Area...4 Area...5 Area...6 Area...7 Area...8 #> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> #> 1 <NA> <NA> "\xa6\x~ "\xa6\x~ "\xa6\x~ "\xa6\x~ "\xa6\x~ "\xa6\x~ #> 2 <NA> <NA> AOC Cl NO2 SO4 NO3 NH4 #> 3 <NA> <NA> ECD_1 ECD_1 ECD_1 ECD_1 ECD_1 ECD_2 #> 4 1 4355~ n.a. 0.05765~ 0.01526~ 0.43807~ 0.37693~ 0.57607~ #> 5 2 4355~ n.a. 0.05745~ 0.01683~ 0.42131~ 0.36903~ 0.57463~ #> 6 3 4355~ n.a. 0.05793~ 0.01444~ 0.42888~ 0.38235~ 0.58119~ #> # ... with 12 more variables: Area...9 <chr>, Area...10 <chr>, #> # Area...11 <chr>, Area...12 <chr>, Area...13 <chr>, Area...14 <chr>, #> # Area...15 <chr>, Area...16 <chr>, Area...17 <chr>, Area...18 <chr>, #> # Area...19 <chr>, Area...20 <chr> #> Read 20190328.seq/04.0000.XLS #> # A tibble: 0 x 0 #> Read 20190329.seq/15.0000.XLS #> # A tibble: 0 x 0 #> Read 20190330.seq/09.0000.XLS #> # A tibble: 0 x 0 #> Read 20190331.seq/03.0000.XLS #> # A tibble: 0 x 0 ```

system · May 18, 2020, 8:41am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.