Hi everyone,
I have a dataset from Vietnam, but when I read it into R, the string variables are imported broken.
I used stri_trans_general from the stringi package, but it works on only a few columns.
I checked the raw dataset, and it seems those few columns were broken when the dataset was exported from the survey collecting platform.
"Du?c ch?t m?i"
When I say broken, I mean with "?" or ">" instead of the actual letters.
So, any recommendations on how I can recover these broken words in R?
If you can get the data from the 'survey collecting platform' to be exported in UTF-8, you can then import it in R with UTF-8 encoding and that should solve it.
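For the import step, a minimal sketch in base R could look like the one below. It assumes the export is a CSV file; the file, the column names and the sample value are made up so the example runs on its own. Here `encoding = "UTF-8"` tells `read.csv` to mark the strings it reads as UTF-8.

```r
# Write a tiny UTF-8 CSV to a temporary file so the example is self-contained.
# The file, the column names and the sample value are hypothetical.
path <- tempfile(fileext = ".csv")
writeLines(c("id,answer", "1,Vi\u1ec7t Nam"), path, useBytes = TRUE)

# Read it back and declare the strings as UTF-8.
survey <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

print(survey$answer)
```

With your real export you would pass its path instead of the temporary file. If the platform cannot export UTF-8, you would need to find out which encoding it does use and pass that instead.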
If you can't get correct input data anymore, I think your only option is to substitute the characters back yourself, using something like str_replace_all from the stringr package. Of course, this will only work if each broken symbol corresponds to exactly one letter.
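A rough sketch of that substitution, assuming you can work out a one-to-one lookup. The `fixes` pairs and the sample strings below are invented for illustration; they are not taken from your data.

```r
library(stringr)

# Hypothetical lookup: each broken symbol stands for exactly one letter.
# These pairs are made up; the real mapping has to come from comparing
# broken values against known correct ones.
fixes <- c("?" = "\u01b0", ">" = "\u0111")

broken <- c(">?a", "t?")

repaired <- broken
for (symbol in names(fixes)) {
  # fixed() makes stringr treat "?" literally instead of as a regex operator.
  repaired <- str_replace_all(repaired, fixed(symbol), fixes[[symbol]])
}

print(repaired)
```

If the same symbol replaced several different letters, as the "?" in your example appears to, a lookup like this cannot tell them apart and the values would have to be fixed case by case.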