I have a list of customer comments (more than 10,000) that mixes English and Chinese comments.
How can I separate these two? I want an outcome with one list of English comments and another list of Chinese comments.
Thanks in advance!
Hi @Sara97,
Check out this great explanation of character-string encoding in R:
http://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/
You may need to convert your mixed strings to a single Unicode encoding (such as UTF-8), then separate the comments by testing whether their characters fall within the "English" (Latin) or "Chinese" (CJK) character ranges.
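As a rough illustration of that idea, here is a minimal sketch in base R. It assumes the comments are in a character vector (the name `comments` and the sample strings are made up for this example) and that any comment containing at least one Han character should count as Chinese. Adjust that rule if you have comments that mix both languages.

```r
# Hypothetical sample data; replace with your own character vector.
# The \u escapes spell two short Chinese strings, so this script does not
# depend on the encoding of the source file.
comments <- c(
  "Great product, fast delivery",
  "\u4f60\u597d",
  "Not worth the price",
  "\u8c22\u8c22"
)

# Convert everything to UTF-8 so the regex engine sees Unicode characters.
comments <- enc2utf8(comments)

# \p{Han} is a PCRE Unicode script class that matches Chinese (Han) characters.
has_han <- grepl("\\p{Han}", comments, perl = TRUE)

chinese_comments <- comments[has_han]
english_comments <- comments[!has_han]
```

With this rule, a comment that contains both English words and Chinese characters ends up in `chinese_comments`. If you would rather keep such comments apart, you could add a second test such as `grepl("[A-Za-z]", comments)` and treat rows where both tests are `TRUE` as a third group.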
HTH