How to separate Chinese elements from a list of English text

Sara97 · April 24, 2020, 8:59pm

I have a list of customer comments (more than 10,000) that mixes English comments and Chinese comments.
How can I separate these two? I want an outcome with one list of English comments and another list of Chinese comments.
Thanks in advance!

DavoWW · April 25, 2020, 9:15am

Hi @Sara97,
Check out this great explanation of character-string encoding in R:
http://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/

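You may need to convert your mixed strings to a single Unicode encoding (for example UTF-8), then split them by character range: comments containing characters in the "Chinese" (CJK) range go in one list, and comments made up only of "English" (Latin) characters go in the other.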
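As a rough sketch of that idea (not tested against your data), something like the following might work. The `comments` vector and its contents are made up for illustration, and the range U+4E00 to U+9FFF is an assumption: it covers the main CJK Unified Ideographs block, not every Chinese character or punctuation mark.

```r
# Hypothetical example data; replace with your own character vector.
comments <- c(
  "Great product, fast delivery",
  "\u8d28\u91cf\u5f88\u597d",      # Chinese only
  "Not what I expected",
  "\u5f88\u6ee1\u610f, thanks"     # mixed Chinese and English
)

# Make sure the strings are encoded as UTF-8 before matching.
comments <- enc2utf8(comments)

# TRUE where a comment contains at least one character in U+4E00 to U+9FFF
# (assumed range: the main CJK Unified Ideographs block).
has_chinese <- grepl("[\u4e00-\u9fff]", comments, perl = TRUE)

chinese_comments <- comments[has_chinese]
english_comments <- comments[!has_chinese]
```

Note that a comment mixing both languages ends up in `chinese_comments` here, since one Chinese character is enough to match. If you need a stricter rule (say, mostly Chinese), you would have to count matching characters instead.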

HTH
