I am working on a project that uses tweets containing emojis and emoticons. My main goal is to get a combined sentiment score for each tweet (text + emojis). Emojis and emoticons are probably the most meaningful part of this data, so they cannot be ignored. I have converted the emojis and emoticons to their byte representation with iconv, but I only get a sentiment score for the text, not for the emojis. I am using VADER for this, but if there is another sentiment library or lexicon that also gives scores for emojis, a pointer would be very helpful and much appreciated.
Tweets:
```r
dput(df_emoji$Description)
c("DoorDash or Uber method asap<f0><9f><98><ad> cause I be starving<f0><9f><98><ad><f0><9f><98><ad>",
"such a real ahh niqq cuz I be having myself weak asl<f0><9f><98><82>",
"shii made me laugh so fuccin hard bro<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"Hart and Will Ferrell made a Gem in Get hard fr<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"@NigerianAmazon Chill<f0><9f><a4><a3><f0><9f><98><ad>", "so bomedy <f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"is that ass Gotdam<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"wild<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"them late night DoorDash<e2><80><99>s be goin crazy<f0><9f><a4><a3>",
"of the week<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>"
)
```
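To check which characters these `<xx>` escapes actually stand for, each run of bytes can be decoded back to a Unicode code point. The following is a minimal base-R sketch, assuming the escapes always have the lowercase `<xx>` form shown above; the names `utf8_escape_pattern`, `extract_byte_escapes()` and `escape_to_code_point()` are hypothetical helpers, not part of any package.

```r
# Hypothetical helpers; base R only.
# Matches one complete UTF-8 character written as "<xx>" byte escapes:
# a 4-, 3- or 2-byte lead byte followed by its continuation bytes (<80>-<bf>).
utf8_escape_pattern <-
  "<f[0-4]>(<[89ab][0-9a-f]>){3}|<e[0-9a-f]>(<[89ab][0-9a-f]>){2}|<[cd][0-9a-f]><[89ab][0-9a-f]>"

# For each string, return the escaped multibyte characters it contains
extract_byte_escapes <- function(x) {
  regmatches(x, gregexpr(utf8_escape_pattern, x))
}

# Turn one escape such as "<f0><9f><98><82>" back into a "U+XXXX" label
escape_to_code_point <- function(esc) {
  hex <- regmatches(esc, gregexpr("[0-9a-f]{2}", esc))[[1]]
  ch <- rawToChar(as.raw(strtoi(hex, 16L)))  # rebuild the raw UTF-8 bytes
  sprintf("U+%X", utf8ToInt(ch))             # decode them to a code point
}

tweets <- c(
  "DoorDash or Uber method asap<f0><9f><98><ad> cause I be starving<f0><9f><98><ad><f0><9f><98><ad>",
  "@NigerianAmazon Chill<f0><9f><a4><a3><f0><9f><98><ad>",
  "them late night DoorDash<e2><80><99>s be goin crazy<f0><9f><a4><a3>"
)
lapply(extract_byte_escapes(tweets), function(e) {
  vapply(e, escape_to_code_point, character(1), USE.NAMES = FALSE)
})
```

For the sample tweets this yields only three distinct emojis: U+1F602 (face with tears of joy), U+1F62D (loudly crying face) and U+1F923 (rolling on the floor laughing). The `<e2><80><99>` sequence is U+2019, a typographic apostrophe rather than an emoji.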
Code:
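```r
library(tidyr)  # separate()
library(vader)  # vader_df()

# Turn every non-ASCII byte into a "<xx>" escape, e.g. an emoji becomes "<f0><9f><98><82>"
emoji_senti <- data.frame(text = iconv(data_sample$text, "latin1", "ASCII", "byte"),
                          stringsAsFactors = FALSE)
# Split the first space-delimited token ("Bytes") from the rest of the tweet ("Description")
column1 <- separate(emoji_senti, text, into = c("Bytes", "Description"), sep = "\\ ")
column2 <- separate(emoji_senti, text, into = c("Bytes", "Description"), sep = "^[^\\s]*\\s")
df_emoji <- data.frame(Bytes = column1$Bytes, Description = column2$Description)

# Score each tweet with VADER and stack the results
allvals_emoji <- NULL
for (i in seq_along(df_emoji$Description)) {
  outs <- vader_df(df_emoji$Description[i])
  allvals_emoji <- rbind(allvals_emoji, outs)
}
allvals_emoji
```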
Note that the first tweet gets only nine word scores, one per English word, but there is no score for the emojis that were converted to byte codes.
```
# word_scores compound pos neu neg but_count
# 1 {0, 0, 0, 0, 0, 0, 0, 0, 0} 0.000 0.000 1.000 0.000 0
# 2 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.9, 0, 0} -0.440 0.000 0.805 0.195 0
# 3 {0, 0, 0, 2.6, 0, 0, -0.67835, 0, 0} 0.444 0.293 0.570 0.137 0
# 4 {0, 0, 0, 0, 0, 0, 0, 0, 0, -0.4, 0} -0.103 0.000 0.877 0.123 0
# 5 {0, 0} 0.000 0.000 1.000 0.000 0
# 6 {0, 0, 0, 0} 0.000 0.000 1.000 0.000 0
# 7 {0, 0, -2.5, 0, 0} -0.542 0.000 0.533 0.467 0
# 8 {0, 0} 0.000 0.000 1.000 0.000 0
# 9 {0, 0, 0, 0, 0, 0, 0} 0.000 0.000 1.000 0.000 0
# 10 {0, 0, 0, 0} 0.000 0.000 1.000 0.000
```
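One possible workaround, sketched here under assumptions, is to replace each emoji's byte escape with its Unicode name before scoring, so VADER sees ordinary words instead of `<xx>` codes. The Python vaderSentiment package applies a similar description-substitution step internally using its bundled emoji lexicon; the R code below is a hand-rolled imitation of that idea, not a feature of the vader R package. The lookup table `emoji_names` and the helper `replace_emoji_bytes()` are hypothetical and cover only the three emojis in the sample tweets; a complete table would have to be built from a Unicode emoji name list.

```r
# Hypothetical lookup table: "<xx>" byte escapes (as produced by
# iconv(..., sub = "byte")) mapped to the Unicode names of the emojis
# that appear in the sample tweets. Extend it for other emojis.
emoji_names <- c(
  "<f0><9f><98><82>" = "face with tears of joy",        # U+1F602
  "<f0><9f><98><ad>" = "loudly crying face",            # U+1F62D
  "<f0><9f><a4><a3>" = "rolling on the floor laughing"  # U+1F923
)

# Hypothetical helper: swap each known escape for " <name> " so the emoji
# becomes separate words even when it is glued to the preceding token.
replace_emoji_bytes <- function(x, dict = emoji_names) {
  for (code in names(dict)) {
    x <- gsub(code, paste0(" ", dict[[code]], " "), x, fixed = TRUE)
  }
  trimws(gsub("\\s+", " ", x))  # collapse the extra spaces added above
}

tweets <- c(
  "DoorDash or Uber method asap<f0><9f><98><ad> cause I be starving<f0><9f><98><ad><f0><9f><98><ad>",
  "wild<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>"
)
cleaned <- replace_emoji_bytes(tweets)
print(cleaned)

# Score the cleaned text with VADER, only if the vader package is installed
if (requireNamespace("vader", quietly = TRUE)) {
  print(vader::vader_df(cleaned))
}
```

Whether this changes the compound score depends on which words of each name appear in VADER's lexicon; words missing from the lexicon still score 0. A literal name also describes what the emoji looks like, which may not match how it is used in a particular tweet.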