How to categorize each word in a row as positive or negative (Sentiment Analysis)?

Rapidz · September 24, 2022, 5:03am

My data frame consists of one column containing many sentences.
I know I need to use this:

library(tidytext)
get_sentiments("bing")

I have spent considerable time following various tutorials to figure out the frequency of positive and negative words in each row. I am at a loss for what to do, as I have exhausted so much time already. Any help would be appreciated!

HanOostdijk · September 24, 2022, 2:00pm

See the vignette of the tidytext package for some examples.
By reading the first few pages I was able to 'compose' the following code:

library(tidytext)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
sentiment_table = get_sentiments('bing') 
my_sentences = data.frame(
  text1 = c(
    "This good man accuses him of erratic and strange behaviour",
    "Uncertainty is a tragic given these days"
  )
)

x <- my_sentences |>
  mutate(sentence_number = row_number()) |>
  unnest_tokens(word,text1) |>
  inner_join(sentiment_table,by="word") |>
  print()
#>   sentence_number    word sentiment
#> 1               1    good  positive
#> 2               1 accuses  negative
#> 3               1 erratic  negative
#> 4               1 strange  negative
#> 5               2  tragic  negative
Created on 2022-09-24 with reprex v2.0.2
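
Since the question asks for the frequency of positive and negative words in each row, the word-level table above can be condensed into one row per sentence. The following is only a sketch under a few assumptions: it additionally loads tidyr for pivot_wider(), and the name sentiment_counts is made up for illustration. Sentences that contain no word from the bing lexicon drop out at the inner_join() step, so they will not appear in the result.

```r
library(tidytext)
library(dplyr)
library(tidyr)  # assumption: tidyr is installed; used only for pivot_wider()

my_sentences <- data.frame(
  text1 = c(
    "This good man accuses him of erratic and strange behaviour",
    "Uncertainty is a tragic given these days"
  )
)

# sentiment_counts is an illustrative name, not part of the original answer
sentiment_counts <- my_sentences |>
  mutate(sentence_number = row_number()) |>        # remember which row each word came from
  unnest_tokens(word, text1) |>                    # one lower-cased word per row
  inner_join(get_sentiments("bing"), by = "word") |>  # keep only words found in the lexicon
  count(sentence_number, sentiment) |>             # words per sentence and sentiment
  pivot_wider(
    names_from = sentiment,
    values_from = n,
    values_fill = 0                                # 0 when a sentence has none of that sentiment
  )

print(sentiment_counts)
```

For the two example sentences this should give 1 positive and 3 negative words for sentence 1, and 0 positive and 1 negative for sentence 2, matching the word-level output above.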

Rapidz · September 28, 2022, 1:45am

@HanOostdijk
This is what I was looking for, but I am struggling to replace the sentence you have inputted with my entire data frame.

HanOostdijk · September 28, 2022, 8:32am

Hello @Rapidz,
in the last part of my code, replace my_sentences with the name of your data frame
and replace text1 with the name of your column (the one that contains the text strings you want to categorize).
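
As a rough illustration of that substitution: suppose your data frame were called reviews and its text column comment. Both names are hypothetical stand-ins, as is the example text. The pipeline would then look something like this:

```r
library(tidytext)
library(dplyr)

# Hypothetical stand-in for your own data frame and column name
reviews <- data.frame(
  comment = c(
    "The food was good",
    "A tragic and strange day"
  )
)

result <- reviews |>                               # was: my_sentences
  mutate(sentence_number = row_number()) |>
  unnest_tokens(word, comment) |>                  # was: text1
  inner_join(get_sentiments("bing"), by = "word")

print(result)
```

Only the two marked names change; the rest of the code stays as it is. If your data frame has other columns, they are carried along next to each word.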

If this does not help, show us the code you are using, the first few lines of the input data.frame, and any messages you get. In that case, the information on producing a minimal reproducible example might help.
