Find frequency of words

sharmachetan · April 24, 2022, 6:27pm

Can someone please help me find the frequency of words in a .csv file exported from Excel? I want the frequencies for each column separately, except the first column, and then I want to correlate the word frequencies with the score in the first column (which ranges from 7 to 9). Here is what the data looks like:

squads <- tibble::tribble(
            ~`Q4_1__Category-_Liking_attribute`, ~Q8__1___COMMENTS, ~Q8__2___COMMENTS,         ~Q8__3___COMMENTS,   ~Q8__4___COMMENTS,         ~Q8__5___COMMENTS,
                                             7L,     "Good flavor",       "Off color",                  "Smooth",        "Whipped ok",                    "Warm",
                                             8L,          "Smooth",          "Creamy",               "Wholesome",           "Natural",                 "Organic",
                                             9L,       "Wholesome",         "Natural",               "Delicious",           "Healthy",                   "Tasty",
                                             9L,       "Different",       "Wholesome",                 "Natural",             "Tasty",                 "Organic",
                                             8L,           "Plain",        "Potatoey",                   "Tasty",           "Natural",                "Homemade",
                                             7L,            "Good",          "Chunky",               "Flavorful",         "Authentic",                   "Tasty",
                                             7L,           "Thick",           "Tasty",               "Authentic",     "Very potatoey",                    "Good",
                                             7L,          "Purple",     "Interesting",                     "Fun",         "Different",                 "Unusual",
                                             7L,           "White",             "Hot",                  "Mashed",            "Smooth",                  "Creamy",
                                             8L,        "Colorful",          "Bright",                   "Tasty", "Real potato taste", "Worthy of Sunday dinner",
                                             8L,            "Bold",       "Flavorful", "Tastes like real potato",               "Hot",                    "Good",
                                             7L,             "Hot",          "Smooth",                  "Creamy",           "Potatey",                   "White",
                                             8L,        "Colorful",     "Interesting",                "Creative",              "Bold",                 "Unusual"
            )
head(sqauds)
#> Error in head(sqauds): object 'sqauds' not found

Code:

library(tidyverse)
library(tidytext)
library(tm)
#> Loading required package: NLP
#> 
#> Attaching package: 'NLP'
#> The following object is masked from 'package:ggplot2':
#> 
#>     annotate
library(dplyr)
library(qdap)
#> Loading required package: qdapDictionaries
#> Loading required package: qdapRegex
#> 
#> Attaching package: 'qdapRegex'
#> The following object is masked from 'package:dplyr':
#> 
#>     explain
#> The following object is masked from 'package:ggplot2':
#> 
#>     %+%
#> Loading required package: qdapTools
#> 
#> Attaching package: 'qdapTools'
#> The following object is masked from 'package:dplyr':
#> 
#>     id
#> Loading required package: RColorBrewer
#> Error: package or namespace load failed for 'qdap':
#>  .onLoad failed in loadNamespace() for 'rJava', details:
#>   call: fun(libname, pkgname)
#>   error: JAVA_HOME cannot be determined from the Registry
df <- read.csv("C:/Users/cs/Downloads/review.csv", header=T)
colnames(df)[1] = "score"
colnames(df)[2] = "First_response"
colnames(df)[3] = "Second_response"
colnames(df)[4] = "Third_response"
colnames(df)[5] = "Fourth_response"
colnames(df)[6] = "Fifth_response"

Created on 2022-04-24 by the reprex package (v2.0.1)

nirgrahamuk · April 25, 2022, 8:47am

Here are transformations using only the tidyverse:

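library(tidyverse)

# Long format: one row per word, with the original row number kept in rn
(squads_long <- mutate(squads, rn = row_number()) %>%
  pivot_longer(cols = where(is.character),
               names_to = "question", values_to = "answer") %>%
  separate_rows(answer))

# Frequency of each word within each question (column)
(per_q_word_frq <- group_by(squads_long, question, answer) %>%
  summarise(frq = n(), .groups = "drop"))

# Attach the frequency back to every word occurrence
(together <- left_join(squads_long, per_q_word_frq,
  by = c("question", "answer")
))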

I don't understand how to interpret what you said about correlation, so I haven't gone there. Perhaps you can say more about it.

sharmachetan · April 25, 2022, 12:26pm

I am getting an error when I run this code:

> Error in mutate(squads, rn = row_number()) : object 'squads' not found

Am I missing some package updates, or is it something else?

nirgrahamuk · April 25, 2022, 12:31pm

You had inconsistent spelling in your post.
The first time it was squads, the second time sqauds. I picked the first one to go with; review that.

sharmachetan · April 26, 2022, 6:38pm

Sorry, I didn't get it. What do you mean?

nirgrahamuk · April 26, 2022, 6:45pm

Your very first line of shared code was:

squads <- tibble::tribble(

Therefore my code assumes that your starting dataset is called squads.
You can change the name to match your own data.

sharmachetan · April 26, 2022, 6:57pm

Thanks, I got it. It ran and I got the frequencies. Do you know how I can get a table of frequencies (in the console) in descending or ascending order? I want to see the highest frequencies.

nirgrahamuk · April 26, 2022, 7:18pm

slice_max(together, frq, n = 10)

for the top 10.
Change max to min (slice_min) for the bottom 10.
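
For a full sorted table rather than only the top rows, one option is to count and sort in a single step. The sketch below is a minimal example on a small made-up stand-in (`squads_small`, with hypothetical column names `score`, `Q8_1` and `Q8_2`), not the original data. `count(..., sort = TRUE)` returns the frequencies in descending order, and `arrange()` re-sorts them ascending.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for the `squads` data (assumption: two comment columns only)
squads_small <- tibble::tribble(
  ~score, ~Q8_1,         ~Q8_2,
  7L,     "Good flavor", "Off color",
  8L,     "Smooth",      "Creamy",
  9L,     "Wholesome",   "Natural",
  7L,     "Smooth",      "Tasty",
  8L,     "Tasty",       "Natural"
)

# One row per (question, word) with its frequency, highest frequency first
word_counts <- squads_small %>%
  pivot_longer(cols = -score, names_to = "question", values_to = "answer") %>%
  separate_rows(answer) %>%
  count(question, answer, name = "frq", sort = TRUE)

word_counts                 # descending order
arrange(word_counts, frq)   # ascending order
```

Note that `together` has one row per word occurrence, so a word that occurs three times appears three times in `slice_max(together, frq, n = 10)`. Sorting the per-word summary (`per_q_word_frq` in the code above) with `arrange(per_q_word_frq, desc(frq))` avoids the repeats.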

sharmachetan · April 26, 2022, 7:49pm

@nirgrahamuk Thanks for the help. How can I get all the data? The console only shows a few rows and then prints something like

> # ... with 24 more rows
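
The truncated output is the default print method for tibbles. As a minimal sketch (the tibble `demo` below is a made-up stand-in, not the thread's data), passing `n = Inf` to `print()` shows every row, and converting to a base data frame is another option:

```r
library(tibble)

# Made-up tibble with more rows than the console shows by default
demo <- tibble(answer = paste0("word", 1:30), frq = 30:1)

# Print every row of the tibble instead of the default preview
print(demo, n = Inf)

# A base data frame prints all rows (up to getOption("max.print"))
as.data.frame(demo)
```

The same call should work on the thread's objects, for example `print(together, n = Inf)`. In RStudio, `View(together)` opens the table in a scrollable viewer instead.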

For correlation, I was wondering if I could somehow show it in a scatterplot, where the x-axis would run from 7 to 9. Words associated with a score of 7 would appear near the origin, and words associated with, say, 9 would appear farther from 7, towards the top right corner.
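
One possible reading of this request, offered only as a sketch: for each word, compute the mean score of the responses that contain it, then plot that mean score (x) against the word's frequency (y) with the word as a text label. This is an average per word, not a correlation coefficient, and the data (`squads_small`) and its column names are made-up stand-ins. Words are lower-cased here so that "Tasty" and "tasty" would be counted together.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up stand-in for the `squads` data (assumption: two comment columns only)
squads_small <- tibble::tribble(
  ~score, ~Q8_1,         ~Q8_2,
  7L,     "Good flavor", "Smooth",
  8L,     "Smooth",      "Creamy",
  9L,     "Wholesome",   "Natural",
  9L,     "Tasty",       "Wholesome",
  8L,     "Tasty",       "Natural"
)

# Per word: how often it occurs and the mean score of the rows it occurs in
word_scores <- squads_small %>%
  pivot_longer(cols = -score, names_to = "question", values_to = "answer") %>%
  separate_rows(answer) %>%
  mutate(answer = tolower(answer)) %>%
  group_by(answer) %>%
  summarise(frq = n(), mean_score = mean(score), .groups = "drop")

# Words used mostly with low scores sit on the left, high scores on the right;
# a little jitter keeps labels with identical coordinates from overlapping exactly
p <- ggplot(word_scores, aes(x = mean_score, y = frq, label = answer)) +
  geom_text(position = position_jitter(width = 0.05, height = 0.1, seed = 1)) +
  coord_cartesian(xlim = c(6.5, 9.5)) +
  labs(x = "Mean score of responses containing the word", y = "Word frequency")

p
```

With the full data, the same summary could be built from `squads_long` by grouping on `answer` and averaging the score column.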
