How to calculate the Frequency of specific words across ALL rows?

Rapidz · September 23, 2022, 2:39am

I am using the Eco-hotel Data Set from the UCI Machine Learning Repository.

I am trying to figure out how to count the frequency of certain words like "room" or "vacation" within each row. I figured out the code to make it work for columns, but I need it for rows.

There are 16 columns in this dataset, but I need the frequency of certain words for each row. If anyone could lend some insights, it would be greatly appreciated.

Here is my code:
library(tidyverse)
EcoResort %>%
summarize(across(everything(), ~ sum(str_detect(., 'room'))))

melih_guven · September 23, 2022, 8:29am

Hello, I guess rowwise and c_across may help you.
I have created an imaginary dataset for a reprex.

PS: this assumes that the first column of the dataset is an id and that the other columns are strings such as reviews.

library(tidyverse)


review1 <- c( "clean room nice vacation", "empty mini bar", "nice hotel", "bad vacation", "nice view in the room")
review2 <- c( "tidy room", "pool is nice", "nice vacation", "bad service", "awful breakfast")

df <- tibble(reviewid = 1:5,
       r1 = review1 ,
       r2 = review2
       )



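word_freq_per_row <- function(df, query){
    if(!is.character(query)){stop("query must be character")}
    df %>% 
    # treat each row as its own group, keeping the first column (the id)
    rowwise(1) %>% 
    # per cell: 1 if it contains the query, 0 otherwise
    summarize(across(everything(), ~ sum(str_detect(., query)))) %>% 
    # add up those flags across the remaining columns of each row
    mutate(num_of_occurences = sum(c_across(everything()))) %>% 
    select(reviewid, num_of_occurences)
}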

word_freq_per_row(df, "room")
# # A tibble: 5 x 2
# # Groups:   reviewid [5]
#   reviewid num_of_occurences
#      <int>             <int>
# 1        1                 2
# 2        2                 0
# 3        3                 0
# 4        4                 0
# 5        5                 1


word_freq_per_row(df, "nice")
# # A tibble: 5 x 2
# # Groups:   reviewid [5]
#   reviewid num_of_occurences
#      <int>             <int>
# 1        1                 1
# 2        2                 1
# 3        3                 2
# 4        4                 0
# 5        5                 1
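
One thing to keep in mind: str_detect only flags whether a cell contains the word, so a word that appears twice in the same cell is still counted once. If every occurrence should be counted, a possible variation is sketched below. It uses str_count together with rowSums and does not need rowwise. The helper name count_word_per_row, its id_col argument and the n_matches column are made up for this illustration, and it assumes every column except the id holds text.

```r
library(tidyverse)

# Same toy data as in the reprex above
df <- tibble(
  reviewid = 1:5,
  r1 = c("clean room nice vacation", "empty mini bar", "nice hotel",
         "bad vacation", "nice view in the room"),
  r2 = c("tidy room", "pool is nice", "nice vacation",
         "bad service", "awful breakfast")
)

# Hypothetical helper: total number of matches of `query` per row.
# `id_col` names the column to leave out of the search (an assumption).
count_word_per_row <- function(df, query, id_col = "reviewid") {
  stopifnot(is.character(query), length(query) == 1)
  df %>%
    mutate(
      # str_count() returns the number of matches in each cell;
      # rowSums() adds those counts across the text columns of a row
      n_matches = rowSums(
        across(-all_of(id_col), ~ str_count(as.character(.x), query)),
        na.rm = TRUE
      )
    ) %>%
    select(all_of(id_col), n_matches)
}

count_word_per_row(df, "room")
# n_matches is 2, 0, 0, 0, 1 for reviewid 1 to 5

# The query is a regular expression, so "room" also matches "rooms" or
# "bedroom"; word boundaries restrict it to the whole word
count_word_per_row(df, "\\broom\\b")
```

On this toy data the counts agree with word_freq_per_row, because no word is repeated inside a single cell. The two only differ when a cell contains the word more than once.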
