Error when using mutate to label sets of rows in dataset

nbaes · June 8, 2022, 5:57am

Context: I am attempting to label subsets of my data frame using the code below so that I can then:

coll_count <- coll_count %>% mutate(I=year==1974:1979,
                                      II=year==1980:1984,
                                      III=year==1985:1989,
                                      IV=year==1990:1994,
                                      V=year==1995:1999,
                                      VI=year==2000:2004,
                                      VII=year==2005:2009,
                                      VIII=year==2010:2014,
                                      IX=year==2014:2017)

use pivot_longer to place the "I:IX" columns into one column named "epoch"
run the code below

df_word <- df_word %>% # sum repetitions by year (denominator)  
  group_by(epoch) %>% 
  mutate(sum_repet_epoch = sum(repet)) %>% 
  ungroup()

df_word_year <- df_word %>% # compute standardization (for AV,A,V)
  group_by(epoch) %>%
  mutate(sev_word = (sumAVprod.word/sum_repet_epoch),
         aro_word = (sumAprod.word/sum_repet_epoch),  
         val_word = (sumVprod.word/sum_repet_epoch)) %>%
  distinct() %>%
  select(year, lemma, sev_word) %>%
  ungroup()

partition the data frame into the epochs

sev1 <- df_word_year %>% filter(year==1974:1979) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev2 <- df_word_year %>% filter(year==1980:1984) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev3 <- df_word_year %>% filter(year==1985:1989) %>% arrange(sev_word) %>% slice_max(sev_word, n=100) 
sev4 <- df_word_year %>% filter(year==1990:1994) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev5 <- df_word_year %>% filter(year==1995:1999) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev6 <- df_word_year %>% filter(year==2000:2004) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev7 <- df_word_year %>% filter(year==2005:2009) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev8 <- df_word_year %>% filter(year==2010:2014) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev9 <- df_word_year %>% filter(year==2014:2017) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)

Problem: It almost seems like it would be quicker to section the data manually in Excel and then import it, but I am trying to learn to handle a bigger data frame efficiently, as there are 40,597 rows.

I am essentially trying to add another column to my data frame that partitions the rows by the "year" column (broken into the 9 groups specified above). Because I think I want to group_by using this "epoch" column afterwards, I am not immediately partitioning the initial data frame using slice etc.

Would anyone have an idea as to how to better automate this? Currently, I am getting the following error after the first code chunk: "longer object length is not a multiple of shorter object length"

williaml · June 8, 2022, 6:31am

nbaes:

mutate(I=year==1974:1979,
                                      II=year==1980:1984,
                                      III=year==1985:1989,
                                      IV=year==1990:1994,
                                      V=year==1995:1999,
                                      VI=year==2000:2004,
                                      VII=year==2005:2009,
                                      VIII=year==2010:2014,
                                      IX=year==2014:2017)

Hi, I imagine this bit doesn't work. You probably want to use dplyr::case_when() for this. For the rest of your problem, it would be good if you provided a reproducible example.
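To illustrate why the quoted comparison misbehaves, here is a minimal base R sketch. The `year` vector is made up for illustration and is not taken from the original data. It suggests that `==` compares element by element and recycles the shorter vector, while `%in%` tests membership in the range:

```r
# Made-up years for illustration only (7 values, deliberately not a multiple of 6)
year <- c(1974, 1975, 1980, 1981, 1979, 1976, 1974)

# `==` lines the two vectors up position by position and recycles 1974:1979
# (length 6) along `year` (length 7). R warns that the longer object length
# is not a multiple of the shorter one; the warning is silenced here.
eq_result <- suppressWarnings(year == 1974:1979)

# `%in%` asks, for each year, whether it appears anywhere in 1974:1979
in_result <- year %in% 1974:1979

print(eq_result)
print(in_result)
```

With this toy vector, 1979 and 1976 fall inside the range but are only flagged by `%in%`, because `==` happens to compare them against 1978 and 1979.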

nbaes · June 8, 2022, 11:42am

Thanks. I am looking at how to apply case_when() but it is a little confusing. Would you be able to give an example on how to do the below/correct the below attempt?

coll_count <- coll_count %>% mutate(I = case_when(year==1974:1979))

At present, I receive the following error:
longer object length is not a multiple of shorter object length
Error in mutate():
! Problem while computing I = case_when(year == 1974:1979).
Caused by error in case_when():
! Case 1 (year == 1974:1979) must be a two-sided formula, not a logical vector.
Backtrace:

coll_count %>% mutate(I = case_when(year == 1974:1979))
dplyr::case_when(year == 1974:1979)
Error in mutate(., I = case_when(year == 1974:1979)) : Caused by error in case_when(): ! Case 1 (year == 1974:1979) must be a two-sided formula, not a logical vector.

williaml · June 8, 2022, 12:09pm

Hi, can you provide a reproducible example of your dataset and perhaps an example of the output you are after?

FAQ: How to do a minimal reproducible example (reprex) for beginners

This is a very basic example anyway:

library(tidyverse)

tibble(year = seq(1974, 2009, 1)) %>% 
  mutate(x = case_when(year %in% 1974:1979 ~ "I",
                     year %in% 1980:1984  ~ "II",
                     TRUE ~ "III"))

# A tibble: 36 x 2
    year x    
   <dbl> <chr>
 1  1974 I    
 2  1975 I    
 3  1976 I    
 4  1977 I    
 5  1978 I    
 6  1979 I    
 7  1980 II   
 8  1981 II   
 9  1982 II   
10  1983 II   
# ... with 26 more rows
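The same pattern can probably be extended to all nine epochs from the original question, and a single "epoch" column may also replace the nine separate filter() calls. This is only a sketch under assumptions: the `toy` data frame and its `sev_word` values are invented for illustration, and `n = 2` stands in for the `n = 100` used in the question.

```r
library(dplyr)

# Invented stand-in for the real data: one row per year, arbitrary scores
years <- 1974:2017
toy <- tibble(year = years, sev_word = seq_along(years))

# One "epoch" column instead of nine logical columns plus pivot_longer().
# case_when() uses the first condition that matches, so 2014 is labelled
# "VIII" even though the question's ranges list it under IX as well.
labelled <- toy %>%
  mutate(epoch = case_when(
    year %in% 1974:1979 ~ "I",
    year %in% 1980:1984 ~ "II",
    year %in% 1985:1989 ~ "III",
    year %in% 1990:1994 ~ "IV",
    year %in% 1995:1999 ~ "V",
    year %in% 2000:2004 ~ "VI",
    year %in% 2005:2009 ~ "VII",
    year %in% 2010:2014 ~ "VIII",
    year %in% 2014:2017 ~ "IX"
  ))

# Top rows per epoch in one grouped call rather than sev1 ... sev9
top_by_epoch <- labelled %>%
  group_by(epoch) %>%
  slice_max(sev_word, n = 2) %>%
  ungroup()

print(top_by_epoch)
```

If separate objects per epoch are still needed afterwards, something like split(top_by_epoch, top_by_epoch$epoch) would return them as a named list.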
