Context: I am attempting to label subsets of my data frame using the code below so that I can then:
coll_count <- coll_count %>% mutate(I=year==1974:1979,
II=year==1980:1984,
III=year==1985:1989,
IV=year==1990:1994,
V=year==1995:1999,
VI=year==2000:2004,
VII=year==2005:2009,
VIII=year==2010:2014,
IX=year==2014:2017)
- use pivot_longer to place the "I:IX" columns into one column named "epoch"
- run the following code
df_word <- df_word %>% # sum repetitions by epoch (denominator)
group_by(epoch) %>%
mutate(sum_repet_epoch = sum(repet)) %>%
ungroup()
df_word_year <- df_word %>% # compute standardization (for AV,A,V)
group_by(epoch) %>%
mutate(sev_word = (sumAVprod.word/sum_repet_epoch),
aro_word = (sumAprod.word/sum_repet_epoch),
val_word = (sumVprod.word/sum_repet_epoch)) %>%
distinct() %>%
select(year, lemma, sev_word) %>%
ungroup()
- partition the data frame into the epochs
sev1 <- df_word_year %>% filter(year==1974:1979) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev2 <- df_word_year %>% filter(year==1980:1984) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev3 <- df_word_year %>% filter(year==1985:1989) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev4 <- df_word_year %>% filter(year==1990:1994) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev5 <- df_word_year %>% filter(year==1995:1999) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev6 <- df_word_year %>% filter(year==2000:2004) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev7 <- df_word_year %>% filter(year==2005:2009) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev8 <- df_word_year %>% filter(year==2010:2014) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev9 <- df_word_year %>% filter(year==2014:2017) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
Problem: It almost seems like it would be quicker to section the data manually in Excel and then import it, but I am trying to learn to handle a bigger data frame efficiently, as there are 40,597 rows.
I am essentially trying to add another column to my data frame that partitions the rows by the "year" column (broken into the 9 groups specified above). Because I think I want to group_by using this "epoch" column afterwards, I am not immediately partitioning the initial data frame using slice etc.
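To make the goal concrete, here is a minimal sketch of the kind of "epoch" column described above, built with base R's cut() inside mutate() and followed by one grouped slice_max() call. It runs on a made-up data frame (`toy`) that only borrows the column names from this post. The names `breaks`, `labels` and `top_by_epoch` are placeholders, and because 2014 appears in both VIII and IX above, the sketch assumes 2014 belongs to VIII and IX covers 2015 to 2017.

```r
library(dplyr)

# Toy stand-in for the real data: column names follow the post, values are made up.
set.seed(1)
toy <- tibble(
  year     = sample(1974:2017, 200, replace = TRUE),
  lemma    = paste0("w", seq_len(200)),
  sev_word = runif(200)
)

# Right-closed intervals: (1973, 1979], (1979, 1984], ..., (2014, 2017].
# ASSUMPTION: 2014 is counted in VIII, so IX covers 2015-2017.
breaks <- c(1973, 1979, 1984, 1989, 1994, 1999, 2004, 2009, 2014, 2017)
labels <- c("I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX")

# One factor column instead of nine logical columns plus pivot_longer.
toy <- toy %>%
  mutate(epoch = cut(year, breaks = breaks, labels = labels))

# Highest sev_word rows within each epoch in a single grouped call
# (n = 3 here only to keep the toy output small).
top_by_epoch <- toy %>%
  group_by(epoch) %>%
  slice_max(sev_word, n = 3) %>%
  ungroup()
```

If this approach fits, the nine separate sev1 to sev9 objects could presumably be recovered from the grouped result with split(top_by_epoch, top_by_epoch$epoch), although keeping everything in one data frame is usually easier to work with.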
Would anyone have an idea as to how to better automate this? Currently, I am getting the following error after the first code chunk: "longer object length is not a multiple of shorter object"
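As far as I can tell, the message comes from comparing the whole "year" column to a range with `==`, which works element by element and recycles the shorter vector, rather than testing membership. A small base R sketch on a made-up `year` vector (the values and the names `msg`, `by_position` and `by_membership` are illustrative only) shows the difference between `==` and `%in%`:

```r
# Made-up years: 7 values compared against a range of 6 values.
year <- c(1974, 1980, 1979, 1975, 1990, 1976, 1977)

# `==` lines the two vectors up position by position and recycles 1974:1979,
# so it tests year[1] == 1974, year[2] == 1975, ..., year[7] == 1974.
# Because 7 is not a multiple of 6, R signals a warning; capture its text here.
msg <- tryCatch(year == 1974:1979, warning = function(w) conditionMessage(w))

# The positional comparison itself (warning silenced for the demonstration).
by_position <- suppressWarnings(year == 1974:1979)
# by_position is TRUE only where a year happens to match the recycled value
# at the same position: TRUE FALSE FALSE FALSE FALSE FALSE FALSE

# `%in%` asks, for every element of `year`, whether it occurs anywhere in the range.
by_membership <- year %in% 1974:1979
# by_membership: TRUE FALSE TRUE TRUE FALSE TRUE TRUE
```

Under that reading, swapping `==` for `%in%` in the mutate() and filter() calls above would give the intended membership test, though a single "epoch" column is likely simpler than nine logical columns.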