Hi all,
I have only been using R for half a year and have no programming background (total newbie).
I sincerely appreciate any suggestions.
The list I process contains 400 data frames, each with about 2,000 rows. If I rbindlist() it, the result contains 1,956,970 rows.
The beauty of the "big data frame" is that I can use the handy tidyverse functions directly, but I have to group_by year many times for the different calculations. With lapply, the list is split by year nicely, but I have to write clumsy functions. So which approach is better? Is there any other way to deal with a big data frame?
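To illustrate the first point: with the big data frame, every separate calculation repeats the grouping step, roughly like this (a sketch; big_df stands for the bound data frame built in the code below):

big_df %>% group_by(year) %>% summarise(total = sum(occurrence)) # one grouping pass
big_df %>% group_by(year) %>% summarise(peak = max(occurrence))  # and another pass for the next calculation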
(And to my surprise, the dplyr package is super fast at handling a million-row data frame.)
My code is below; the two methods seem comparable in speed.
library(dplyr)      # mutate(), group_by(), top_n(), %>%
library(data.table) # rbindlist()

# toy data: a named list of two identical one-column data frames
elementone <- data.frame(occurrence = 100:109)
elementtwo <- elementone
names_list <- list(elementone, elementtwo)
names(names_list) <- c("year1", "year2")
# method 1: element-wise, run the same pipeline on each list element
names_list %>%
  lapply(FUN = . %>%
           mutate(total = sum(.[, 1])) %>% # total of the first column
           top_n(1, wt = occurrence)) %>%  # keep the highest-occurrence row
  do.call(rbind, .)
# method 2: bind into one big data frame, then group by year
names_list %>%
  data.table::rbindlist(use.names = FALSE, idcol = "year") %>%
  group_by(year) %>%
  mutate(total = sum(occurrence)) %>%
  top_n(1, wt = occurrence)
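For reference, the two pipelines can be timed side by side with something like this (a minimal sketch, assuming the microbenchmark package is installed; any timing tool would do):

library(microbenchmark)
microbenchmark(
  elementwise = names_list %>%
    lapply(FUN = . %>%
             mutate(total = sum(.[, 1])) %>%
             top_n(1, wt = occurrence)) %>%
    do.call(rbind, .),
  big_frame = names_list %>%
    data.table::rbindlist(use.names = FALSE, idcol = "year") %>%
    group_by(year) %>%
    mutate(total = sum(occurrence)) %>%
    top_n(1, wt = occurrence),
  times = 100
)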