I am executing a for loop over 100k+ rows and 50+ columns to apply a conditional formula to every element of the dataframe. Is there a logical way to make the code run faster?
for (i in 1:nrow(dataframe)) {
  for (j in 4:column_number) {
    if (j <= min(dataframe$index[i] + 6, column_number)) {
      # scale the cell by the row's multiplier stored in column col_12
      dataframe[i, j] <- round(dataframe[i, j] * dataframe[i, col_12], 0)
    } else {
      dataframe[i, j] <- 0
    }
  }
}
I would suggest exploring the foreach package registered against a parallel backend (e.g. doMC on a multi-core PC, or doMPI for distributed computing on an HPC cluster). foreach with a parallel backend will already give you a performance boost compared to an ordinary for loop, and after that you should see a close-to-linear increase in performance as you add cores (roughly twice the cores, twice as fast).
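For example, here is a minimal sketch using the doParallel backend (names such as dataframe, index, col_12 and column_number are taken from your question; doMC is registered the same way on Linux/macOS):

library(foreach)
library(doParallel)

cl <- parallel::makeCluster(max(1, parallel::detectCores() - 1))  # leave one core free
registerDoParallel(cl)

result <- foreach(i = 1:nrow(dataframe), .combine = rbind) %dopar% {
  row <- dataframe[i, ]                     # each worker works on its own copy of the row
  mult <- row[[col_12]]                     # read the multiplier before any column is overwritten
  limit <- min(row$index + 6, column_number)
  for (j in 4:column_number) {
    row[[j]] <- if (j <= limit) round(row[[j]] * mult, 0) else 0
  }
  row                                       # one-row results are rbind-ed into the final data frame
}

stopCluster(cl)

Depending on where those objects live you may need the .export and .packages arguments of foreach, and rbind-ing 100k one-row results has its own overhead, so handing each worker a chunk of rows rather than a single row usually pays off.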
From a code perspective, I would be careful to make sure you are not overwriting elements of dataframe[i, j] that the loop still depends on: inside the if branch you multiply by dataframe[i, col_12], and that column can itself be overwritten partway through a row. After all, your input variable is the same object as your output.
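A minimal sketch of one way to remove that dependency, keeping your original loop and assuming col_12 indexes the multiplier column: copy that column out before the loop, so later writes to dataframe cannot change the values you multiply by.

multiplier <- dataframe[[col_12]]   # snapshot of the multiplier column, taken before any write
for (i in 1:nrow(dataframe)) {
  limit <- min(dataframe$index[i] + 6, column_number)
  for (j in 4:column_number) {
    dataframe[i, j] <- if (j <= limit) round(dataframe[i, j] * multiplier[i], 0) else 0
  }
}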