I am executing a for loop over 100k+ rows and 50+ columns to apply a conditional formula to every element of the dataframe. Is there a logical way to make the code run faster?
for (i in 1:nrow(dataframe)) {
  for (j in 4:column_number) {
    if (j <= min(dataframe$index[i] + 6, column_number)) {
      # scale the cell by the row's multiplier stored in column col_12
      dataframe[i, j] <- round(dataframe[i, j] * dataframe[i, col_12], 0)
    } else {
      dataframe[i, j] <- 0
    }
  }
}
I would suggest exploring the foreach package registered against a parallel backend (e.g. doMC on a multi-core PC, or doMPI for distributed computing on an HPC cluster). foreach with a parallel backend will already give you a performance boost compared to an ordinary for loop, and after that you should see a close-to-linear increase in performance as you add cores (roughly twice the cores, twice as fast).
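For example, here is a minimal sketch using the doParallel backend (names such as dataframe, index, col_12 and column_number are taken from your question; doMC is registered the same way on Linux/macOS):

library(foreach)
library(doParallel)

cl <- parallel::makeCluster(max(1, parallel::detectCores() - 1))  # leave one core free
registerDoParallel(cl)

result <- foreach(i = 1:nrow(dataframe), .combine = rbind) %dopar% {
  row <- dataframe[i, ]                     # each worker works on its own copy of the row
  mult <- row[[col_12]]                     # read the multiplier before any column is overwritten
  limit <- min(row$index + 6, column_number)
  for (j in 4:column_number) {
    row[[j]] <- if (j <= limit) round(row[[j]] * mult, 0) else 0
  }
  row                                       # one-row results are rbind-ed into the final data frame
}

stopCluster(cl)

Depending on where those objects live you may need the .export and .packages arguments of foreach, and rbind-ing 100k one-row results has its own overhead, so handing each worker a chunk of rows rather than a single row usually pays off.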
From a code perspective, I would be careful to make sure you are not overwriting elements of dataframe[i, j] that the loop still depends on: inside the if branch you multiply by dataframe[i, col_12], and that column can itself be overwritten partway through a row. After all, your input variable is the same object as your output.
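A minimal sketch of one way to remove that dependency, keeping your original loop and assuming col_12 indexes the multiplier column: copy that column out before the loop, so later writes to dataframe cannot change the values you multiply by.

multiplier <- dataframe[[col_12]]   # snapshot of the multiplier column, taken before any write
for (i in 1:nrow(dataframe)) {
  limit <- min(dataframe$index[i] + 6, column_number)
  for (j in 4:column_number) {
    dataframe[i, j] <- if (j <= limit) round(dataframe[i, j] * multiplier[i], 0) else 0
  }
}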