Is there any package which has fast for loops? There are times when you need for loops, and I want to know if there is a package with for loops that are faster than base R's, so that I can use it at those times.
I am not looking for the apply family or purrr, or for advice on how to speed up for loops. When I seriously need a for loop, which package should I go to?
If you're specifically looking for a faster implementation of for loops, doParallel and foreach are your friends.
If you simply want faster iteration, then you might consider learning more about purrr, which is the tidyverse implementation of functional programming techniques (esp. useful here would be the R4DS iteration chapter). This package provides map functions that replace many common for-loop patterns, and so in lots of use cases can make your code more readable and less error-prone.
I think the current way to get fast for loops is to write the relevant code in a faster language such as C++ (via the Rcpp package).
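To make that concrete, here is a minimal sketch of moving a loop into C++ with Rcpp::cppFunction. It assumes Rcpp and a working C++ compiler are installed. The function names (decay_sum_r, decay_sum_cpp) and the running-total calculation are made up for illustration; the point is only that each step depends on the previous one, which is the kind of loop that is hard to vectorize.

```r
library(Rcpp)

# Hypothetical example: each output depends on the previous output,
# so the loop cannot simply be replaced by a vectorized expression.
decay_sum_r <- function(x, rate) {
  out <- numeric(length(x))
  if (length(x) == 0) return(out)
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- rate * out[i - 1] + x[i]
  }
  out
}

# The same loop written in C++ and compiled on the fly by Rcpp.
cppFunction('
NumericVector decay_sum_cpp(NumericVector x, double rate) {
  int n = x.size();
  NumericVector out(n);
  if (n == 0) return out;
  out[0] = x[0];
  for (int i = 1; i < n; ++i) {
    out[i] = rate * out[i - 1] + x[i];
  }
  return out;
}')

x <- runif(1e5)
r_result   <- decay_sum_r(x, 0.5)
cpp_result <- decay_sum_cpp(x, 0.5)

# Both versions should agree up to floating-point tolerance.
isTRUE(all.equal(r_result, cpp_result))
```

Timing the two with microbenchmark or system.time on your own machine is the only reliable way to see whether the gain is worth the extra compile step.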
Of course, one can also try already-vectorized functions (these work on whole vectors and are implemented in C, Fortran, C++ or other fast languages behind the scenes) or fast packages like data.table, which were written with speed and memory handling in mind.
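As a small illustration of the vectorized route, the sketch below computes the same result with an explicit loop and with a single vectorized expression in base R. The variable names are only for illustration, and the timings will differ from machine to machine.

```r
x <- runif(1e6)

# Explicit loop over a preallocated result vector.
loop_time <- system.time({
  squares_loop <- numeric(length(x))
  for (i in seq_along(x)) {
    squares_loop[i] <- x[i]^2
  }
})

# Vectorized: the iteration happens in compiled code inside `^`.
vec_time <- system.time({
  squares_vec <- x^2
})

# Same answer either way; compare the elapsed times yourself.
isTRUE(all.equal(squares_loop, squares_vec))
loop_time["elapsed"]
vec_time["elapsed"]
```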
I'm not sure that purrr actually speeds up iteration in general, though it discourages some common slowdowns (such as growing a vector one item at a time). In general, you shouldn't avoid for loops because of speed. From the R4DS chapter that @ewen linked to:
If your for loop is slow, then you could try using the profiler (under the "Profile" menu in RStudio). If there are no obvious gains to be made there, then you are likely looking at something like Rcpp for faster code or foreach for cramming more of it through at a time.
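Outside RStudio, roughly the same information is available from base R's sampling profiler, Rprof. The sketch below is only an illustration: summarise_chunks is a made-up workload, and which functions show up at the top of the summary will depend on your machine.

```r
# Hypothetical workload: a loop whose body does some real work.
summarise_chunks <- function(n_chunks) {
  totals <- numeric(n_chunks)
  for (i in seq_len(n_chunks)) {
    values <- rnorm(1e5)
    totals[i] <- sum(sort(values))
  }
  totals
}

profile_file <- tempfile(fileext = ".out")

Rprof(profile_file)              # start sampling the call stack
totals <- summarise_chunks(50)
Rprof(NULL)                      # stop profiling

# Time spent in each function itself, most expensive first.
profile <- summaryRprof(profile_file)
head(profile$by.self)
```

If the time is concentrated in functions called from the loop body rather than in the loop itself, that is where to optimize first.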
The OP was looking for faster for loops, and I specified a resource that looked to address the 'speed' aspect of the query. However, I wondered if the for loop notation was necessarily crucial, and so referred to an alternative iteration implementation. I realise that my wording looked like I was suggesting purrr as a match in the performance stakes. Clarity and fewer opportunities for errors are definitely the wins.
If you need to speed up your for loops, your only real choices (in terms of packages) are dropping down to C++ with Rcpp or enabling parallel computation with foreach and doParallel. However, if the problem you are solving requires the use of for loops, it's likely you don't have a problem that could easily be parallelized.
Frankly, most issues with speed in R are due to poor programming techniques and not an issue with loops themselves. Familiarize yourself with best practices in R, and if you are indeed following them all and your code is too slow, then drop down to C++.
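One common example of such a technique issue is growing a result inside the loop instead of allocating it up front. The sketch below uses two made-up functions, grow_result and prealloc_result, that return the same values; only the allocation strategy differs.

```r
# Grows the vector on every iteration, so R copies it again and again.
grow_result <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out
}

# Allocates the full vector once and fills it in place.
prealloc_result <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

n <- 2e4
grow_time     <- system.time(grown <- grow_result(n))
prealloc_time <- system.time(filled <- prealloc_result(n))

# Identical results; the elapsed times are what differ.
identical(grown, filled)
grow_time["elapsed"]
prealloc_time["elapsed"]
```

Both versions are plain base R for loops, so the difference comes from how the loop body is written rather than from the loop construct.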
Here's a post I wrote on the benefits of optimizing your for loops in base R:
Here's code that benchmarks the time to run an empty loop one million times, and compares that to the time to loop with a single assignment in that million-iteration loop:
nums <- rep(1, 1e6)
j <- 1L
microbenchmark::microbenchmark(
  {
    for (i in nums) {
    }
  },
  {
    for (i in nums) {
      j <- 1L
    }
  }
)
On my laptop, I get the following results:
Unit: milliseconds
                            expr      min       lq     mean   median       uq      max neval
         { for (i in nums) { } } 13.52649 14.15539 15.31180 14.65343 16.01338 22.20064   100
 { for (i in nums) { j <- 1L } } 55.51761 58.48899 63.65749 61.59985 67.21565 97.32986   100
Looking at the median time, the loop overhead is about one third of the time that the single assignment takes. If you are doing any worthwhile calculation in the loop, then the loop functionality itself is not the slow part. Hence why people are pointing at alternative methods of code optimization: if you want a "fast for loop", use base::for. No package is going to be capable of reducing the loop overhead enough to matter without other optimizations.
I have written a program that picks up 50 to 100 xlsb files from a folder (where the data in each file starts on the third row), checks their columns, organizes them, and compiles them into 4 or 5 xlsx files if the data is large.
It takes around 10 to 20 minutes to run even though I am using data.table and apply functions. And there are times when I just can't avoid loops.
At month end it can take more than 30 minutes, up to nearly 45 minutes. Every second you are talking about actually counts. If I could cut it even to half, I would love it.
I understand your concern, but sometimes it is just necessary to use loops. I don't deny there are limitations; I just wanted to know if there is any package that does the trick. Now I know there simply isn't one.
Dealing with xlsb/xlsx files could involve an unavoidable overhead, particularly if they are large. If possible you could test using other file formats or a database to organise the data. Concerns about loops may then not be important.
This is one of the processes in my company that I have recently automated using RInno, an R package for building desktop applications. Some of my colleagues use it every day. They asked me if I could make it faster. I changed the package from openxlsx to RODBC to connect directly to the xlsb files, and I used data.table instead. This was the most I could optimize. I thought for loops were what was consuming more time, and that they were the only thing left to optimize.
But it's all good. I got the answer that there is no package that can optimize for loops, and I am fine with it.
Now, if you can tell me of any package with which I can read xlsb files easily and faster, that could help. But that's okay too...
I really appreciate your reply, and thank you very much for responding.
I didn't think readxl could read xlsb files, but I might be wrong. As a binary format it is less straightforward to read and far fewer packages deal with it.
@hoelk, @martin.R and @DaveRGP thanks for pointing out resources. I appreciate your advice; I just don't think learning parallel computation is worth my time for this. I have different responsibilities, and speed is the least of my concerns. I can live with slow code if it saves my time. But then again, I wish it were fast enough that I didn't have to grab a cup of coffee and find a friend to talk to until it finishes.
But then again, I am okay with things as they are. Thanks for responding, and let me know if you find any package that reads xlsb files, because in all the organizations I have worked at xlsb is the new trend, since the files are smaller.
From how you describe your problem, there should be very little effort to convert your loop to foreach. You probably just have to add 3 lines of code, and the speed gain can be dramatic depending on the number of cores and what really is slowing you down (whether it's processing power or hard drive usage).
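A rough sketch of what that conversion could look like is below. It assumes the foreach and doParallel packages are installed. process_file and the file names are hypothetical stand-ins for the real per-file work (reading an xlsb file, checking columns and so on), and the worker count of 2 is arbitrary.

```r
library(foreach)
library(doParallel)

# Hypothetical stand-in for the real per-file processing.
process_file <- function(path) {
  nchar(path)
}

# Placeholder file names, not real files.
files <- sprintf("report_%02d.xlsb", 1:8)

# Original sequential loop.
results_seq <- vector("list", length(files))
for (i in seq_along(files)) {
  results_seq[[i]] <- process_file(files[i])
}

# Parallel version: start a cluster, register it, swap the loop for foreach.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)
results_par <- foreach(f = files) %dopar% process_file(f)
parallel::stopCluster(cl)

# Both approaches should produce the same values.
isTRUE(all.equal(unlist(results_seq), unlist(results_par)))
```

Each iteration has to be independent of the others for this to work, and if the disk is the bottleneck rather than the CPU, adding workers may not help much.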