Hoping someone can help me out! I am struggling with this situation.
Somehow my RStudio runs the dplyr function summarise_at() really slowly. The problem was solved for two weeks after I updated RStudio, but it has come back again. I have tried reinstalling R and RStudio, and even reinstalling my computer, but nothing really helps. The function still works, just with an extremely long run time.
Forgot to say: this also happens with the aggregate() function. I tried using aggregate() to avoid summarise_at(), but it runs really, really slowly too.
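To show what I mean, the two calls look roughly like this (a sketch with made-up data and column names, not my real data):

library(dplyr)

# Made-up stand-in: two grouping columns and three value columns
df <- data.frame(
  g1 = sample(1:5, 1000, replace = TRUE),
  g2 = sample(1:5, 1000, replace = TRUE),
  v1 = rnorm(1000), v2 = rnorm(1000), v3 = rnorm(1000)
)

# the dplyr call I am using
df %>%
  group_by(g1, g2) %>%
  summarise_at(vars(v1:v3), sum) %>%
  ungroup()

# the base R aggregate() alternative I tried
aggregate(cbind(v1, v2, v3) ~ g1 + g2, data = df, FUN = sum)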
Hi Mingmei, welcome!
Do you have the same problem if you run your code in R (not in RStudio)? What are your versions of R, RStudio, dplyr and OS? Any chance that you could make a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.
If you've never heard of a reprex before, you might want to start by reading this FAQ:
Thank you so much for your advice. A brief introduction: I did test it in plain R as well, and it runs really slowly there too. But last time the problem was solved by updating RStudio, not R, which is why I came here for help. My version of R is 3.5.3 and my version of RStudio is 1.2.1335. I use a Windows 10 computer, and last week I ran the Novabench score test and it was still fine.
I tried to create a reproducible example with the same dimensions as the data I use. Somehow the simple example (mydata and test) doesn't have the problem, but with the real-data RX part the problem still exists. I wish I could upload the data online for you to check, since I'm desperate to fix this problem. LOL. Please let me know if I could do that, or maybe paste the data on the forum. Thank you.
New update: to make the two cases match 100%, I converted DUPERSID and EVNTIDX from factor to numeric, and the problem went away. However, I don't think this is a desirable way to do it. BTW, my DUPERSID has a format like 60001101 and EVNTIDX has a similar but longer format, 600011011361.
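Roughly what the conversion looked like, sketched with made-up IDs in the same format (not the actual MEPS file):

# Stand-in data: the ID columns arrive as factors
RX <- data.frame(
  DUPERSID = factor(c("60001101", "60001102", "60001101")),
  EVNTIDX  = factor(c("600011011361", "600011021362", "600011011363"))
)

# as.numeric() on a factor returns the level codes,
# so go through as.character() to keep the actual ID values
RX$DUPERSID <- as.numeric(as.character(RX$DUPERSID))
RX$EVNTIDX  <- as.numeric(as.character(RX$EVNTIDX))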
On my desktop, creating mydata does feel a little slow. It has 10 million values in it.
Have a look at the reprex below. (On making a good reproducible example, note that it's important to include any libraries you call on. Also note that I needed to comment out the lines referring to the RX object, since it was never set up before it was called. =)
library(dplyr)

# Build a 100,000 x 100 data frame of standard-normal draws (10 million values)
system.time({
  mydata <- as.data.frame(replicate(100, rnorm(100000, 0, 1)))
})
#>    user  system elapsed
#>   0.779   0.101   0.886

# Add two grouping columns
system.time({
  mydata$group <- rpois(100000, 5)
  mydata$group.2 <- rpois(100000, 500)
})
#>    user  system elapsed
#>   0.010   0.001   0.011

# Grouped column sums over V1:V10
system.time({
  test <- mydata %>%
    group_by(group, group.2) %>%
    summarise_at(vars(V1:V10), sum) %>%
    ungroup()
})
#>    user  system elapsed
#>   0.023   0.001   0.024

# Commented out: RX is never created in this reprex
# RX <- RX %>%
#   group_by(DUPERSID, EVNTIDX) %>%
#   summarise_at(vars(RXXP[i]), sum) %>%
#   ungroup()
Sorry for the late response, and thank you for your reply. Actually, I figured out the reason. The thing is that my original data (the real data, RX, from the MEPS dataset) has numeric columns with labels, e.g.:
AMOUNT PAID, PRIVATE INSURANCE (IMPUTED)
[1] 0 0 0 0 0 0 0 0 0 0
If I remove the label and turn everything into plain numeric values, as in the example I created, the computation time is back to normal.
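In case anyone else runs into this: a sketch of one way to strip those labels, assuming they are haven-style label attributes (the object and column names here are stand-ins, not the real MEPS file):

library(haven)

# Stand-in for a labelled MEPS column: a numeric vector carrying a variable label
RX <- data.frame(DUPERSID = c(60001101, 60001101, 60001102),
                 RXPV = c(0, 12.5, 0))
attr(RX$RXPV, "label") <- "AMOUNT PAID, PRIVATE INSURANCE (IMPUTED)"

# zap_label() drops variable labels and zap_labels() drops value labels,
# leaving plain numeric columns
RX_clean <- zap_labels(zap_label(RX))
str(RX_clean$RXPV)  # plain numeric, no label attribute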