Computation-heavy code causes Win10 computer to crash when knitting

Hello all,

I am attempting to knit my R Markdown file to HTML; however, the process hangs on certain computation-heavy code chunks, and my computer either freezes completely or shuts down.

I am running the newest version of RStudio (64-bit 4.2.2), and, per a recommendation in another forum, I chose to render using "Software" instead of "Desktop" at launch.

I'm currently going through my markdown file and using the rm() command to keep my environment as bare-bones as possible, but I continue to run into issues when running computation-heavy commands.
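A minimal sketch of that clean-up step, using a hypothetical large intermediate object `big_df`. `rm()` only removes the name from the environment. The memory is reclaimed when R's garbage collector runs, and `gc()` prompts a collection and returns a memory-usage table. This helps between steps, but it will not help if a single step on its own needs more memory than the machine has.

```r
# Hypothetical large intermediate object standing in for a real data frame
big_df <- data.frame(x = runif(1e6), y = runif(1e6))

# Keep only the small result that later chunks actually need
summary_df <- data.frame(mean_x = mean(big_df$x))

# Drop the large object once it is no longer needed ...
rm(big_df)
# ... and prompt a garbage collection (gc() also returns a memory-usage table)
invisible(gc())
```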

Is this a common problem? Additionally, what information can I provide to help the community guide me towards an answer?

For reference, I'm working with a dataframe that includes about 5.5 million records and running commands like the following:

```r
library(dplyr)
library(geosphere)  # provides distHaversine()

may_to_october_behavior <- trips_w_stations %>%
  filter(month %in% c("May", "June", "July", "August", "September", "October")) %>%
  # rowwise() evaluates the next mutate() once per row (one distHaversine() call each)
  rowwise() %>%
  mutate(trip_distance = distHaversine(c(start_lng, start_lat), c(end_lng, end_lat),
                                       r = 6378137)) %>%  # metres
  filter(trip_distance > 0) %>%
  group_by(customer_type, bike_type) %>%
  summarize(mean_trip_length = mean(trip_length),          # seconds
            mean_distance_travelled_m = mean(trip_distance),
            ride_count = n_distinct(ride_id)) %>%
  # metres per second -> miles per hour (1 m/s is about 2.2369 mph)
  mutate(avg_mph = (mean_distance_travelled_m / mean_trip_length) * 2.2369)
```
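Separately from the knitting problem, the `rowwise()` step is probably the most expensive part of this pipeline, because it calls `distHaversine()` once per row. `geosphere::distHaversine()` also accepts two-column matrices, so something like `mutate(trip_distance = distHaversine(cbind(start_lng, start_lat), cbind(end_lng, end_lat)))` without `rowwise()` should compute every distance in a single vectorized call. The sketch below illustrates the same idea in base R with a hypothetical helper `haversine_m()`, using the same sphere radius (6378137 m). The toy coordinates are illustrative only.

```r
# Hypothetical helper: great-circle distance in metres via the haversine formula.
# Arguments are numeric vectors in decimal degrees, so a whole column is
# processed in one call instead of one call per row.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against floating-point error pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Toy data frame (illustrative values, not from the thread)
trips <- data.frame(
  start_lng = c(-87.62, -87.63, -87.65),
  start_lat = c(41.88, 41.89, 41.90),
  end_lng   = c(-87.62, -87.64, -87.60),
  end_lat   = c(41.88, 41.91, 41.95)
)
trips$trip_distance <- haversine_m(trips$start_lng, trips$start_lat,
                                   trips$end_lng, trips$end_lat)

# In the dplyr pipeline, the equivalent step would be (no rowwise() needed):
#   mutate(trip_distance = haversine_m(start_lng, start_lat, end_lng, end_lat))
```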


```r
library(dplyr)
library(lubridate)  # wday(), month(), day(), hour()

trips_2 <- trips_1 %>%
  mutate(
    weekday = wday(started_at, label = TRUE, abbr = FALSE),
    month = month(started_at, label = TRUE, abbr = FALSE),
    week = strftime(started_at, format = "%V"),  # ISO 8601 week number
    day = day(started_at),
    start_hour = hour(started_at),
    trip_length = as.numeric(difftime(ended_at, started_at, units = "secs"))
  )
```

Do you mean the code execution crashes only when knitting, while everything works fine when you run the code interactively? That would be odd, but you could save intermediate output to work around the problem.

That said, I think you simply lack the computing resources to process data this large the way you are doing it. If that is the case, you could explore "on-disk" approaches like arrow, duckdb, or an RDBMS.
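A minimal sketch of the duckdb route, assuming the DBI and duckdb packages are installed. The table and column names mirror the thread, but the data are toy values and the database path is a temporary file chosen for illustration. The data live in a DuckDB database file and the `GROUP BY` runs inside DuckDB, so only the small summary table is pulled into R's memory.

```r
library(DBI)
library(duckdb)

# Hypothetical on-disk database file (a temp file here, for illustration)
db_path <- tempfile(fileext = ".duckdb")
con <- dbConnect(duckdb::duckdb(), dbdir = db_path)

# Toy data standing in for the real trips table
trips <- data.frame(
  customer_type = c("member", "casual", "member", "casual"),
  trip_length   = c(600, 1200, 300, 900)  # seconds
)
dbWriteTable(con, "trips", trips)

# The aggregation runs inside DuckDB; only the small result comes back to R
summary_df <- dbGetQuery(con, "
  SELECT customer_type,
         AVG(trip_length) AS mean_trip_length,
         COUNT(*)         AS ride_count
  FROM trips
  GROUP BY customer_type
  ORDER BY customer_type
")

dbDisconnect(con)
```

With millions of rows, you would typically point DuckDB at the raw files directly (for example with its `read_csv_auto()` table function inside the SQL) rather than loading them into R first.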

BTW, just to clarify: 4.2.2 is your R version, not your RStudio version.

Thank you for the reply.

Yes, I can run the code interactively; it only seems to run into issues during the knitting process.

Can you elaborate a bit on how to save intermediate output? Would this entail dropping sequences of my code into new markdown files, then knitting each?

Given your confidence that this may be a computer resource issue, can you point me in the direction of resources to learn about "on-disk" solutions?

Thanks again!

One strategy for optimizing the knitting of a file is to pre-process large data transformations that don't need to be done at knitting time. For example, you might prep a dataframe in a code chunk that you don't run during knitting; in that chunk, save the generated dataframe to disk with saveRDS(). Then, when you need the dataframe in the code you run at knit-time, use readRDS() to read the prepped dataframe.
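A minimal sketch of that pattern, with a toy data frame standing in for an expensive result such as `may_to_october_behavior` and a hypothetical file path. In the .Rmd, the first part would sit in a chunk skipped at knit time (for example via the chunk option `eval = FALSE`), and the second part in a chunk that does run.

```r
# --- Chunk run interactively, skipped when knitting ---
# Toy data standing in for an expensive-to-compute data frame
prepped <- data.frame(
  customer_type = c("casual", "member"),
  mean_trip_length = c(1050, 450)
)
# Hypothetical path; in a real project this would be a file in the project folder
prepped_path <- file.path(tempdir(), "prepped.rds")
saveRDS(prepped, prepped_path)

# --- Chunk that runs at knit time ---
# Reading the saved object is fast compared with recomputing it
prepped_loaded <- readRDS(prepped_path)
```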

We teach our students (esp. with group work) not to keep repeating data prep that only needs to be done once. Prep your dataframes and simply load them into the notebooks that need them.

You can use the caching feature in R Markdown.
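A minimal sketch of turning caching on globally from a setup chunk at the top of the .Rmd (it can also be enabled per chunk with `cache=TRUE` in the chunk header). knitr re-runs a cached chunk when its code or options change and otherwise loads the stored results. A cached chunk is not automatically re-run when an upstream chunk changes, unless you declare that dependency with the `dependson` chunk option.

```r
# Setup chunk: enable knitr's cache for every chunk in the document
knitr::opts_chunk$set(
  cache = TRUE,
  # Load cached objects eagerly; knitr's documentation suggests this for
  # very large objects, where lazy loading may not work
  cache.lazy = FALSE
)
```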

Another common workflow is to keep separate R scripts for data processing that output already cleaned and summarized data to data files that later get loaded into your Rmd file.
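A minimal sketch of that split, with a hypothetical script name (`prep_data.R`), toy data, and a temporary output path. The heavy work and the summarizing happen once in the script, and the .Rmd only reads the small result.

```r
# prep_data.R (hypothetical): heavy processing happens here, once
# Toy trip-level data standing in for the full dataset
trips <- data.frame(
  customer_type = c("member", "casual", "member", "casual"),
  trip_length   = c(600, 1200, 300, 900)  # seconds
)
# Summarize before writing, so only a small table is stored and later loaded
trip_summary <- aggregate(trip_length ~ customer_type, data = trips, FUN = mean)
summary_path <- file.path(tempdir(), "trip_summary.csv")  # hypothetical output path
write.csv(trip_summary, summary_path, row.names = FALSE)

# In the .Rmd: load the already-summarized file instead of the raw data
trip_summary_rmd <- read.csv(summary_path)
```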

I'm not aware of a single resource broad enough to cover all the available options. I would say Google a little, choose the method you feel most comfortable with, and then ask here about the specifics.


Caching solved my issue completely. Thank you so much for pointing me in the right direction.

There have been ups and downs as I've become more familiar with R, but having a 2000+ line markdown file that I simply couldn't publish was very demoralizing.

Thank you very much for your suggestion. I will keep this in mind for future projects.

The only reason I didn't pursue this solution (even though I plan to take this approach in the future) is that I'm planning to make this part of my portfolio and want to show every step of my work.

I'm not judging, but without context, this feels wrong. I would expect such a file to be very hard to maintain and troubleshoot; I think it is better to split your project into smaller, purpose-specific scripts.
