Hello!
I've been down a rabbit hole on this for hours and have made no progress (pun intended) so I thought I would see if anyone has ideas for this.
I work with fairly large data frames (>100k rows) and often run dplyr pipelines for data cleaning that can take a while to finish. When I'm running for-loops, I have progress bars set up and they have helped a ton. I am now trying to figure out whether there is a way to integrate a progress bar into my dplyr pipelines so that I can see how much longer they need to run.
I will note -- I do NOT understand functions super well and I am honestly still relatively new to the tidyverse in general, so more explanation is better!
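For reference, a for-loop progress bar of the kind mentioned above can be sketched with base R's txtProgressBar() from the utils package. This is only an illustration: n_steps and the loop body are placeholders, not the actual cleaning code.

```r
# Minimal sketch of a for-loop progress bar using base R (utils package).
n_steps <- 20                 # hypothetical number of iterations
results <- numeric(n_steps)   # placeholder container for the loop's output

pb <- txtProgressBar(min = 0, max = n_steps, style = 3)  # style 3 shows a percentage
for (i in seq_len(n_steps)) {
  results[i] <- sqrt(i)       # placeholder for the real work done per iteration
  setTxtProgressBar(pb, i)    # advance the bar to step i
}
close(pb)                     # finish the bar and move to a new line
```

The bar works here because the loop has a known number of discrete steps; a single dplyr pipeline does not expose its steps the same way, which is the crux of the question.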
As an example, I'll be using the flights data from the nycflights13 package.
The code I have written here is complete nonsense.
I just piled on as many steps as I could so that it takes a second to run instead of finishing virtually instantly, which gives a progress bar something to actually show.
library(nycflights13)  # flights data
library(dplyr)         # mutate(), filter(), arrange(), case_when()
library(lubridate)     # make_datetime(), ymd(), interval(), %within%

df <- flights %>%
  # Build a single date-time column from the separate date and time parts
  mutate(departure = make_datetime(year, month, day, hour, minute)) %>%
  arrange(desc(departure)) %>%
  # Quarter flags. Each interval ends at midnight on its end date,
  # so departures later on that final day are flagged 0.
  mutate(Q1_flag = ifelse(departure %within% interval(start = ymd("2013-01-01"), end = ymd("2013-03-31")), 1, 0)) %>%
  mutate(Q2_flag = ifelse(departure %within% interval(start = ymd("2013-04-01"), end = ymd("2013-06-30")), 1, 0)) %>%
  mutate(Q3_flag = ifelse(departure %within% interval(start = ymd("2013-07-01"), end = ymd("2013-09-30")), 1, 0)) %>%
  mutate(Q4_flag = ifelse(departure %within% interval(start = ymd("2013-10-01"), end = ymd("2013-12-31")), 1, 0)) %>%
  mutate(month_name = case_when(
    month == 1 ~ "January",
    month == 2 ~ "February",
    month == 3 ~ "March",
    month == 4 ~ "April",
    month == 5 ~ "May",
    month == 6 ~ "June",
    month == 7 ~ "July",
    month == 8 ~ "August",
    month == 9 ~ "September",
    month == 10 ~ "October",
    month == 11 ~ "November",
    month == 12 ~ "December"
  )) %>%
  # Anything not matched by an interval below falls through to "Capricorn"
  mutate(sun_sign = case_when(
    departure %within% interval(start = ymd("2013-01-20"), end = ymd("2013-02-18")) ~ "Aquarius",
    departure %within% interval(start = ymd("2013-02-19"), end = ymd("2013-03-20")) ~ "Pisces",
    departure %within% interval(start = ymd("2013-03-21"), end = ymd("2013-04-19")) ~ "Aries",
    departure %within% interval(start = ymd("2013-04-20"), end = ymd("2013-05-20")) ~ "Taurus",
    departure %within% interval(start = ymd("2013-05-21"), end = ymd("2013-06-20")) ~ "Gemini",
    departure %within% interval(start = ymd("2013-06-21"), end = ymd("2013-07-22")) ~ "Cancer",
    departure %within% interval(start = ymd("2013-07-23"), end = ymd("2013-08-22")) ~ "Leo",
    departure %within% interval(start = ymd("2013-08-23"), end = ymd("2013-09-22")) ~ "Virgo",
    departure %within% interval(start = ymd("2013-09-23"), end = ymd("2013-10-22")) ~ "Libra",
    departure %within% interval(start = ymd("2013-10-23"), end = ymd("2013-11-21")) ~ "Scorpio",
    departure %within% interval(start = ymd("2013-11-22"), end = ymd("2013-12-21")) ~ "Sagittarius",
    TRUE ~ "Capricorn"
  )) %>%
  filter(carrier %in% c("AA", "DL", "UA", "WN")) %>%
  arrange(carrier) %>%
  filter(dest %in% c("BUR", "LAX", "SNA", "LGB")) %>%
  filter(origin == "JFK") %>%
  filter(dep_delay < 0) %>%
  filter(dep_time < 1200 & dep_time > 600)
Now obviously this actual pipeline only takes like a second to run -- but humor me and pretend it's taking several minutes.
What I'm trying to figure out is how to incorporate a progress bar into this pipeline so that as it's running, I can see something like this:
|========= > ---------------| 48%
Is this possible? Does this exist?
Thank you in advance!