Logic Question. Pick the smallest of two dates that must have happened after another date

mclp · July 9, 2021, 9:26pm

Hello everyone.

I am having a bit of a hard time to write some logic in R.
I dont think it to be very difficult but I am simply stuck.
Your help is very much appreciated.

I want to calculate how long it takes until it goes from "issues" to either "production" or "canceled"
This is my sample df:

issues <- c("2021-05-10 01:23:32","2021-04-28 15:10:17","2021-02-28 09:27:13","2021-03-01 00:27:17","2021-04-05 22:12:18") %>% ymd_hms()
production <- c("2021-03-08 16:54:11","2021-04-28 09:57:42", NA,"2021-06-15 17:32:17","2021-04-21 06:31:10") %>% ymd_hms()
canceled <- c("2021-05-16 05:39:02","2021-05-12 14:24:46","2021-06-13 01:14:53", NA,"2021-04-21 16:31:07")  %>% ymd_hms()

teste <- data.frame(issues, production,canceled) %>% as_tibble()

So the calculation I want to do is: (production OR canceled) - issues

Logic:
i) I have to pick the lowest date between production and canceled BUT whatever date is picked it has to have happened AFTER Issues.

For example.
on the 1st row, production happened before issue, so we should pick canceled.
on the last row, production happened after issues and before canceled, so we should pick production.

I wrote a use case when code but it didn't work.... I am a bit frustrated to be honest, so if you could help me I would really appreciate it.

Thank you everyone, let me know if something is not clear.
Cheers

lucasgraybuck · July 11, 2021, 5:42am

I think this may work, though perhaps someone has another suggestion for handling the NA cases, which made this a bit trickier:

library(lubridate)
library(tidyr)
library(dplyr)

results <- teste %>%
  mutate(time = case_when(
    # if canceled is later than issues but not production
    canceled > issues & is.na(production) ~ canceled - issues,
    canceled > issues & production < issues ~ canceled - issues,
    # if production is later than issues but not canceled
    production > issues & is.na(canceled) ~ production - issues,
    production > issues & canceled < issues ~ production - issues,
    # if both are later than issues, choose the smaller
    canceled > issues & production > issues ~ pmin(canceled - issues, production - issues),
    TRUE ~ as.difftime(0, units = "hours")
  )) %>%
  mutate(c_or_p = case_when(
    # if canceled is later than issues but not production
    canceled > issues & is.na(production) ~ "canceled",
    canceled > issues & production < issues ~ "canceled",
    # if production is later than issues but not canceled
    production > issues & is.na(canceled) ~ "production",
    production > issues & canceled < issues ~ "production",
    # if both are later than issues, choose the smaller
    canceled > issues & production > issues & canceled < production ~ "canceled",
    canceled > issues & production > issues & canceled > production ~ "production",
    TRUE ~ "issues"
  ))

# A tibble: 5 x 5
  issues              production          canceled            time             c_or_p    
  <dttm>              <dttm>              <dttm>              <drtn>           <chr>     
1 2021-05-10 01:23:32 2021-03-08 16:54:11 2021-05-16 05:39:02    6.177431 days canceled  
2 2021-04-28 15:10:17 2021-04-28 09:57:42 2021-05-12 14:24:46   13.968391 days canceled  
3 2021-02-28 09:27:13 NA                  2021-06-13 01:14:53  104.658102 days canceled  
4 2021-03-01 00:27:17 2021-06-15 17:32:17 NA                  2561.083333 days production
5 2021-04-05 22:12:18 2021-04-21 06:31:10 2021-04-21 16:31:07  368.314444 days production

mclp · July 14, 2021, 8:47am

WOW! Thank you @lucasgraybuck ! Your case when is way better than mine. I learnt something today.
If you don't mind me asking, what is the "True~" thing about? I am a bit confused.

system · July 21, 2021, 8:47am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.