I am having a bit of a hard time to write some logic in R.
I dont think it to be very difficult but I am simply stuck.
Your help is very much appreciated.
I want to calculate how long it takes until it goes from "issues" to either "production" or "canceled"
This is my sample df:
So the calculation I want to do is: (production OR canceled) - issues
Logic:
i) I have to pick the lowest date between production and canceled BUT whatever date is picked it has to have happened AFTER Issues.
For example.
on the 1st row, production happened before issue, so we should pick canceled.
on the last row, production happened after issues and before canceled, so we should pick production.
I wrote a use case when code but it didn't work.... I am a bit frustrated to be honest, so if you could help me I would really appreciate it.
Thank you everyone, let me know if something is not clear.
Cheers
I think this may work, though perhaps someone has another suggestion for handling the NA cases, which made this a bit trickier:
library(lubridate)
library(tidyr)
library(dplyr)
results <- teste %>%
mutate(time = case_when(
# if canceled is later than issues but not production
canceled > issues & is.na(production) ~ canceled - issues,
canceled > issues & production < issues ~ canceled - issues,
# if production is later than issues but not canceled
production > issues & is.na(canceled) ~ production - issues,
production > issues & canceled < issues ~ production - issues,
# if both are later than issues, choose the smaller
canceled > issues & production > issues ~ pmin(canceled - issues, production - issues),
TRUE ~ as.difftime(0, units = "hours")
)) %>%
mutate(c_or_p = case_when(
# if canceled is later than issues but not production
canceled > issues & is.na(production) ~ "canceled",
canceled > issues & production < issues ~ "canceled",
# if production is later than issues but not canceled
production > issues & is.na(canceled) ~ "production",
production > issues & canceled < issues ~ "production",
# if both are later than issues, choose the smaller
canceled > issues & production > issues & canceled < production ~ "canceled",
canceled > issues & production > issues & canceled > production ~ "production",
TRUE ~ "issues"
))
# A tibble: 5 x 5
issues production canceled time c_or_p
<dttm> <dttm> <dttm> <drtn> <chr>
1 2021-05-10 01:23:32 2021-03-08 16:54:11 2021-05-16 05:39:02 6.177431 days canceled
2 2021-04-28 15:10:17 2021-04-28 09:57:42 2021-05-12 14:24:46 13.968391 days canceled
3 2021-02-28 09:27:13 NA 2021-06-13 01:14:53 104.658102 days canceled
4 2021-03-01 00:27:17 2021-06-15 17:32:17 NA 2561.083333 days production
5 2021-04-05 22:12:18 2021-04-21 06:31:10 2021-04-21 16:31:07 368.314444 days production
WOW! Thank you @lucasgraybuck ! Your case when is way better than mine. I learnt something today.
If you don't mind me asking, what is the "True~" thing about? I am a bit confused.