I have a dataset which has clock in and clock out times of employees. Some of these employees have clocked in and clocked out on same day, some have clocked in on one day and clocked out on another day (for example clocked in today and clocked out tomorrow) and some have forgot to either entry their clock in or clock out time. I need to separate these 3 cases into three different datasets and calculate the number of hours an employee has worked in a day. Below is how the dataset looks like:
| Id | Date | Time | Time_event_type |
|----|:---------:|--------------------:|-----------------|
| 82 | 3/11/2019 | 1899-12-31 10:41:51 | Clock-In |
| 82 | 3/11/2019 | 1899-12-31 20:15:16 | Clock-Out |
| 82 | 3/12/2019 | 1899-12-31 10:51:12 | Clock-In |
| 82 | 3/12/2019 | 1899-12-31 20:10:07 | Clock-Out |
| 19 | 3/11/2019 | 1899-12-31 07:00:00 | Clock-Out |
| 19 | 3/12/2019 | 1899-12-31 05:45:57 | Clock-In |
| 19 | 3/12/2019 | 1899-12-31 18:05:15 | Clock-Out |
| 19 | 3/13/2019 | 1899-12-31 16:23:09 | Clock-In |
| 19 | 3/14/2019 | 1899-12-31 18:05:15 | Clock-Out |
| 19 | 3/18/2019 | 1899-12-31 17:44:23 | Clock-In |
| 68 | 3/12/2019 | 1899-12-31 16:52:39 | Clock-In |
| 68 | 3/13/2019 | 1899-12-31 05:00:50 | Clock-Out |
| 68 | 3/13/2019 | 1899-12-31 17:03:01 | Clock-In |
| 68 | 3/14/2019 | 1899-12-31 04:53:25 | Clock-Out |
| 68 | 3/14/2019 | 1899-12-31 17:02:52 | Clock-In |
| 68 | 3/15/2019 | 1899-12-31 05:20:22 | Clock-Out |
which would be:
Id = c("82","82","82","82","19","19","19","19","19","19","68","68","68","68","68","68")
Date = c("3/11/2019","3/11/2019","3/12/2019","3/12/2019","3/11/2019","3/12/2019","3/12/2019",
"3/13/2019","3/14/2019","3/18/2019","3/12/2019","3/13/2019","3/13/2019","3/14/2019","3/14/2019","3/15/2019")
Time = as.POSIXct(c("1899-12-31 10:41:51","1899-12-31 20:15:16","1899-12-31 10:51:12","1899-12-31 20:10:07",
"1899-12-31 07:00:00","1899-12-31 05:45:57","1899-12-31 18:05:15","1899-12-31 16:23:09","1899-12-31 18:05:15",
"1899-12-31 17:44:23","1899-12-31 16:52:39","1899-12-31 05:00:50","1899-12-31 17:03:01","1899-12-31 04:53:25",
"1899-12-31 17:02:52","1899-12-31 05:20:22"))
Time_event_type = c("Clock-In","Clock-Out","Clock-In","Clock-Out","Clock-Out","Clock-In", "Clock-Out",
"Clock-In","Clock-Out","Clock-In","Clock-In", "Clock-Out","Clock-In","Clock-Out","Clock-In","Clock-Out")
df2=data.frame(Id,Date,Time,Time_event_type)
df2$Date= as.Date(df2$Date, format = "%m/%d/%Y")
The output should be as follows:
1st case: When the employees clock in and clock out on same date (grouped on Id and Date)
| Id | Date | Time | Time_event_type | Diff(In seconds) |
|----|:---------:|--------------------:|-----------------|------------------|
| 82 | 3/11/2019 | 1899-12-31 10:41:51 | Clock-In | NA |
| 82 | 3/11/2019 | 1899-12-31 20:15:16 | Clock-Out | 34405 |
| 82 | 3/12/2019 | 1899-12-31 10:51:12 | Clock-In | NA |
| 82 | 3/12/2019 | 1899-12-31 20:10:07 | Clock-Out | 33535 |
| 19 | 3/12/2019 | 1899-12-31 05:45:57 | Clock-In | NA |
| 19 | 3/12/2019 | 1899-12-31 18:05:15 | Clock-Out | 44358 |
2nd Case: When the employee clocks in and clocks out on different dates
| Id | Date | Time | Time_event_type | Diff (In Seconds) |
|----|:---------:|--------------------:|-----------------|-------------------|
| 19 | 3/13/2019 | 1899-12-31 16:23:09 | Clock-In | NA |
| 19 | 3/14/2019 | 1899-12-31 18:05:15 | Clock-Out | 6126 |
| 68 | 3/12/2019 | 1899-12-31 16:52:39 | Clock-In | NA |
| 68 | 3/13/2019 | 1899-12-31 05:00:50 | Clock-Out | 43691 |
| 68 | 3/13/2019 | 1899-12-31 17:03:01 | Clock-In | NA |
| 68 | 3/14/2019 | 1899-12-31 04:53:25 | Clock-Out | 42624 |
| 68 | 3/14/2019 | 1899-12-31 17:02:52 | Clock-In | NA |
| 68 | 3/15/2019 | 1899-12-31 05:20:22 | Clock-Out | 44250 |
3rd Case:
When the employee forgets to either clock in or clock out (I don't need to calculate difference in this case)
| Id | Date | Time | Time_event_type |
|----|:---------:|--------------------:|-----------------|
| 19 | 3/11/2019 | 1899-12-31 07:00:00 | Clock-Out |
| 19 | 3/18/2019 | 1899-12-31 17:44:23 | Clock-In |
In order to get the 1st case I tried the following code, but it gives all the duplicate records, even the ones on different dates.
df2_same=df2 %>% group_by(Id,Date) %>% mutate(dupe = n()>1)
For the 2nd case, I tried
df2_diff=df2[!(duplicated(df2[c("Id","Date")]) | duplicated(df2[c("Id","Date")], fromLast = TRUE)), ]
but this also lists out the 3rd case, which I want separately. Can someone please help me in getting the desired outputs?