Assigning Employees Tasks

Hi,

I am looking to come up with a model which assigns Employees tasks based on their availability.

Below is the data I have in a simple format. TaskID, RequestTime, DueTime.

tibble::tribble(
   ~TaskID, ~RequestedTime, ~DueTime,
  "Task 1",        "06:30",  "10:30",
  "Task 2",        "07:00",  "07:30",
  "Task 3",        "07:30",  "09:30",
  "Task 4",        "09:30",  "10:00"
  )

What I would like is the end result, which looks at the availability of Employees. Task1 would be assigned to Employee A, whereas Task 2 would be assigned to B as A is busy doing Task1.

tibble::tribble(
   ~TaskID, ~RequestedTime, ~DueTime, ~Result,
  "Task 1",        "06:30",  "10:30",     "A",
  "Task 2",        "07:00",  "07:30",     "B",
  "Task 3",        "07:30",  "09:30",     "B",
  "Task 4",        "09:30",  "10:00",     "B"
  )

Hope this makes sense, and all guidance would be appreciated!

Thank you,

Hi,

All guidance hugely appreciated, currently googling without much luck. Not sure if a function would be the best, or another suitable package?

Hi,

This is an interesting problem and I'm happy to think about it for a bit, but have some questions fist before I try and come up with an (efficient) algorithm:

  • Do all employees start / stop at the same time and have no breaks? This would change which tasks they can be assigned to and when
  • How many employees do you have? Just 2, or does this need to work for 2+
  • What would you like to happen (if it could happen) in case there are more simultaneous tasks than employees
  • Finally, would you like to spread the load among all available employees, or have the fewest number of them complete all the tasks

What is the setting for which you are building this?

PJ

@pieterjanvc - Thank you for your help.

They both start and stop at the same time, yes. No breaks are required in the model.
We have just two employees.
If there are simultaneous tasks then I would like it to say something on the lines of 'Too Much Work'.
I would like it by default to assign to Employee A, and if they are busy return Employee B. If Employee A should be the lead, and only if Employee A is busy assign to B.

I have thought about using lag() in dplyr, however it comes a problem as it assumes Employee A is still busy even though he/she may have finished.

Any help appreciated.

df <- df %>%
mutate(PreviousDueTime= lag(DueTime)) %>%
  mutate(RuleOne = case_when(
            RequestedTime > PreviousDueTime ~ 'Employee A',
            is.na(PreviousDueTime) ~ 'Employee A',
            TRUE ~ 'Employee B''))

Hi,

I'll get back to you soon, as this scenario should not be too difficult.

Just to be sure: This is not part of a homework or course assignment right? We have certain policies here on the forum regarding that, but if it's not the case, I can just think of a solution, otherwise I'd have to stick to giving tips :slight_smile:

PJ

No, this is definitely not homework!! I wish I was back at school... those days are long gone.

2 Likes

Hi,

I spent a good amount of time coming up with some sort of algorithm, and ended up with a greedy approach that seems to do a decent job. I had a real good time coding it :slight_smile:

Ok, so you have to start with a data frame in the form like this one (here I generate a random one)

library(dplyr)
library(lubridate)

set.seed(10)

#Tasks to be done (Random generated), replace with your data
 #Name, start and end must be columns present
n = 8
tasks = data.frame(
  name = LETTERS[1:n],
  start = sample(8:14, n, replace = T)) %>% 
  mutate(end = start + sample(0:2, n, replace = T),
         start = paste0(start, sample(c(":00", ":15"), n, replace = T)),
         end = paste0(end, sample(c(":30", ":45"), n, replace = T)))

tasks
#>   name start   end
#> 1    A 10:00 12:45
#> 2    B  8:15 10:30
#> 3    C  9:15 11:30
#> 4    D 11:00 12:30
#> 5    E 14:15 16:30
#> 6    F 13:00 14:45
#> 7    G 14:00 15:45
#> 8    H 14:15 14:30

Once you have your data in a data frame called 'tasks' with these 3 columns, you can run the rest of the code:

library(dplyr)
library(lubridate)

#Convert the time to a numeric scale
tasks = tasks %>% mutate(
  taskId = 1:n,
  iStart = hm(start)$hour*60 + hm(start)$minute,
  iEnd = hm(end)$hour*60 + hm(end)$minute,
  iEnd = iEnd - min(iStart) + 1,
  iStart = iStart - min(iStart) + 1,
  duration = iEnd - iStart, employeeId = ""
)

#Quick check that input is correct...
if(any(tasks$iStart > tasks$iEnd)){
  stop("Start time must be lower than End in all cases!")
}

#Create a time table of all tasks (matrix with rows tasks and columns time)
iEndTime = max(tasks$iEnd)

timeTable = mapply(function(taskId, iStart, iEnd){
  y = rep(NA, iEndTime)
  y[iStart:(iEnd - 1)] = taskId
  y
}, tasks$taskId, tasks$iStart, tasks$iEnd) %>% t()

timeTable = rbind(timeTable, NA) #add one blank row for code to run properly (1 row is not a matrix)

#Max employees needed at any time
maxEmpl = max(apply(timeTable, 2, function(x) sum(!is.na(x))))
employees = data.frame(id = 1:maxEmpl, timeSpent = 0, nextFreeTime = 1)

#Start at time one
currentTime = 1

#Continue as long as there are unassigned tasks
while(sum(tasks$employeeId == "") != 0){
  #Find the next time when one or more tasks iStart
  nextTasks = timeTable[,currentTime]
  nextTasks = nextTasks[!is.na(nextTasks)]
  
  #Assign each task to the employee that is free 
  for(nextTask in nextTasks){
    #Employee with most time spent on tasks first filled-up
    assignTaskTo = employees %>% filter(currentTime >= nextFreeTime) %>% 
      filter(timeSpent == max(timeSpent)) %>% slice(1) %>% pull(id)
    
    #Remove the task from the time table
    timeTable = timeTable[-which((tasks %>% filter(employeeId == "") %>% pull(taskId)) == nextTask),]
    
    #Update the task and employee table
    tasks[tasks$taskId == nextTask, "employeeId"] = assignTaskTo
    employees[employees$id == assignTaskTo, c("timeSpent", "nextFreeTime")] = 
      c( employees[employees$id == assignTaskTo, "timeSpent"] +
           tasks[tasks$taskId == nextTask, "duration"], 
         tasks[tasks$taskId == nextTask, "iEnd"])
    
  }
  
  #Get iStart time of the next unassigned task(s)
  currentTime = if(sum(tasks$employeeId == "") != 0) {
    which(cumsum(colSums(timeTable, na.rm = T)) > 0)[1]
  }
}

#Updated tasks list
tasks = tasks %>% arrange(iStart, iEnd) %>% select(-iStart, -iEnd)

tasks
#>   name start   end taskId duration employeeId
#> 1    B  8:15 10:30      2      135          1
#> 2    C  9:15 11:30      3      135          2
#> 3    A 10:00 12:45      1      165          3
#> 4    D 11:00 12:30      4       90          1
#> 5    F 13:00 14:45      6      105          1
#> 6    G 14:00 15:45      7      105          3
#> 7    H 14:15 14:30      8       15          4
#> 8    E 14:15 16:30      5      135          2

As you can see, the results is 3 more columns added to your tasks data frame: a unique ID, the duration of the task in minutes, and the employee Id it's assigned to. The code generates as many employees as needed to complete all tasks, based on the max number of tasks that are going on at the same time at any given moment.

You can then use this new task data frame to generate other summaries like this one:

#Summary of tasks per employee and time spent on tasks
employeeStats = tasks %>% group_by(employeeId) %>% 
  summarise(
    startTime = start[1], endTime = end[n()],
    timeOnTasks = sum(duration),
    taskList = paste(name, collapse = ", "),
    .groups = "drop")

employeeStats
#> # A tibble: 4 x 5
#>   employeeId startTime endTime timeOnTasks taskList
#>   <chr>      <chr>     <chr>         <dbl> <chr>   
#> 1 1          8:15      14:45           330 B, D, F 
#> 2 2          9:15      16:30           270 C, E    
#> 3 3          10:00     15:45           270 A, G    
#> 4 4          14:15     14:30            15 H

The code is written so that the max number of tasks are assigned to the first employee in the list, and others get filled up once the previous ones are busy. I've put comments in the code to explain most steps, but I think running it yourself line by line might give you more insight.

Hope this helps,
PJ

Thank you PJ!
EXACTLY what I needed. I have adapted the code to match the SQL database and everything has worked.

Very insightful. Maybe explains why everyone's so busy...!

Sorry PJ - One last comment. I've been running through the code line by line today getting a understanding of how the process works.

Im 99% there, the only thing I'm slightly confused about is how the "currentTime" value updates within the loop. For example, having run one line at a time to see how each value updates, I cant seem to get "currentTime" to update to anything other than 1?

Could you help take me through where im going wrong?

Thank you for all your help so far, its been a huge help.

Hi,

The currentTime looks at the start of the next task(s) when the previous onces have been assigned. As you can see a tasks get removed from the time table (matrix) after it's been assigned

timeTable = timeTable[-which((tasks %>% filter(employeeId == "") %>% 
  pull(taskId)) == nextTask),]

To find the start of the next task, you look at which column (e.g. time point) is the first one to not have all NAs. That is the start time of the next task.

which(cumsum(colSums(timeTable, na.rm = T)) > 0)[1]

The if-statement around that last line of code is to prevent an error when the last task has been assigned because at that point the matrix is one-dimensional and becomes a vector which does not allow operations like colSums.

Hope this helps,
PJ

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.