Sampling using tidyverse sample_n and determining the number of samples by variable

Hi Guys,
I am trying to prepare a function/loop which would sample a number of rows based on conditions.
I have two dataframe the first one is a dataframe with day dates, people and their roles.

library(bizdays)
library(tidyverse)
from_date <- "2023-01-01"
to_date <- "2023-12-31"
create.calendar(name = "cal23", holidays = holidays, weekdays = c("saturday", "sunday"))
daty2023 <- bizseq(from_date, to_date, "cal23") %>% as.data.frame()
daty2023 %>% mutate(resource = "Michael")
daty2023 <- daty2023 %>% mutate(role = "Manager")
daty20231 <- daty2023 %>% mutate(resource = "Jose", role = "Manager")
daty20232 <- daty2023 %>% mutate(resource = "Jakub", role = "Researcher")
daty2023 <- rbind(daty2023, daty20231, daty20232)
colnames(daty2023) <- c("date", "resource", "role")
daty2023$date <-as.Date(daty2023$date)

The second dataframe is a tasklist with names of tasks, roles needed for these tasks, a period in which they should be performed and a number of mandays which should be assigned to each task in a given period.

tasklist <- data.frame(task = c("Task1", "Task2", "Task3", "Task4"), sdate = c("2023-01-12", "2023-04-18", "2023-08-25", "2023-11-03"), fdate = c("2023-01-12", "2023-04-25","2023-09-26","2023-12-29"), resource = c("Manager", "Manager","Researcher", "Manager") 
tasklist <- tasklist %>% mutate(mandays = c("1", "10","20","30"))
tasklist$sdate <- as.Date(tasklist$sdate)
tasklist$fdate <- as.Date(tasklist$fdate)

What I am trying to do now is to sample dates in which people are available from the first data frame and assign them to tasks in the second data frame based on conditions. To do that I use the sample_n package.

df = data.frame()
for (i in 1:nrow(tasklist)) {
output = daty2023 %>% filter(date >= tasklist$sdate[i] & date <= tasklist$fdate[i] & role == tasklist$resource[i]) %>% sample_n(., tasklist$mandays[i]) %>% mutate(Task = tasklist$task[i])
df = rbind(df, output)
}

This loop allows me to match a person's availability with project task time slot and a role. However I started to wonder, whether it would be possible, and if yes how it would be possible, to add additional condition which would assign a given amount of mandays to a person. The loop from above produces 61 mandays assigned to people. I am thinking how to make sure that i.e. 40 mandays at least are assigned to Michael, 11 to Jose and 10 to Jakub. I reckon I could probably play with the weight argument but I do not see how it could give me exactly the split I am looking for.

All the best!

I think what you wish to do is non trivial and would require more lower level programming.
You would need to only ever sample 1 row at a time; recording somewhere how many of a given resources allocations have been made , and if they have exceeded their personal limit to exclude them from further sampling.

Thank you, @nirgrahamuk for you reply.

I was thinking maybe about creating additional data frame with indicated number of days I want to assign to a person - like this one:

split <- data.frame(person = c("Jose","Michael"), role = c("Manager", "Manager"), days = c(12,5))

And including this information in the loop:

daty2023 %>% filter(date >= tasklist$sdate[4] & date <= tasklist$fdate[4] & role == tasklist$resource[4] & resource == split$person[1]) %>% sample_n(., split$days[1]) %>% mutate(Task = tasklist$task[4])

but I think it works only if I put the number of rows manually and when the data frame size is different it may not work.
Best,

After I wrote you my thoughts, I tinkered with a possible solution; it seems to work if the utilisation limits are set so as allocation is possible.

library(bizdays)
library(tidyverse)
from_date <- "2023-01-01"
to_date <- "2023-12-31"
create.calendar(name = "cal23", holidays = integer(0), weekdays = c("saturday", "sunday"))
daty2023 <- bizseq(from_date, to_date, "cal23") %>% as.data.frame()
daty2023 <- daty2023 %>% mutate(resource = "Michael")
daty2023 <- daty2023 %>% mutate(role = "Manager")
daty20231 <- daty2023 %>% mutate(resource = "Jose", role = "Manager")
daty20232 <- daty2023 %>% mutate(resource = "Jakub", role = "Researcher")
daty2023 <- rbind(daty2023, daty20231, daty20232)
colnames(daty2023) <- c("date", "resource", "role")
daty2023$date <-as.Date(daty2023$date)


tasklist <- data.frame(
  task = c("Task1", "Task2", "Task3", "Task4"),
  sdate = c("2023-01-12", "2023-04-18", "2023-08-25", "2023-11-03"),
  fdate = c("2023-01-12", "2023-04-25", "2023-09-26", "2023-12-29"),
  resource = c("Manager", "Manager", "Researcher", "Manager")
)

tasklist <- tasklist %>% mutate(mandays = as.integer(c("1", "10", "20", "30")))
tasklist$sdate <- as.Date(tasklist$sdate)
tasklist$fdate <- as.Date(tasklist$fdate)

limit_vec <- c(30,11,20)
names(limit_vec) <- c("Michael","Jose","Jakub")
utilised_vec <- rep(0,3)
names(utilised_vec) <- names(limit_vec)
my_sample <- function(indf , n, task_rec){
  indf <- indf |> mutate(task=task_rec)
  result <- filter(indf,FALSE)  
  for ( i in seq_len(n)){
    for(r in unique(indf$resource)){
      if(utilised_vec[r]>=limit_vec[r]){
        warning(paste0(r ,
                       " is fully utilised so wont be chooseable for further days in ",
                       task_rec))
        indf <- filter(indf,
                       resource != r)
      }
    }
    if(nrow(indf) < 1 ){
      warning(paste0("failed to pull a sample for ",
                     task_rec ,
                     " as no resource available"))
      return(result)
    }
    new_sel <- sample_n(indf,size=1)
    utilised_vec <<- utilised_vec + 
      as.integer(new_sel$resource == names(limit_vec))
      result <- bind_rows(result,new_sel)
  
  }
  result
}

df <- data.frame()
for (i in 1:nrow(tasklist)) {
  output <- daty2023_temp %>%
    filter(date >= tasklist$sdate[i] &
      date <= tasklist$fdate[i] &
      role == tasklist$resource[i]) %>%
    my_sample(., n=tasklist$mandays[i],task_rec =  tasklist$task[i])
  df <- rbind(df, output)
}

Thank you, @nirgrahamuk, looks like it is working. I just need to make sure I follow your logic, because I do not fully understand the specific steps, especially the my_sample() function.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.