Changing cell value based on multiple conditions

chibulu87 · June 29, 2021, 11:44pm

I'm pretty new to R and looking for an efficient way to update values in cells based on certain conditions. I'm assuming this would require a for loop or other function.

This is the dataset.

project_ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
sector <- c("Energy", "None", "None", "Water", "None", "None", "Solar", "Solar", "None")
percentage_approval  <- c(100, 50, 50, 100, 25, 25, 100, 30, 40)
type <- c("Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project")
funding <- c(5, 2.5, 2.5, 16, 4, 4, 10, 3, 4)
cofinancing <- c(100000, 750000, 80000, 4000000, 6660000, 11000, 12000, 1111111, 1111999)


df <- data.frame(project_ID, sector, percentage_approval, type, funding, cofinancing)

What I'd like to do:

Group the data by project ID.

Then, check if the sum of "percentage_approval" for a sub-project is equal to 100. If this is the case, then the "Program" row with the same project ID should be removed.

If the sum of "percentage_approval" for a sub-project is not equal to 100, then the following adjustments are needed:

Subtract the sub-project "funding" from the Program with the same project ID.

Adjust "cofinancing" based on the "sub-project" amount. So for example, cofinancing for the Program for Project ID #2 should be adjusted to 677000 (i.e., remaining 50% based on sub-project co-financing (i.e., 666000 + 11000)). Similarly for Project ID #3, cofinancing for the program should be updated to 1285714.29 (i.e., the remaining 30% based on cofinaning levels from the sub-projects).

Finally, I would like to update the sectors where there is "None" to take on sector values for each project ID.

Therefore, this is the final table I'd like to end up with:

project_ID_2 <- c(1, 1, 2, 2, 2, 3, 3, 3)
sector_2 <- c("Energy", "Energy", "Water", "Water", "Water", "Solar", "Solar", "None")
percentage_approval_2  <- c(50, 50, 100, 25, 25, 100, 30, 40)
type_2 <- c("Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project")
funding_2 <- c(2.5, 2.5, 8, 4, 4, 3, 3, 4)
cofinancing_2 <- c(750000, 80000, 677000, 666000, 11000, 1285714.29, 1000000, 2000000)

df.fixed <- data.frame(project_ID_2, sector_2, percentage_approval_2, type_2, funding_2, cofinancing_2)

lucasgraybuck · July 3, 2021, 10:42pm

I think we can get this done using dplyr with a little help from some temporary columns and changing the order of operations a bit. There may be a more streamlined solution.

df %>%
# Group the data by project ID.
  group_by(project_ID) %>%
# Finally, I would like to update the sectors where there is "None" to take on sector values for each project ID.
  mutate(sector = sector[type == "Program"]) %>%  
#Then, check if the sum of "percentage_approval" for a sub-project is equal to 100. If this is the case, then the "Program" row with the same project ID should be removed.
  mutate(sub_approval = sum(percentage_approval[type != "Program"])) %>%
  filter(!(type == "Program" & sub_approval == 100)) %>%
  select(-sub_approval) %>%
#If the sum of "percentage_approval" for a sub-project is not equal to 100, then the following adjustments are needed:
#Subtract the sub-project "funding" from the Program with the same project ID.
  mutate(sub_funding = sum(funding[type != "Program"])) %>%
  mutate(funding = ifelse(type == "Program", funding - sub_funding, funding)) %>%
  select(-sub_funding) %>%
#Adjust "cofinancing" based on the "sub-project" amount. So for example, cofinancing for the Program for Project ID #2 should be adjusted to 677000 (i.e., remaining 50% based on sub-project co-financing (i.e., 666000 + 11000)). Similarly for Project ID #3, cofinancing for the program should be updated to 1285714.29 (i.e., the remaining 30% based on cofinaning levels from the sub-projects).
  mutate(sub_cofinancing = sum(cofinancing[type != "Program"])) %>%
  mutate(cofinancing = ifelse(type == "Program", sub_cofinancing, cofinancing)) %>%
  select(-sub_cofinancing)

I'm not sure I totally follow the last point, and I ended up with some different values than in your example:

  project_ID sector percentage_approval type        funding cofinancing
       <dbl> <fct>                <dbl> <fct>         <dbl>       <dbl>
1          1 Energy                  50 Sub-Project     2.5      750000
2          1 Energy                  50 Sub-Project     2.5       80000
3          2 Water                  100 Program         8       6671000
4          2 Water                   25 Sub-Project     4       6660000
5          2 Water                   25 Sub-Project     4         11000
6          3 Solar                  100 Program         3       2223110
7          3 Solar                   30 Sub-Project     3       1111111
8          3 Solar                   40 Sub-Project     4       1111999

lucasgraybuck · July 3, 2021, 11:18pm

Now that I've thought about this a bit more, another way to do this is to write your own function to accomplish each of the steps you want to apply to each project, then use a split -> apply -> bind approach to get everything done.

Here's an attempt at such a function:

update_project <- function(project_df) {
  # save this for later
  project_sector <- project_df$sector[project_df$type == "Program"]
  
  #Then, check if the sum of "percentage_approval" for a sub-project is equal to 100. If this is the case, then the "Program" row with the same project ID should be removed.
  sub_approval <- sum(project_df$percentage_approval[project_df$type != "Program"])
  if(sub_approval == 100) {
    project_df <- project_df[project_df$type != "Program",]
  } else {
    # Split Program from non-program
    # This just makes some of these steps a bit easier to read
    program_df <- project_df[project_df$type == "Program",]
    sub_df <- project_df[project_df$type != "Program",]
    
    #If the sum of "percentage_approval" for a sub-project is not equal to 100, then the following adjustments are needed:
    #Subtract the sub-project "funding" from the Program with the same project ID.
    sub_funding <- sum(sub_df$funding)
    program_df$funding <- program_df$funding - sub_funding
    
    #Adjust "cofinancing" based on the "sub-project" amount. So for example, cofinancing for the Program for Project ID #2 should be adjusted to 677000 (i.e., remaining 50% based on sub-project co-financing (i.e., 666000 + 11000)). Similarly for Project ID #3, cofinancing for the program should be updated to 1285714.29 (i.e., the remaining 30% based on cofinaning levels from the sub-projects).
    sub_cofinancing <- sum(sub_df$cofinancing)
    program_df$cofinancing <- sub_cofinancing
    
    #re-assemble the program and subprograms
    project_df <- rbind(program_df, sub_df)
  }
  
  # Finally, I would like to update the sectors where there is "None" to take on sector values for each project ID.
  project_df$sector <- project_sector
  
  # Return the result
  project_df
}

With this in hand, you can now split up your data.frame by project using split(), run the function per-project using lapply(), and then put everything back together with rbind().

# Split your data.frame by project
project_list <- split(df, df$project_ID)
# Apply your function to each new project data.frame
project_list <- lapply(project_list, update_project)
# re-assemble your results
df.fixed <- do.call(rbind, project_list)

chibulu87 · July 6, 2021, 5:38pm

Wow @lucasgraybuck! Thank you so much, this is super helpful.

I received some updated data which means that I don't need to change the co-financing numbers. But everything else works really well!

system · July 13, 2021, 5:39pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.