I'm pretty new to R and looking for an efficient way to update values in cells based on certain conditions. I'm assuming this would require a for loop or other function.
This is the dataset.
project_ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
sector <- c("Energy", "None", "None", "Water", "None", "None", "Solar", "Solar", "None")
percentage_approval <- c(100, 50, 50, 100, 25, 25, 100, 30, 40)
type <- c("Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project")
funding <- c(5, 2.5, 2.5, 16, 4, 4, 10, 3, 4)
cofinancing <- c(100000, 750000, 80000, 4000000, 6660000, 11000, 12000, 1111111, 1111999)
df <- data.frame(project_ID, sector, percentage_approval, type, funding, cofinancing)
What I'd like to do:
Group the data by project ID.
Then, check if the sum of "percentage_approval" for a sub-project is equal to 100. If this is the case, then the "Program" row with the same project ID should be removed.
If the sum of "percentage_approval" for a sub-project is not equal to 100, then the following adjustments are needed:
Subtract the sub-project "funding" from the Program with the same project ID.
Adjust "cofinancing" based on the "sub-project" amount. So for example, cofinancing for the Program for Project ID #2 should be adjusted to 677000 (i.e., remaining 50% based on sub-project co-financing (i.e., 666000 + 11000)). Similarly for Project ID #3, cofinancing for the program should be updated to 1285714.29 (i.e., the remaining 30% based on cofinaning levels from the sub-projects).
Finally, I would like to update the sectors where there is "None" to take on sector values for each project ID.
Therefore, this is the final table I'd like to end up with:
project_ID_2 <- c(1, 1, 2, 2, 2, 3, 3, 3)
sector_2 <- c("Energy", "Energy", "Water", "Water", "Water", "Solar", "Solar", "None")
percentage_approval_2 <- c(50, 50, 100, 25, 25, 100, 30, 40)
type_2 <- c("Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project", "Program", "Sub-Project", "Sub-Project")
funding_2 <- c(2.5, 2.5, 8, 4, 4, 3, 3, 4)
cofinancing_2 <- c(750000, 80000, 677000, 666000, 11000, 1285714.29, 1000000, 2000000)
df.fixed <- data.frame(project_ID_2, sector_2, percentage_approval_2, type_2, funding_2, cofinancing_2)