Create new column based on values from previous columns

Hi All, I am fairly new to R and trying to produce a column 'Adjusted_Avg' where the column entries are based on conditions of values in a previous columns called 'Net' and 'Cumulative'.

Conditions (when grouped by ID):

(i): If Net != 0 & Cumulative = Net, Adjusted_Avg = Net
(ii): If Net != 0 and lag(Net) != 0, Adjusted_Avg = Net
(iii): If Net = 0, Adjusted_Avg = the next non-zero Net value divided by number of rows to reach that non-zero value.

This can be more easily understood by comparing df_net to df_spread_ideal

For context, this is the starting table 'df_net':

R Code for df_net table:

{r}
df_net <- structure(list(ID = c(14, 14, 14, 14, 23, 23, 36, 36, 36, 36, 
                      112, 112, 112, 112, 112), month_year = c("04-2023", "05-2023", 
                                                               "06-2023", "07-2023", "04-2023", "05-2023", "04-2023", "05-2023", 
                                                               "06-2023", "07-2023", "03-2023", "04-2023", "05-2023", "06-2023", 
                                                               "07-2023"), Cumulative = c(200, 200, 200, 1000, 400, 900, 300, 
                                                                                          900, 900, 1600, 200, 200, 1000, 1000, 2000), Net = c(200, 0, 
                                                                                                                                               0, 800, 400, 500, 300, 600, 0, 700, 200, 0, 800, 0, 1000)), class = c("grouped_df", 
                                                                                                                                                                                                                     "tbl_df", "tbl", "data.frame"), row.names = c(NA, -15L), groups = structure(list(
                                                                                                                                                                                                                       ID = c(14, 23, 36, 112), .rows = structure(list(1:4, 5:6, 
                                                                                                                                                                                                                                                                       7:10, 11:15), ptype = integer(0), class = c("vctrs_list_of", 
                                                                                                                                                                                                                                                                                                                   "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
                                                                                                                                                                                                                                                                                                                   ), row.names = c(NA, -4L), .drop = TRUE))

This is the table I want

And here is my code so far for trying to transform df_net into df_spread_ideal:

df_spread <- df_net %>%
  group_by(ID) %>%
  mutate(
    Adjusted_Avg = case_when(
      Net != 0 & Net == Cumulative ~ Net, 
      Net != 0 & Net != Cumulative & lag(Net) != 0 ~ Cumulative - lag(Cumulative), 
      TRUE ~ {
        interval_count <- sum(Net == 0) + 1 
        Next_Net <- tail(Net[Net != 0], 1) 
        Next_Net / interval_count
      }
    )
  )  

I know my code is slightly flawed as it is taking the last non-zero 'Net' value for each ID rather than the next non-zero 'Net' value to work out the numerator and similarly it's working out how many 0 Net values there are for the entire ID rather than up until the point of the next non-zero net value.

Any help would be appreciated, thanks!

This seems to get you what you want with the example data set and I think it will work generally but I'm not fully awake yet.

df_net <- structure(list(ID = c(14, 14, 14, 14, 23, 23, 36, 36, 36, 36, 
                                112, 112, 112, 112, 112), 
                         month_year = c("04-2023", "05-2023", "06-2023", "07-2023", "04-2023", "05-2023", "04-2023", "05-2023", 
                                                              "06-2023", "07-2023", "03-2023", "04-2023", "05-2023", "06-2023", 
                                                              "07-2023"), 
                         Cumulative = c(200, 200, 200, 1000, 400, 900, 300, 900, 900, 1600, 200, 200, 1000, 1000, 2000), 
                         Net = c(200, 0, 0, 800, 400, 500, 300, 600, 0, 700, 200, 0, 800, 0, 1000)), 
                    class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -15L), 
                    groups = structure(list(ID = c(14, 23, 36, 112), 
                                            .rows = structure(list(1:4, 5:6, 7:10, 11:15), 
                                                              ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", "list"))), 
                                       class = c("tbl_df", "tbl", "data.frame"),row.names = c(NA, -4L), .drop = TRUE))
library(tidyverse)
df_net <- ungroup(df_net)
j <- 1
df_net$Grp <- 1
for (i in 2:nrow(df_net)) {
  if(df_net[i-1, "Net"] != 0) j <- j + 1
  df_net[i, "Grp"] <- j
}
df_net <- df_net |> group_by(Grp) |> mutate(Adjusted_Avg = mean(Net))
df_net
#> # A tibble: 15 × 6
#> # Groups:   Grp [10]
#>       ID month_year Cumulative   Net   Grp Adjusted_Avg
#>    <dbl> <chr>           <dbl> <dbl> <dbl>        <dbl>
#>  1    14 04-2023           200   200     1         200 
#>  2    14 05-2023           200     0     2         267.
#>  3    14 06-2023           200     0     2         267.
#>  4    14 07-2023          1000   800     2         267.
#>  5    23 04-2023           400   400     3         400 
#>  6    23 05-2023           900   500     4         500 
#>  7    36 04-2023           300   300     5         300 
#>  8    36 05-2023           900   600     6         600 
#>  9    36 06-2023           900     0     7         350 
#> 10    36 07-2023          1600   700     7         350 
#> 11   112 03-2023           200   200     8         200 
#> 12   112 04-2023           200     0     9         400 
#> 13   112 05-2023          1000   800     9         400 
#> 14   112 06-2023          1000     0    10         500 
#> 15   112 07-2023          2000  1000    10         500

Created on 2024-01-30 with reprex v2.0.2

This does work generally. Thanks so much!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.