Help with adding rows for each month based on content of column

ksingh19 · November 7, 2023, 8:57pm

Hello All!

Need help in adding rows please based on column content. In this example, for some months, we have all codes per type, but for others we don't have any Code. I would like to add rows for each month for each type of Code we see in the entire column. For Type A, we have 4 codes for 202308 and data only has 2 codes in 202309 and other ACCT_DT. I am looking to add all 4 codes in all months for Type A with NAs in absence of data for code. For Type B, we have 3 Codes in total and looking to add these 3 Codes for all months.

df <- data.frame(
  stringsAsFactors = FALSE,
  ACCT_DT = c(202308L,202308L,202308L,
              202308L,202309L,202309L,202310L,202310L,202310L,202308L,
              202308L,202308L,202309L,202310L,202310L),
  Country = c("CA","CA","CA","CA","CA",
              "CA","CA","CA","CA","CA","CA","CA","CA","CA","CA"),
  Type = c("A","A","A","A","A","A",
           "A","A","A","B","B","B","B","B","B"),
  Code = c("Code_1","Code_2","Code_3",
           "Code_4","Code_3","Code_4","Code_1","Code_2","Code_3",
           "Code_1","Code_2","Code_3","Code_2",
           "Code_1","Code_2"),
  Sales = c(100L,90L,80L,120L,80L,120L,
            110L,NA,NA,98L,96L,87L,78L,120L,NA)
)

# All Codes per Type 
codes_by_type <- df %>%
  distinct(Country, Type, Code)

AmeliaMN · November 7, 2023, 9:34pm

I'm not sure I fully understand the question, but I'm guessing you want the tidyr complete function. That would allow you to get all the combinations of factors.

ksingh19 · November 8, 2023, 1:11pm

Thank you!
complete will give all possible combinations for each Type. Instead, here I am looking to add missing Code for each Type if that Code was used for any given month in dataset for any group.

For example, for 202308, we have 4 different codes, but for 202309 & Type A, we only have code 3 & 4. So, in this case, I would like to add rows for 202308, Type A, with codes 1 & 2 with NAs. Similarly, for 202310 & Type A, we have 3 codes and the missing one is Code_4. So, for this one, I would like to add row with code_4. Same thing, for Type B where rows to be added if any code is missing but was otherwise existing in any of the previous months.

AmeliaMN · November 8, 2023, 2:22pm

Hmm, I would think that

df %>%
complete(ACCT_DT, Type, Code)

would get you there. What rows don't exist in that dataframe that you want, and/or which ones are there that you don't want?

ksingh19 · November 8, 2023, 3:44pm

I just edited my sample data for ease of understanding and here Type B doesn't show Code_4. Currently complete considers all combinations. But in this example, Type A has 4 codes and Type B has 3 Codes in total for any given month. I would like to maintain 4 codes for Type A and 3 codes for Type B for all months. When we don't have values, it would be NAs.

In df1, we see Code_4 in rows 8, 16 & 24 for Type B. But these should not be there.

library(tidyverse)
df <- data.frame(
  stringsAsFactors = FALSE,
  ACCT_DT = c(202308L,202308L,202308L,
              202308L,202309L,202309L,202310L,202310L,202310L,202308L,
              202308L,202308L,202309L,202310L,202310L),
  Country = c("CA","CA","CA","CA","CA",
              "CA","CA","CA","CA","CA","CA","CA","CA","CA","CA"),
  Type = c("A","A","A","A","A","A",
           "A","A","A","B","B","B","B","B","B"),
  Code = c("Code_1","Code_2","Code_3",
           "Code_4","Code_3","Code_4","Code_1","Code_2","Code_3",
           "Code_1","Code_2","Code_3","Code_2",
           "Code_1","Code_2"),
  Sales = c(100L,90L,80L,120L,80L,120L,
            110L,NA,NA,98L,96L,87L,78L,120L,NA)
)

# All Codes per Type 
codes_by_type <- df %>%
  distinct(Country, Type, Code)

# Complete adds Code_4 for Type B for row 8. However, Type should consist of Code_1, 2 & 3 only as original column doesn't have Code_4 anywhere.
df1 <- df %>%
  complete(ACCT_DT, Country, Type, Code)

AmeliaMN · November 8, 2023, 4:57pm

Ah, I think I get it! What about

df %>%
    group_by(Type) %>%
    complete(ACCT_DT, Code)

ksingh19 · November 8, 2023, 9:25pm

Thank you so much @AmeliaMN !
This seems to work great in this example. When I try it with my original big data, it seems like it completes using only one group. If I had multiple groups here, example, different countries, different categories, and then Type, how would that work? I was trying the following and it only captures new rows for one group of categories and not another.

df %>%
group_by(Country, Category, Type) %>%
complete(ACCT_DT, Code)

AmeliaMN · November 9, 2023, 7:15pm

I think you'd have to post another reprex with some example data that has categories.

The other way to use complete() is with nesting(). So another approach that would come up with the same dataset (I think) as my previous example would be

df %>%
    complete(Code, nesting(Type, ACCT_DT))

It's possible something like this could be extended to additional nesting() variables.

ksingh19 · November 9, 2023, 10:03pm

Thank you! Adding new reprex to make sure above solution can handle all groups. In my testing, I couldn't achieve the right solution. But it would be helpful, if some modifications can still work.

library(tidyverse)
df <- data.frame(
  stringsAsFactors = FALSE,
                 ACCT_DT = c(202306L,202306L,
                             202306L,202306L,202306L,202306L,202306L,202306L,
                             202306L,202306L,202306L,202306L,202306L,
                             202306L,202306L,202307L,202307L,202307L,202307L,
                             202307L,202307L,202307L,202307L,202307L,202307L,
                             202307L,202307L,202308L,202308L,202308L,
                             202308L,202308L,202308L,202308L,202308L,202308L,
                             202308L,202308L,202308L,202308L,202309L,202309L,
                             202309L,202309L,202309L,202309L,202309L,
                             202309L,202309L,202309L,202309L,202309L,202309L,
                             202310L,202310L,202310L,202310L,202310L,202310L,
                             202310L,202310L,202310L,202310L,202310L),
                 Country = c("USA","USA","USA",
                             "USA","USA","CA","CA","CA","CA","CA","CA",
                             "CA","CA","CA","CA","USA","USA","USA","CA",
                             "CA","CA","CA","CA","CA","CA","CA","CA",
                             "USA","USA","USA","CA","CA","CA","CA","CA",
                             "CA","CA","CA","CA","CA","USA","USA","USA",
                             "CA","CA","CA","CA","CA","CA","CA","CA","CA",
                             "CA","USA","USA","CA","CA","CA","CA","CA",
                             "CA","CA","CA","CA"),
                Category = c("Soft Goods",
                             "Soft Goods","Soft Goods","Hard Goods","Hard Goods",
                             "Hard Goods","Hard Goods","Hard Goods",
                             "Hard Goods","Hard Goods","Hard Goods","Hard Goods",
                             "Hard Goods","Hard Goods","Hard Goods","Soft Goods",
                             "Soft Goods","Soft Goods","Hard Goods",
                             "Hard Goods","Hard Goods","Hard Goods","Hard Goods",
                             "Hard Goods","Hard Goods","Hard Goods","Hard Goods",
                             "Soft Goods","Soft Goods","Soft Goods",
                             "Hard Goods","Hard Goods","Hard Goods","Hard Goods",
                             "Hard Goods","Hard Goods","Hard Goods",
                             "Hard Goods","Hard Goods","Hard Goods","Soft Goods",
                             "Soft Goods","Soft Goods","Hard Goods","Hard Goods",
                             "Hard Goods","Hard Goods","Hard Goods",
                             "Hard Goods","Hard Goods","Hard Goods","Hard Goods",
                             "Hard Goods","Soft Goods","Soft Goods","Hard Goods",
                             "Hard Goods","Hard Goods","Hard Goods",
                             "Hard Goods","Hard Goods","Hard Goods","Hard Goods",
                             "Hard Goods"),
                    Type = c("A","C","A","B",
                             "A","A","A","A","A","C","C","A","A","A1",
                             "B","A","C","A","A","A","A","A","C","A",
                             "A","A1","B","A","C","A","A","A","A","A",
                             "C","C","A","A","A1","B","C","A","A","A",
                             "A","A","A","C","C","A","A","A1","B","C",
                             "A","A","A","A","A","C","A","A","A1","B"),
                    Code = c("Code_1","Exists",
                             "Exists","Code_1","Code_1","Code_1","Code_2",
                             "Code_3","Code_4","Exists","Code","Exists",
                             "Non Exists","Exists","Exists","Code_3","Exists",
                             "Exists","Code_1","Code_2","Code_3","Code_4",
                             "Exists","Exists","Non Exists","Exists","Exists",
                             "Code_3","Exists","Exists","Code_1","Code_2",
                             "Code_3","Code_4","Exists","Code","Exists",
                             "Non Exists","Exists","Exists","Exists","Exists",
                             "Non Exists","Code_1","Code_2","Code_3",
                             "Code_4","Exists","Code","Exists","Non Exists",
                             "Exists","Exists","Exists","Exists","Code_1",
                             "Code_2","Code_3","Code_4","Exists","Exists",
                             "Non Exists","Exists","Exists"),
                   Sales = c(6,5,14,15,10,
                             23,25,28,22,63,2,331.83,7,18,6,6,5,13,
                             11,24,21,25,64,319.83,22,19,6,6,5,13,10,
                             11,22,20,63,4,321.83,24,17,6,5,12,6,3,
                             9,12,20,67,2,329.19,20,16,7,7,19,34,3,
                             10,11,68,329.19,22,14,7)
      )

# All Codes per Type 
codes_by_type <- df %>%
  distinct(Country, Category, Type, Code)

AmeliaMN · November 10, 2023, 10:32pm

Without more contextual knowledge, I would guess

df %>%
    complete(Code, nesting(Type, ACCT_DT, Country))

but again, I don't know what things you want completed and which you don't.

ksingh19 · November 13, 2023, 10:14pm

Thanks for looking into it!

I am looking to complete the entire table by month based on all combinations we have in codes_by_type.
i.e. if a combination has value for any given months, then it should have values for all months even if it has to be NAs.

AmeliaMN · November 18, 2023, 9:42pm

What's the variable in your reprex that is the analogue to month?

ksingh19 · November 22, 2023, 1:15pm

ACCT_DT is the date with year and months in it. Thanks!