I have a dataframe "sales". The first column is the name of the item, and the remaining column is data on sale with value 0 = No sale, 1 = Instore sale, 2 = Online sale and NA = If missing data.
sales <- data.frame(
Itemnumber = c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o"),
salesJan = c(0,1,1,1,1,2,0,0,0,0,0,1,2,1,2),
salesFeb = c(0,0,NA,0,1,2,1,1,0,0,1,1,1,2,1),
salesMar = c(0,2,1,1,2,1,2,0,0,0,0,0,2,1,2),
salesApr = c(0,1,1,1,1,1,NA,0,0,0,2,1,2,1,2),
salesMay = c(0,1,1,1,1,1,1,1,2,2,1,2,1,2,1),
salesJune = c(0,1,2,2,2,2,1,2,1,2,0,1,0,1,2),
salesJuly = c(0,2,1,2,1,1,0,0,1,2,0,1,2,0,1),
salesAug = c(0,0,0,0,0,0,1,2,1,0,0,1,2,1,NA),
salesSep = c(0,1,2,0,0,0,0,0,1,2,1,0,1,2,1),
salesOct = c(1,0,1,2,NA,1,2,1,0,1,2,0,1,2,0),
salesNov = c(0,1,2,1,1,1,0,0,1,2,0,NA,1,2,1),
salesDec = c(0,1,2,NA,1,2,0,0,0,0,1,2,1,0,0)
)
sales
#> Itemnumber salesJan salesFeb salesMar salesApr salesMay salesJune salesJuly
#> 1 a 0 0 0 0 0 0 0
#> 2 b 1 0 2 1 1 1 2
#> 3 c 1 NA 1 1 1 2 1
#> 4 d 1 0 1 1 1 2 2
#> 5 e 1 1 2 1 1 2 1
#> 6 f 2 2 1 1 1 2 1
#> 7 g 0 1 2 NA 1 1 0
#> 8 h 0 1 0 0 1 2 0
#> 9 i 0 0 0 0 2 1 1
#> 10 j 0 0 0 0 2 2 2
#> 11 k 0 1 0 2 1 0 0
#> 12 l 1 1 0 1 2 1 1
#> 13 m 2 1 2 2 1 0 2
#> 14 n 1 2 1 1 2 1 0
#> 15 o 2 1 2 2 1 2 1
#> salesAug salesSep salesOct salesNov salesDec
#> 1 0 0 1 0 0
#> 2 0 1 0 1 1
#> 3 0 2 1 2 2
#> 4 0 0 2 1 NA
#> 5 0 0 NA 1 1
#> 6 0 0 1 1 2
#> 7 1 0 2 0 0
#> 8 2 0 1 0 0
#> 9 1 1 0 1 0
#> 10 0 2 1 2 0
#> 11 0 1 2 0 1
#> 12 1 0 0 NA 2
#> 13 2 1 1 1 1
#> 14 1 2 2 2 0
#> 15 NA 1 0 1 0
Created on 2021-09-05 by the reprex package (v2.0.1)
Now I need to compute a list of 12 new columns showing "No of sale months in the last 4 months" for each month. So this also means that from Jan-April, the data will just be Jan:only Jan, Feb: Jan + Feb, March: Jan,Feb and March and from April to Dec, it will compute the last 4 months.
This is how I want my final data to look (NOTE: the numbers in "saleslast4XXX" columns are just made up and the column should go all the way to Dec i.e. saleslast4Dec.
salesnew <- data.frame(
Itemnumber = c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o"),
salesJan = c(0,1,1,1,1,2,0,0,0,0,0,1,2,1,2),
salesFeb = c(0,0,NA,0,1,2,1,1,0,0,1,1,1,2,1),
salesMar = c(0,2,1,1,2,1,2,0,0,0,0,0,2,1,2),
salesApr = c(0,1,1,1,1,1,NA,0,0,0,2,1,2,1,2),
salesMay = c(0,1,1,1,1,1,1,1,2,2,1,2,1,2,1),
salesJune = c(0,1,2,2,2,2,1,2,1,2,0,1,0,1,2),
salesJuly = c(0,2,1,2,1,1,0,0,1,2,0,1,2,0,1),
salesAug = c(0,0,0,0,0,0,1,2,1,0,0,1,2,1,NA),
salesSep = c(0,1,2,0,0,0,0,0,1,2,1,0,1,2,1),
salesOct = c(1,0,1,2,NA,1,2,1,0,1,2,0,1,2,0),
salesNov = c(0,1,2,1,1,1,0,0,1,2,0,NA,1,2,1),
salesDec = c(0,1,2,NA,1,2,0,0,0,0,1,2,1,0,0),
saleslast4Jan = c(0,1,1,1,1,2,0,0,0,0,0,1,2,1,2),
saleslast4Feb = c(0,1,1,1,1,2,0,0,0,0,0,1,2,1,2),
saleslast4March = c(0,1,1,1,1,1,0,0,0,0,0,1,2,1,2),
saleslast4April = c(0,1,1,1,2,2,0,0,0,0,0,1,0,1,2)
)
salesnew
#> Itemnumber salesJan salesFeb salesMar salesApr salesMay salesJune salesJuly
#> 1 a 0 0 0 0 0 0 0
#> 2 b 1 0 2 1 1 1 2
#> 3 c 1 NA 1 1 1 2 1
#> 4 d 1 0 1 1 1 2 2
#> 5 e 1 1 2 1 1 2 1
#> 6 f 2 2 1 1 1 2 1
#> 7 g 0 1 2 NA 1 1 0
#> 8 h 0 1 0 0 1 2 0
#> 9 i 0 0 0 0 2 1 1
#> 10 j 0 0 0 0 2 2 2
#> 11 k 0 1 0 2 1 0 0
#> 12 l 1 1 0 1 2 1 1
#> 13 m 2 1 2 2 1 0 2
#> 14 n 1 2 1 1 2 1 0
#> 15 o 2 1 2 2 1 2 1
#> salesAug salesSep salesOct salesNov salesDec saleslast4Jan saleslast4Feb
#> 1 0 0 1 0 0 0 0
#> 2 0 1 0 1 1 1 1
#> 3 0 2 1 2 2 1 1
#> 4 0 0 2 1 NA 1 1
#> 5 0 0 NA 1 1 1 1
#> 6 0 0 1 1 2 2 2
#> 7 1 0 2 0 0 0 0
#> 8 2 0 1 0 0 0 0
#> 9 1 1 0 1 0 0 0
#> 10 0 2 1 2 0 0 0
#> 11 0 1 2 0 1 0 0
#> 12 1 0 0 NA 2 1 1
#> 13 2 1 1 1 1 2 2
#> 14 1 2 2 2 0 1 1
#> 15 NA 1 0 1 0 2 2
#> saleslast4March saleslast4April
#> 1 0 0
#> 2 1 1
#> 3 1 1
#> 4 1 1
#> 5 1 2
#> 6 1 2
#> 7 0 0
#> 8 0 0
#> 9 0 0
#> 10 0 0
#> 11 0 0
#> 12 1 1
#> 13 2 0
#> 14 1 1
#> 15 2 2
The observation and column names in my real dataset is very long. So I am looking to create a function that uses a range (e.g in the last 4 columns) rather than something simple like : mutate(saleslast4Jan = salesJan) or mutate(saleslast4March = salesJan + salesFeb + salesMarch + salesApril).
I also need to be careful in :
- Reading 2 as 1 if I am doing a sum because I need to count both online and in-store sale as "sale" and dont need to differentiate. Can I do this without recoding 2 as 1?
- Data for Jan-March where the data wont be coming from the last 4 months but simply a sum of whatever there is.
- Ignoring NA.
Thank you for the help !