maximum value in previous year per name

Hi, I have a following problem. I need to calculate maximum value per each name from a previous year.

Let:

dfA <- data.frame(year = c(2003,2003,2004,2004,2002,2004,2004,2005, 2010), 
                 name = c("A","A","A","A","B","B","B","B", "B" ) , 
                 value = c(5,7,1,2,10,6,8,15, NA))

Desired output is

dfB <- data.frame(year = c(2003,2003,2004,2004,2002,2004,2004,2005, 2010), 
                  name = c("A","A","A","A","B","B","B","B", "B" ) , 
                  value = c(5,7,1,2,10,6,8,15, NA),
                  max_lag = c(NA,NA,7,7,NA,NA,NA,8, NA))

I tried this, but it calculated maximum from current years:

dfA$year_lag <- dfA$year -1

dfB <- dfA %>% arrange(name, year_lag) %>%
  group_by(name, year_lag) %>%
  dplyr::mutate(max_lag= max(value, na.rm = TRUE) )

Thanks for help!

I don't understand the logic behind your computation. You want the max of what, exactly? Why are there runs of NAs at the start of both series (A and B) longer than 1?

I made a very literal interpretation of what you asked for in your english text.

library(tidyverse)
library(sqldf)
 dfA<- tibble(year = c(2003,2003,2004,2004,2002,2004,2004,2005, 2010), 
                  name = c("A","A","A","A","B","B","B","B", "B" ) , 
                  value = c(5,7,1,2,10,6,8,15, NA))

#maximum value per each name from a previous year.
maxes <- dfA %>% 
  group_by(name,year) %>% 
  summarise(maxval=max(value,na.rm=TRUE)) %>% 
  ungroup() %>% 
  rename(yyear=year)

 

sqldf("select a.*,max(b.maxval)
             from dfA as a
             left join maxes as b
             on a.year>b.yyear and a.name=b.name
             group by a.year,a.name,a.value")

#   year    name value max(b.maxval)
# 1 2002    B    10         NA
# 2 2003    A     5         NA
# 3 2003    A     7         NA
# 4 2004    A     1          7
# 5 2004    A     2          7
# 6 2004    B     6         10
# 7 2004    B     8         10
# 8 2005    B    15         10
# 9 2010    B    NA         15

Thank you both, @ulfelder and @nirgrahamuk for replies.

I finally figured out myself, how to do it:

dfA<- tibble(year = c(2003,2003,2004,2004,2002,2004,2004,2005, 2010), 
             name = c("A","A","A","A","B","B","B","B", "B" ) , 
             value = c(5,7,1,2,10,6,8,15, NA))

#maximum value per each name from a previous year.
maxes <- dfA %>% 
  group_by(name,year) %>% 
  summarise(maxval=max(value,na.rm=TRUE)) %>% 
  ungroup() %>% 
  rename(yyear=year)


maxes$year <- maxes$yyear +1

dfB <- left_join(dfA, maxes, by = c("name" , "year"))

ah, I see the difference. I think I would have described this as a need to calculate maximum value per each name from the previous year. (rather than a previous year). glad you have found a solution :slight_smile:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.