Alternative to using last() in summarise() function

SteveXD · May 1, 2018, 8:38pm

Hi there,

Say I have the following data frame:

df <- data.frame(name = rep(letters[seq(1,2)],each = 3),
                 item = rep(c('red','blue','green'), 2), 
                 value = as.integer(c(1,3,2,7,6,5)),
                 weight = c(1/3, 1/3, 1/3, 1/4, 1/2, 1/4))

I want to summarise the dataframe by 'name', where for each name I want to show (1) the value corresponding to the 'green' item and (2) the weighted average of the values. I can do this using this code:

library(dplyr)

df %>% 
  group_by(name) %>%
  summarise(green = last(value),
            weighted = sum(value * weight))

I was wondering whether there is an alternative to using last() within summarise(), as I don't always know what order my data will be in (i.e. I can't rely on 'green' always being the last item).

Would the best way be to manually re-order the 'item' column first? I was wondering whether there is a more elegant solution.

Grateful for any thoughts!

nwerth · May 1, 2018, 8:46pm

df %>% 
  group_by(name) %>%
  summarise(green = value[item == "green"],
            weighted = sum(value * weight))

I suggest looking over the official manual "An Introduction to R", specifically the section on index vectors.
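One caveat: value[item == "green"] assumes every group has exactly one 'green' row. If a group might have none, a possible variation (just a sketch, not the only way) is to index with base R's match(), which gives the position of the first match or NA when there is none. The df_missing data frame below is made up for illustration; it is the original df with the 'green' row for name b removed.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(name = rep(letters[1:2], each = 3),
                 item = rep(c("red", "blue", "green"), 2),
                 value = as.integer(c(1, 3, 2, 7, 6, 5)),
                 weight = c(1/3, 1/3, 1/3, 1/4, 1/2, 1/4))

# Hypothetical incomplete data: drop the 'green' row for group b
df_missing <- df[!(df$name == "b" & df$item == "green"), ]

result <- df_missing %>%
  group_by(name) %>%
  # match() returns the index of the first "green" in item, or NA if absent,
  # so each group always yields exactly one value (possibly NA)
  summarise(green = value[match("green", item)],
            weighted = sum(value * weight))
```

With this, group a keeps its 'green' value and group b gets NA instead of being dropped from the result. If a group has several 'green' rows, only the first one is used.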

SteveXD · May 1, 2018, 8:50pm

Thank you @nwerth! I overlooked the possibility of doing this within the summarise() function.