How do I identify the first iteration of a value by group?

I want to identify the first time a value of zero occurs within each group (spec, freq) in the following example code and change the corresponding row in the column "thresh" to "yes". In other words, for each unique combination of freq and spec there would be just one "yes". Ideally, I then want to modify this to flag either:

  • the first zero followed by another zero, or
  • the row above the first zero within the group

Example code:

df <- data.frame(response = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1,
                              0, 1, 0, 0),
                 spec = c("a", "a", "a", "a", "a", "a", "a","a","a","a",
                          "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
                 freq = c(100, 100, 100, 100, 100, 200, 200, 200, 200, 200, 
                          100, 100, 100, 100, 100, 200, 200, 200, 200, 200))

df$thresh <- "no" 

Any help would be very appreciated. Note response values are arbitrary binary values and can be changed if that makes things easier.

Below is one way to identify the first 0 in a group using a tidyverse approach. Additionally, when you modify this to look at rows before or after a specific row, the lag() and lead() functions are really handy.

df <- data.frame(response = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1,
                              0, 1, 0, 0),
                 spec = c("a", "a", "a", "a", "a", "a", "a","a","a","a",
                          "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
                 freq = c(100, 100, 100, 100, 100, 200, 200, 200, 200, 200, 
                          100, 100, 100, 100, 100, 200, 200, 200, 200, 200),
                 thresh = 'no')

library(dplyr)

df %>%
  group_by(freq, spec) %>%
  mutate(check = ifelse(response == 0, 1, 0),
         val = ifelse(check == 0, NA, row_number())) %>%
  mutate(thresh = ifelse(check == 1 & val == min(val, na.rm = T), 'yes', thresh)) %>%
  ungroup() %>%
  select(-check, -val)
#> # A tibble: 20 × 4
#>    response spec   freq thresh
#>       <dbl> <chr> <dbl> <chr> 
#>  1        1 a       100 no    
#>  2        1 a       100 no    
#>  3        1 a       100 no    
#>  4        0 a       100 yes   
#>  5        1 a       100 no    
#>  6        0 a       200 yes   
#>  7        1 a       200 no    
#>  8        0 a       200 no    
#>  9        0 a       200 no    
#> 10        1 a       200 no    
#> 11        0 b       100 yes   
#> 12        0 b       100 no    
#> 13        0 b       100 no    
#> 14        0 b       100 no    
#> 15        1 b       100 no    
#> 16        1 b       200 no    
#> 17        0 b       200 yes   
#> 18        1 b       200 no    
#> 19        0 b       200 no    
#> 20        0 b       200 no

Created on 2023-02-06 with reprex v2.0.2.9000
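
For the two variants in the original question, here is a minimal sketch built on lead(). It rests on a few assumptions: "the first zero followed by another zero" is read as the first row in a group where both that row and the next row are 0, a group whose first row is already 0 gets no "row above" flag, and the column names thresh_pair and thresh_before are made up for illustration.

```r
library(dplyr)

df <- data.frame(response = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1,
                              0, 1, 0, 0),
                 spec = rep(c("a", "b"), each = 10),
                 freq = rep(rep(c(100, 200), each = 5), times = 2))

res <- df %>%
  dplyr::group_by(spec, freq) %>%
  dplyr::mutate(
    # TRUE where this row is 0 and the next row in the same group is also 0
    zero_pair = response == 0 & dplyr::lead(response, default = 1) == 0,
    # TRUE only at the first 0 in the group
    first_zero = response == 0 & cumsum(response == 0) == 1,
    # Variant 1: first 0 that is followed by another 0 (hypothetical column name)
    thresh_pair = ifelse(zero_pair & cumsum(zero_pair) == 1, "yes", "no"),
    # Variant 2: the row directly above the first 0 (hypothetical column name)
    thresh_before = ifelse(dplyr::lead(first_zero, default = FALSE), "yes", "no")
  ) %>%
  dplyr::ungroup() %>%
  dplyr::select(-zero_pair, -first_zero)

which(res$thresh_pair == "yes")    # rows 8, 11, 19
which(res$thresh_before == "yes")  # rows 3, 16
```

With the example data, group a/100 has no pair of consecutive zeros, so it gets no "yes" in thresh_pair. Groups a/200 and b/100 start with a 0, so they get no "yes" in thresh_before.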

Thanks a ton but I am getting an error:

Error in `n()`:
! Must be used inside dplyr verbs.

Once I can run this, I will have more questions I am sure.

Can you copy and paste the code you executed? When I copy and paste the code I shared into a new session, it executed fine, so I'm curious if something changed when you tried.

df <- data.frame(response = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1,
                              0, 1, 0, 0),
                 spec = c("a", "a", "a", "a", "a", "a", "a","a","a","a",
                          "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
                 freq = c(100, 100, 100, 100, 100, 200, 200, 200, 200, 200, 
                          100, 100, 100, 100, 100, 200, 200, 200, 200, 200),
                 thresh = 'no')

library(dplyr)

df %>%
  group_by(freq, spec) %>%
  mutate(check = ifelse(response == 0, 1, 0),
         val = ifelse(check == 0, NA, row_number())) %>%
  mutate(thresh = ifelse(check == 1 & val == min(val, na.rm = T), 'yes', thresh)) %>%
  ungroup() %>%
  select(-check, -val)
rlang::last_error()
<error/rlang_error>
Error in `n()`:
! Must be used inside dplyr verbs.
---
Backtrace:
  1. ... %>% select(-check, -val)
  4. plyr::mutate(...)
  5. base::eval(cols[[col]], .data, parent.frame())
  6. base::eval(cols[[col]], .data, parent.frame())
  7. dplyr::row_number()
  8. dplyr::n()
  9. dplyr:::peek_mask("n")
 10. dplyr:::context_peek("mask", fun)
Run `rlang::last_trace()` to see the full context.

More info: this works on a different computer with RStudio version 1.3.1093 but does not work on my primary computer, which has version 1.4.1717. I don't know whether the version is the issue.

The version of the IDE has no effect on the behavior of the package, but different dplyr versions could be causing the problem.

Yes, @andresrcs is correct. I'm using dplyr 1.1.0.

I am using dplyr 2.2.1

Version 1.1.0 just came out last week. What do you get if you run the following?

packageVersion('dplyr')
#> [1] '1.1.0'

1.0.9 - apologies, misread an output.

All good. Does it work if you change mutate to dplyr::mutate?
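
The reason for asking: the backtrace above lists plyr::mutate(...), which suggests that plyr was attached after dplyr in that session and is masking dplyr's mutate(). plyr::mutate() evaluates its arguments outside dplyr's data mask, so row_number() (which calls n()) fails. A minimal sketch of the workaround, assuming that is indeed the cause, is to prefix the verbs with dplyr:: so the call no longer depends on attach order:

```r
library(dplyr)

df <- data.frame(response = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1,
                              0, 1, 0, 0),
                 spec = rep(c("a", "b"), each = 10),
                 freq = rep(rep(c(100, 200), each = 5), times = 2),
                 thresh = "no")

# Shows which package the bare name `mutate` currently resolves to
environmentName(environment(mutate))

res <- df %>%
  dplyr::group_by(freq, spec) %>%
  # Explicit dplyr:: prefix so a masking plyr::mutate() is never picked up
  dplyr::mutate(check = ifelse(response == 0, 1, 0),
                val = ifelse(check == 0, NA, dplyr::row_number())) %>%
  dplyr::mutate(thresh = ifelse(check == 1 & val == min(val, na.rm = TRUE),
                                "yes", thresh)) %>%
  dplyr::ungroup() %>%
  dplyr::select(-check, -val)

which(res$thresh == "yes")  # rows 4, 6, 11, 17
```

If both packages are needed, attaching plyr before dplyr is another way to avoid the masking.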
