loops: If function: how to select a row when the previous row is the same

My DB is like this:

DB <- data.frame(position = c(1,2,3,4,5,6,7,8,9,10),
location = c(BO, BO, BO, BO, BO, AL, AL, AL, AL, AL),
year = c(2016, 2016, 2016, 2016, 2016, 2015, 2015, 2015, 2015, 2015),
adult = c( 1, 3, 5, 6, 8, 5, 10, 15, 20, 25))

I have to do the difference between values in adult column just if they share the same location and year (the value - previous value)

I am trying with this run but it doesn't work

for(x in  DB$position){
  if (X == 1)
  {A = DB$adult[1]
  } else {
    if(DB$location[x] == DB$location[x-1] & DB$year[x] == DB$year[x-1]){
      A1=c(A, diff(DB$adult))
    } 
    A1=DB$adult[1]
  }
}

What do I wrong?

Thanks a lot

Rather than use a for loop, I suggest you use the functions from the dplyr package that are built for this sort of thing.

DB <- data.frame(position = c(1,2,3,4,5,6,7,8,9,10),
                 location = c('BO', 'BO', 'BO', 'BO', 'BO', 'AL', 'AL', 'AL', 'AL', 'AL'),
                 year = c(2016, 2016, 2016, 2016, 2016, 2015, 2015, 2015, 2015, 2015),
                 adult = c( 1, 3, 5, 6, 8, 5, 10, 15, 20, 25))
library(dplyr)
DBnew <- DB |> group_by(location,year) |> 
  mutate(Lag = lag(adult),
         AdultDiff = adult - Lag)
DBnew
#> # A tibble: 10 × 6
#> # Groups:   location, year [2]
#>    position location  year adult   Lag AdultDiff
#>       <dbl> <chr>    <dbl> <dbl> <dbl>     <dbl>
#>  1        1 BO        2016     1    NA        NA
#>  2        2 BO        2016     3     1         2
#>  3        3 BO        2016     5     3         2
#>  4        4 BO        2016     6     5         1
#>  5        5 BO        2016     8     6         2
#>  6        6 AL        2015     5    NA        NA
#>  7        7 AL        2015    10     5         5
#>  8        8 AL        2015    15    10         5
#>  9        9 AL        2015    20    15         5
#> 10       10 AL        2015    25    20         5

Created on 2023-07-18 with reprex v2.0.2

The problem is that, for the data in the 1st position, I don't want NA but I want the value in adult in position 1.

DBnew <- DB |> group_by(location,year) |> 
  mutate(Lag = lag(adult),
         AdultDiff = ifelse(is.na(adult - Lag),adult,adult - Lag))

?

It still gives me NA unfortunatly,
do you think there is a solution?

The method of calculating (adult - Lag) implicitly assumes that the first value of Lag is 0 for each group. Do you want to make that assumption explicit and set the value to 0 in the Lag column?

library(tidyverse)

DB <- data.frame(position = c(1,2,3,4,5,6,7,8,9,10),
                 location = c('BO', 'BO', 'BO', 'BO', 'BO', 'AL', 'AL', 'AL', 'AL', 'AL'),
                 year = c(2016, 2016, 2016, 2016, 2016, 2015, 2015, 2015, 2015, 2015),
                 adult = c( 1, 3, 5, 6, 8, 5, 10, 15, 20, 25))

# implicitly assume the first lagged value is 0, so adult - Lag = adult - 0 = adult

DB |> group_by(location,year) |> 
  mutate(Lag = lag(adult),
         AdultDiff = if_else(is.na(adult - Lag), adult, adult - Lag))
#> # A tibble: 10 × 6
#> # Groups:   location, year [2]
#>    position location  year adult   Lag AdultDiff
#>       <dbl> <chr>    <dbl> <dbl> <dbl>     <dbl>
#>  1        1 BO        2016     1    NA         1
#>  2        2 BO        2016     3     1         2
#>  3        3 BO        2016     5     3         2
#>  4        4 BO        2016     6     5         1
#>  5        5 BO        2016     8     6         2
#>  6        6 AL        2015     5    NA         5
#>  7        7 AL        2015    10     5         5
#>  8        8 AL        2015    15    10         5
#>  9        9 AL        2015    20    15         5
#> 10       10 AL        2015    25    20         5

# explicitly assume that the first lagged value of adult is 0

DB |> group_by(location,year) |> 
  mutate(Lag = if_else(is.na(lag(adult)), 0, lag(adult)),
         AdultDiff = adult - Lag
         )
#> # A tibble: 10 × 6
#> # Groups:   location, year [2]
#>    position location  year adult   Lag AdultDiff
#>       <dbl> <chr>    <dbl> <dbl> <dbl>     <dbl>
#>  1        1 BO        2016     1     0         1
#>  2        2 BO        2016     3     1         2
#>  3        3 BO        2016     5     3         2
#>  4        4 BO        2016     6     5         1
#>  5        5 BO        2016     8     6         2
#>  6        6 AL        2015     5     0         5
#>  7        7 AL        2015    10     5         5
#>  8        8 AL        2015    15    10         5
#>  9        9 AL        2015    20    15         5
#> 10       10 AL        2015    25    20         5

Created on 2023-07-19 with reprex v2.0.2

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.