How to alter every row after a certain condition has been met

lachyricho12 · October 1, 2021, 4:56am

I am trying to alter a variable in every row that occurs after a specific condition is met, as indicated by a second variable. I would prefer to use dplyr if possible, but so far I have not found a function or combination of functions that does this. A simplified version of my data can be generated as follows:

N <- c(1,1,1,1,1,1)
R <- c("N", "N", "Y", "N", "N", "N")
Dat <- data.frame(N, R)  # data.frame() keeps N numeric; cbind() would coerce it to character

What I need is that when the 'R' variable is 'Y', 1 is added to the 'N' variable in that row and in every row after it. The resulting data frame should look like this:

      N        R
1    1        N
2    1        N
3    2        Y
4    2        N
5    2        N
6    2        N

Any help or guidance would be much appreciated.

technocrat · October 1, 2021, 6:18am

N <- c(1,1,1,1,1,1)
R <- c("N", "N", "Y", "N", "N", "N")

start <- which(R == "Y")

for (i in seq_along(N)) if (i >= start) N[i] = N[i] + 1
N
#> [1] 1 1 2 2 2 2

Comment: dplyr is a useful tool, and it can be made more useful by first addressing the problem with tools in {base}, which often provide a more direct solution.

The snippet takes advantage of two aspects of the problem statement:

1. In the R vector we have a single unknown: the index position of the first occurrence of "Y".
2. In the N vector, we use the index from #1 to change the value of N at that index by adding 1, and continue to do so until the end of N.

which() takes a logical vector and returns the indices of its TRUE elements. Here R contains only one "Y", so start is a single index.
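The same idea can be sketched without a loop. This is only an illustration, not part of the answer above: add_from_first_y is a hypothetical helper name, and it uses match() instead of which() so that it takes just the first "Y" and returns the input unchanged when there is no "Y" at all.

```r
# Hypothetical helper: add 1 to n from the first "Y" in r onward.
add_from_first_y <- function(n, r) {
  start <- match("Y", r)          # index of the first "Y", or NA if there is none
  if (is.na(start)) return(n)     # no "Y": leave n unchanged
  idx <- seq(start, length(n))    # that row and every row after it
  n[idx] <- n[idx] + 1
  n
}

add_from_first_y(c(1, 1, 1, 1, 1, 1), c("N", "N", "Y", "N", "N", "N"))
#> [1] 1 1 2 2 2 2
add_from_first_y(c(1, 1, 1), c("N", "N", "N"))
#> [1] 1 1 1
```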

lachyricho12 · October 2, 2021, 2:09am

Thanks for the response. My apologies, I should have been more specific in how I posed the question. I thought if I kept it simple I would be able to adapt it to my code.

Essentially, the reason I was hoping for a dplyr solution is because I need the loop to also obey a grouping variable.

A better representation of the data is closer to this:

Groups <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")
N <- c(1,1,1,1,1,1,1,1,1,1)
R <- c("N", "N", "Y", "N", "N", "N", "Y", "N", "N", "N")
Dat <- data.frame(Groups, N, R)  # data.frame() keeps N numeric; cbind() would coerce it to character

With this as the required result:

   Groups N R
1       A 1 N
2       A 1 N
3       A 2 Y
4       A 2 N
5       B 1 N
6       B 1 N
7       B 2 Y
8       B 2 N
9       C 1 N
10      C 1 N

So the loop needs to restart with each new group.

Apologies I should have been more specific from the start.

technocrat · October 4, 2021, 5:38am

The hardest part of analysis is posing the question clearly. In R that usually involves focusing on the what rather than the how.

Every R problem can be thought of with advantage as the interaction of three objects: an existing object, x, a desired object, y, and a function, f, that will return a value of y given x as an argument. In other words, school algebra: f(x) = y. Any of the objects can be composites.

Here, x is the data frame, D, composed of three variables, two character and one numeric, where the first of the character variables, Groups, serves as a grouping variable.

The desired result, y is another data frame that differs from D only in respect of its N variable. Within each group, every element of N that has a value of 1 is set to 2 if the corresponding value of R is "Y" or if a preceding value of N has been so set; otherwise, the value of N remains unchanged.

To compose f, three functions are applied, as described in the comments.

## Data

Groups <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")
N <- c(1,1,1,1,1,1,1,1,1,1)
R <- c("N", "N", "Y", "N", "N", "N", "Y", "N", "N", "N")
D <- data.frame(Groups, N, R)

# inspect
D
#>    Groups N R
#> 1       A 1 N
#> 2       A 1 N
#> 3       A 1 Y
#> 4       A 1 N
#> 5       B 1 N
#> 6       B 1 N
#> 7       B 1 Y
#> 8       B 1 N
#> 9       C 1 N
#> 10      C 1 N

## Functions

# make a list of data frames by Groups variable
# where x is data frame and y is grouping variable, UN-quoted
get_group   <- function(x,y) split(x,y) 

# convert R variable from char to logical
make_lgl <- function(x) ifelse(x == "N", FALSE, TRUE)

# for numeric calculation, FALSE evaluates to 0 and
# TRUE evaluates to 1; therefore, we can convert the
# cumsum of the transformed R var to detect when
# the condition R == "Y" has been met and change the
# value in N from 1 to 2 for the current and all
# subsequent values; x is a data frame, y is its column
# name (quoted) with the condition to be tested, e.g.,
# flip_N(D,"R"); returns a vector of like length
# that will be used to replace an existing vector
flip_N <- function(x,y) ifelse(cumsum(make_lgl(x[[y]])) >= 1,2,1)

# Main

# create a list of data frames composed of D split by group
the_groups <- get_group(D,Groups)

# iterate over the_groups, replacing N in each group's data frame

for (i in seq_along(the_groups)) {
  the_groups[[i]][["N"]] <- flip_N(the_groups[[i]], "R")
}

# unsplit the_groups into a single data frame

unsplit(the_groups,Groups)
#>    Groups N R
#> 1       A 1 N
#> 2       A 1 N
#> 3       A 2 Y
#> 4       A 2 N
#> 5       B 1 N
#> 6       B 1 N
#> 7       B 2 Y
#> 8       B 2 N
#> 9       C 1 N
#> 10      C 1 N
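As a rough sketch, the split / modify / unsplit steps above can also be written in one step with base R's ave(), which applies a function within each level of a grouping vector. This is an alternative to the functions above, not a restatement of them: it adds a 0/1 flag to N instead of setting N to 2, and the name flag is introduced here only for illustration. cummax() is used so that the flag stays at 1 from the first "Y" onward.

```r
Groups <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")
N <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
R <- c("N", "N", "Y", "N", "N", "N", "Y", "N", "N", "N")
D <- data.frame(Groups, N, R)

# within each group: 0 before the first "Y", 1 from the first "Y" onward
flag <- ave(as.integer(D$R == "Y"), D$Groups, FUN = cummax)

D$N <- D$N + flag
D$N
#> [1] 1 1 2 2 1 1 2 2 1 1
```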

The way that comes to mind for a dplyr solution is to use dplyr::group_by and tidyr::nest(), which will produce

D %>% dplyr::group_by(Groups) %>% tidyr::nest()
#> # A tibble: 3 × 2
#> # Groups:   Groups [3]
#>   Groups data            
#>   <chr>  <list>          
#> 1 A      <tibble [4 × 2]>
#> 2 B      <tibble [4 × 2]>
#> 3 C      <tibble [2 × 2]>

I'll take another look at a dplyr solution if no one else posts one.

nirgrahamuk · October 4, 2021, 9:11am

Technocrat invited me to look at this. I think he already identified that cumsum() is the critical feature.
This is quite concise I think.

library(dplyr)
# within each group, add the running count of "Y" rows to N
D %>% group_by(Groups) %>%
  mutate(N = N + cumsum(R == "Y"))

Apologies if I omitted anything of importance; I can try again if you draw my attention to it.

(Side note: this approach assumes that R contains at most one "Y" within each group. With more than one, N would be incremented once for every "Y".)
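If a group could contain more than one "Y" and N should still go up only once, one possible variation is to swap cumsum() for dplyr's cumany(), which is TRUE from the first TRUE onward. The sketch below uses made-up data (D2, with two "Y" rows in group A) and hypothetical column names N_cumsum and N_once purely to compare the two behaviours.

```r
library(dplyr)

# Hypothetical data: group A has two "Y" rows, group B has none
D2 <- data.frame(
  Groups = c("A", "A", "A", "A", "B", "B"),
  N      = c(1, 1, 1, 1, 1, 1),
  R      = c("N", "Y", "Y", "N", "N", "N")
)

res <- D2 %>%
  group_by(Groups) %>%
  mutate(
    N_cumsum = N + cumsum(R == "Y"),  # goes up once for every "Y"
    N_once   = N + cumany(R == "Y")   # goes up once, from the first "Y" onward
  ) %>%
  ungroup()

res$N_cumsum
#> [1] 1 2 3 3 1 1
res$N_once
#> [1] 1 2 2 2 1 1
```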
