The hardest part of analysis is posing the question clearly. In R, that usually involves focusing on the what rather than the how.
Every R problem can usefully be thought of as the interaction of three objects: an existing object, x; a desired object, y; and a function, f, that will return a value of y given x as an argument. In other words, school algebra: f(x) = y. Any of the objects can be composites.
Here, x is the data frame, D, composed of three variables, two character and one numeric, where the first of the character variables, Groups, serves as a grouping variable.
The desired result, y, is another data frame that differs from D only in respect of its N variable. Within each group, every element of N that has a value of 1 is set to 2 if the corresponding value of R is "Y" or if a preceding value of N has been so set; otherwise, the value of N remains unchanged.
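As a cross-check of the rule just stated, here is a compact, vectorized base R sketch (an alternative, not part of the three-function approach below). It assumes, as in D, that the flag column is R and the grouping column is Groups: ave() computes a running count of "Y" values within each group, and only elements of N equal to 1 are flipped once that count is positive, so any other value of N is left as it is.

```r
# Rebuild the example data (same values as in the Data section)
Groups <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")
N <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
R <- c("N", "N", "Y", "N", "N", "N", "Y", "N", "N", "N")
D <- data.frame(Groups, N, R)

# Running count of "Y" within each group; as.numeric() keeps ave()
# from having to coerce a logical vector to hold integer sums
seen_Y <- ave(as.numeric(D$R == "Y"), D$Groups, FUN = cumsum) > 0

# Flip only elements of N that equal 1; all other values are kept
y <- D
y$N <- ifelse(seen_Y & D$N == 1, 2, D$N)
y
```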
To compose f, three functions are applied, as described in the comments.
## Data
Groups <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")
N <- c(1,1,1,1,1,1,1,1,1,1)
R <- c("N", "N", "Y", "N", "N", "N", "Y", "N", "N", "N")
D <- data.frame(Groups, N, R)
# inspect
D
#> Groups N R
#> 1 A 1 N
#> 2 A 1 N
#> 3 A 1 Y
#> 4 A 1 N
#> 5 B 1 N
#> 6 B 1 N
#> 7 B 1 Y
#> 8 B 1 N
#> 9 C 1 N
#> 10 C 1 N
## Functions
# make a list of data frames, one per level of the grouping vector;
# x is a data frame and y is a grouping vector with one element
# per row of x (e.g., D$Groups), not a quoted column name
get_group <- function(x, y) split(x, y)
# convert the R variable from character to logical
# ("N" becomes FALSE, anything else becomes TRUE)
make_lgl <- function(x) ifelse(x == "N", FALSE, TRUE)
# for numeric calculation, FALSE evaluates to 0 and
# TRUE evaluates to 1; therefore, the cumsum of the
# transformed R variable is >= 1 from the first "Y" onward,
# which detects when the condition R == "Y" has been met;
# the value in N is then changed from 1 to 2 for the current
# and all subsequent rows (this assumes every N starts at 1,
# as in D); x is a data frame, y is the (quoted) name of its
# column holding the condition, e.g., flip_N(D, "R");
# returns a vector of the same length that will be used
# to replace an existing vector
flip_N <- function(x, y) ifelse(cumsum(make_lgl(x[[y]])) >= 1, 2, 1)
# Main
# create a list of data frames composed of D split by group
the_groups <- get_group(D, D$Groups)
# iterate over the_groups, replacing the N column of each group
for (i in seq_along(the_groups)) {
  the_groups[[i]]$N <- flip_N(the_groups[[i]], "R")
}
# unsplit the_groups into a single data frame
unsplit(the_groups, D$Groups)
#> Groups N R
#> 1 A 1 N
#> 2 A 1 N
#> 3 A 2 Y
#> 4 A 2 N
#> 5 B 1 N
#> 6 B 1 N
#> 7 B 2 Y
#> 8 B 2 N
#> 9 C 1 N
#> 10 C 1 N
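The same three steps (split, transform each piece, unsplit) can be folded into a single function, which makes the f in f(x) = y explicit. This is a sketch: the name f and its group and flag arguments are hypothetical, and lapply() stands in for the for loop so that no list is modified in place. It also keeps N unchanged wherever it is not 1, following the rule as stated.

```r
# Hypothetical wrapper: f takes the data frame x and returns y
f <- function(x, group = "Groups", flag = "R") {
  # one data frame per group
  pieces <- split(x, x[[group]])
  pieces <- lapply(pieces, function(g) {
    # TRUE at and after the first "Y" within this group
    after_Y <- cumsum(g[[flag]] == "Y") > 0
    g$N <- ifelse(after_Y & g$N == 1, 2, g$N)
    g
  })
  # reassemble in the original row order
  unsplit(pieces, x[[group]])
}

Groups <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")
N <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
R <- c("N", "N", "Y", "N", "N", "N", "Y", "N", "N", "N")
D <- data.frame(Groups, N, R)

y <- f(D)
y
```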
The approach that comes to mind for a dplyr solution is to use dplyr::group_by() and tidyr::nest(), which will produce
library(dplyr)  # provides %>%; tidyr must also be installed
D %>% dplyr::group_by(Groups) %>% tidyr::nest()
#> # A tibble: 3 × 2
#> # Groups:   Groups [3]
#>   Groups data
#>   <chr>  <list>
#> 1 A      <tibble [4 × 2]>
#> 2 B      <tibble [4 × 2]>
#> 3 C      <tibble [2 × 2]>
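One way to finish the nest() route, offered as a sketch rather than a tested recipe, is to transform each nested tibble and then unnest. It assumes dplyr and tidyr (1.0 or later, for unnest(cols = ...)) are installed; flip_group() is a hypothetical helper, and base lapply() is used instead of purrr::map() to keep dependencies down.

```r
library(dplyr)
library(tidyr)

Groups <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")
N <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
R <- c("N", "N", "Y", "N", "N", "N", "Y", "N", "N", "N")
D <- data.frame(Groups, N, R)

# Hypothetical helper: within one nested tibble (columns N and R),
# flip N from 1 to 2 at and after the first "Y"
flip_group <- function(d) {
  d$N <- ifelse(cumsum(d$R == "Y") > 0 & d$N == 1, 2, d$N)
  d
}

y <- D %>%
  group_by(Groups) %>%
  nest() %>%                                   # one row per group, list-column `data`
  ungroup() %>%
  mutate(data = lapply(data, flip_group)) %>%  # apply the rule to each nested tibble
  unnest(cols = data)                          # back to one row per observation
y
```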
I'll take another look at a dplyr solution if no one else posts one.
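For comparison, nesting is not strictly needed: a grouped mutate() applies cumsum() within each group directly. A minimal sketch, assuming dplyr is installed:

```r
library(dplyr)

Groups <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")
N <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
R <- c("N", "N", "Y", "N", "N", "N", "Y", "N", "N", "N")
D <- data.frame(Groups, N, R)

y <- D %>%
  group_by(Groups) %>%
  # cumsum(R == "Y") restarts in each group; only 1s are flipped to 2
  mutate(N = if_else(cumsum(R == "Y") > 0 & N == 1, 2, N)) %>%
  ungroup()
y
```

Because the right-hand side of mutate() is evaluated before N is overwritten, the N == 1 test sees the original values.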