I want to create a variable that depends on its own previous value as well as on the values of other variables.
The simplest way I can describe the problem is as follows.
The variable `want` is a cumulative sum: if `c == F` it adds `b`, otherwise it adds `a`. So the first 4 elements are the regular `cumsum(1:4)`, the next element is `want[4] + a[5]`, which is 11, the next is `want[5] + b[6]`, which is 17, and so on.
df <- data.frame(
a = rep(1,10),
b = 1:10,
c = c(F,F,F,F,T,F,T,T,F,T),
want = c(1,3,6,10,11,17,18,19,28,29)
)
FJCC
February 29, 2024, 2:53am
2
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- data.frame(
a = rep(1,10),
b = 1:10,
c = c(F,F,F,F,T,F,T,T,F,T),
want = c(1,3,6,10,11,17,18,19,28,29)
)
df
#> a b c want
#> 1 1 1 FALSE 1
#> 2 1 2 FALSE 3
#> 3 1 3 FALSE 6
#> 4 1 4 FALSE 10
#> 5 1 5 TRUE 11
#> 6 1 6 FALSE 17
#> 7 1 7 TRUE 18
#> 8 1 8 TRUE 19
#> 9 1 9 FALSE 28
#> 10 1 10 TRUE 29
df |> mutate(want2 = cumsum(ifelse(c,a,b)))
#> a b c want want2
#> 1 1 1 FALSE 1 1
#> 2 1 2 FALSE 3 3
#> 3 1 3 FALSE 6 6
#> 4 1 4 FALSE 10 10
#> 5 1 5 TRUE 11 11
#> 6 1 6 FALSE 17 17
#> 7 1 7 TRUE 18 18
#> 8 1 8 TRUE 19 19
#> 9 1 9 FALSE 28 28
#> 10 1 10 TRUE 29 29
Created on 2024-02-28 with reprex v2.0.2
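For reference, the same conditional cumulative sum can be sketched in base R without dplyr. This is only an equivalent formulation of the reply above, using the data frame from the question; the column name `want2` is kept from the reply.

```r
df <- data.frame(
  a = rep(1, 10),
  b = 1:10,
  c = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  want = c(1, 3, 6, 10, 11, 17, 18, 19, 28, 29)
)

# ifelse() picks a[i] where c[i] is TRUE and b[i] otherwise, element by element;
# cumsum() then accumulates those increments.
df$want2 <- cumsum(ifelse(df$c, df$a, df$b))

# Compare against the expected column from the question
all(df$want2 == df$want)
#> [1] TRUE
```

This works because each increment depends only on the current row, so the whole column can be computed in one vectorised step.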
Thanks for this reply, @FJCC, and for making me look silly.
I simplified the problem too much, so could I please ask you to look at a small extension?
The variable `want` is a cumulative sum: if `c == F` it adds `b`, otherwise it adds `a`. In addition, whenever the previous value of `want` is greater than 5, the current value is reset to 0. So the first 3 elements are the regular `cumsum(1:3)`. Because the previous value (6) is > 5, `want[4]` is set to 0. The next condition is T, so `want[5] <- want[4] + a[5]`, which is 1. The next element is `want[5] + b[6]`, which is 7. Because the previous `want` value is greater than 5, `want[7]` is set to 0, and so on.
df <- data.frame(
a = rep(1,10),
b = 1:10,
c = c(F,F,F,F,T,F,T,T,F,T),
want = c(1,3,6,0,1,7,0,1,10,0)
)
FJCC
March 1, 2024, 1:05am
4
This is a more complicated solution. It relies on the fact that, in arithmetic, R treats a FALSE value as 0 and a TRUE value as 1.
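As a small illustration of that coercion (the variable names `x`, `keep`, `flag`, and `picked` are made up for this sketch), multiplying by a logical condition either keeps a value or zeroes it:

```r
# Logical values are coerced to numbers in arithmetic: TRUE -> 1, FALSE -> 0
x <- 7
keep <- x * (x <= 5)   # the condition is FALSE, so the product is 0
keep
#> [1] 0

# Choosing between two values with a logical flag
flag <- c(TRUE, FALSE)
picked <- 10 * flag + 20 * (!flag)
picked
#> [1] 10 20
```

The parentheses around `!flag` are a precaution: `!` has lower precedence than the arithmetic operators, so an expression such as `!flag * 20` is parsed as `!(flag * 20)`.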
df <- data.frame(
a = rep(1,10),
b = 1:10,
c = c(F,F,F,F,T,F,T,T,F,T),
want = c(1,3,6,0,1,7,0,1,10,0)
)
df
#> a b c want
#> 1 1 1 FALSE 1
#> 2 1 2 FALSE 3
#> 3 1 3 FALSE 6
#> 4 1 4 FALSE 0
#> 5 1 5 TRUE 1
#> 6 1 6 FALSE 7
#> 7 1 7 TRUE 0
#> 8 1 8 TRUE 1
#> 9 1 9 FALSE 10
#> 10 1 10 TRUE 0
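df$want2 <- NA_real_
for (i in seq_len(nrow(df))) {
  # Increment: a[i] when c[i] is TRUE, b[i] when c[i] is FALSE
  step <- df$a[i] * df$c[i] + df$b[i] * !df$c[i]
  # First row has no previous total; afterwards reset to 0 once the previous total exceeds 5
  if (i == 1) df$want2[i] <- step
  else df$want2[i] <- (df$want2[i - 1] + step) * (df$want2[i - 1] <= 5)
}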
df
#> a b c want want2
#> 1 1 1 FALSE 1 1
#> 2 1 2 FALSE 3 3
#> 3 1 3 FALSE 6 6
#> 4 1 4 FALSE 0 0
#> 5 1 5 TRUE 1 1
#> 6 1 6 FALSE 7 7
#> 7 1 7 TRUE 0 0
#> 8 1 8 TRUE 1 1
#> 9 1 9 FALSE 10 10
#> 10 1 10 TRUE 0 0
Created on 2024-02-29 with reprex v2.0.2
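The same recurrence can also be written without an explicit loop. The following is a minimal sketch using base R's `Reduce()` with `accumulate = TRUE`; it is an alternative formulation, not the method from the reply, and the helper `next_total` and the column `want3` are hypothetical names introduced here.

```r
df <- data.frame(
  a = rep(1, 10),
  b = 1:10,
  c = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  want = c(1, 3, 6, 0, 1, 7, 0, 1, 10, 0)
)

# Per-row increment: a where c is TRUE, b otherwise
step <- ifelse(df$c, df$a, df$b)

# Hypothetical helper: given the previous total and the current increment,
# reset to 0 if the previous total exceeded 5, otherwise keep accumulating
next_total <- function(prev, inc) if (prev > 5) 0 else prev + inc

# Reduce() with accumulate = TRUE returns every intermediate total;
# the first element of `step` serves as the starting value
df$want3 <- Reduce(next_total, step, accumulate = TRUE)

all(df$want3 == df$want)
#> [1] TRUE
```

Because each value depends on the previous result, a plain `cumsum()` is not enough here; `Reduce()` carries the running total from one row to the next in the same way the loop does.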