Help creating a complex indicator in R

I have a binary column called "in_zoo"; it's TRUE or FALSE. I want to group by monkey and arrange by monkey and date, so the earliest date is on top within each monkey's group. For a given row, I want a cumulative sum (e.g. cumsum in R) of how many PRIOR rows have TRUE in "in_zoo", stored in a new variable called "past_zoo". The current row's value of "in_zoo" must not be included when creating "past_zoo".

This is what I have so far, but I know it's not right:

data %>%
group_by(monkey) %>%
arrange(monkey, date) %>%
mutate(past_zoo = lag(cumsum(in_zoo), default = FALSE)) %>%
ungroup()

Any help would be much appreciated, thank you all.

For the output I need, see this example:

monkey  date       in_zoo  past_zoo
Adam    1/1/2010   TRUE    0
Adam    1/19/2010  FALSE   1
Adam    1/25/2010  TRUE    1
Adam    1/31/2010  TRUE    2
Adam    2/1/2010   FALSE   3

Note that the first row's "past_zoo" value MUST always be 0 for a given monkey, i.e. at that monkey's earliest date.
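One way to read this requirement is as an exclusive running count: the cumulative sum up to and including each row, minus that row's own value. Below is a minimal sketch, assuming dplyr is installed. The `toy` tibble and the `past_zoo_alt` column are made up for illustration and simply mirror the example table.

```r
library(dplyr)

# Hypothetical toy data mirroring the example table above
toy <- tibble(
  monkey = c("Adam", "Adam", "Adam", "Adam", "Adam"),
  date   = as.Date(c("2010-01-01", "2010-01-19", "2010-01-25",
                     "2010-01-31", "2010-02-01")),
  in_zoo = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

result <- toy %>%
  arrange(monkey, date) %>%
  group_by(monkey) %>%
  mutate(
    # Inclusive running count shifted down one row;
    # the first row of each monkey falls back to the default, 0
    past_zoo = lag(cumsum(in_zoo), default = 0L),
    # Same count written differently: inclusive sum minus the current row
    past_zoo_alt = cumsum(in_zoo) - in_zoo
  ) %>%
  ungroup()

result
```

Both columns should agree. Either form keeps the current row out of the count, and the first row per monkey is always 0.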

Thank you all so much.

Thanks for providing code. Could you kindly take further steps to make it easier for other forum users to help you? Share some representative data that will enable your code to run and show the problematic behaviour.

How do I share data for a reprex?

You might use tools such as the datapasta package, or the base function dput(), to share a portion of data in code form, i.e. in a form that can be copied from the forum and pasted into an R session.
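As a rough illustration of the dput() route, here is a small base R sketch. The data frame `df` is a hypothetical stand-in for your real data. The round trip at the end only demonstrates that the printed text rebuilds the same object.

```r
# Hypothetical small data frame standing in for your real data
df <- data.frame(
  monkey = c("Adam", "Adam", "Ryan"),
  in_zoo = c(TRUE, FALSE, TRUE)
)

# Prints R code that recreates the first rows; paste that output into your post
dput(head(df, 3))

# Round-trip check: capture the dput() text and evaluate it to rebuild the data
code_text <- paste(capture.output(dput(df)), collapse = "\n")
rebuilt <- eval(parse(text = code_text))
```

A reader of the post can then assign the pasted structure(...) call to a name and run the code against it.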

Reprex Guide

Advice.
If you have code that processes many files iteratively, are the repetitions relevant to the problem, or would you see the issue with a single file? If so, your example should only concern a single file.
If your issue does not relate to file reading, i.e. you have no problem loading your raw data and your problem is manipulating or processing it, then you should modify your example to exclude all file-loading code and substitute example data that you prepared following the guide.


Thank you, nimra!! I've added a reprex below. Basically, I need help writing the code to generate the "past_zoo" indicator; the past_zoo column in the data shows what I want it to look like. Thank you all so much.

structure(list(monkey = c("Adam", "Adam", "Adam", "Adam", "Adam", 
"Ryan", "Ryan", "Ryan", "Ryan"), date = structure(c(1262304000, 
1262304000, 1264377600, 1264896000, 1264982400, 1267056000, 1268870400, 
1270080000, 1304985600), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    in_zoo = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, 
    FALSE), past_zoo = c(0, 1, 1, 2, 3, 0, 1, 2, 2)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -9L))

Can you please say more about how you know it isn't right? It seems to produce what you were asking for...

I just added that column manually in Excel; I didn't produce it in R.

Yet if it's omitted from the input and your code is run, it gets produced...

library(tidyverse)

example_data <- structure(list(
  monkey = c("Adam", "Adam", "Adam", "Adam", "Adam",
             "Ryan", "Ryan", "Ryan", "Ryan"),
  date = structure(c(1262304000, 1262304000, 1264377600, 1264896000,
                     1264982400, 1267056000, 1268870400, 1270080000,
                     1304985600),
                   class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  in_zoo = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  past_zoo = c(0, 1, 1, 2, 3, 0, 1, 2, 2)
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L))
example_data_no_past_zoo <- example_data |> select(-past_zoo)

new_calced <- example_data_no_past_zoo %>%
  group_by(monkey) %>%
  arrange(monkey, date) %>%
  mutate(past_zoo = lag(cumsum(in_zoo), default = FALSE)) %>%
  ungroup()

example_data$past_zoo
new_calced$past_zoo
# the two vectors hold the same values
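To confirm the match programmatically rather than by eye, something like the sketch below could be used, assuming dplyr is installed. It rebuilds the posted data with tibble() instead of structure(), and it uses `default = 0L` in place of `default = FALSE`; that swap is my own choice to make the intended type explicit, not something the original code requires. Note that the first two Adam rows share the same timestamp in the posted data, so their relative order after arrange() depends on the original row order.

```r
library(dplyr)

# Same values as the posted dput() output, rebuilt with tibble()
example_data <- tibble(
  monkey = c(rep("Adam", 5), rep("Ryan", 4)),
  date = as.POSIXct(
    c(1262304000, 1262304000, 1264377600, 1264896000, 1264982400,
      1267056000, 1268870400, 1270080000, 1304985600),
    origin = "1970-01-01", tz = "UTC"
  ),
  in_zoo = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  past_zoo = c(0, 1, 1, 2, 3, 0, 1, 2, 2)
)

# Recompute past_zoo from scratch, ignoring the hand-entered column
new_calced <- example_data %>%
  select(-past_zoo) %>%
  arrange(monkey, date) %>%
  group_by(monkey) %>%
  mutate(past_zoo = lag(cumsum(in_zoo), default = 0L)) %>%
  ungroup()

# TRUE when the computed column matches the hand-entered one
same <- isTRUE(all.equal(as.numeric(new_calced$past_zoo),
                         example_data$past_zoo))
same
```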
