How to create a dummy if one variable matches another (by group id) in a time series df?

annab · November 23, 2021, 1:44am

I have a large data frame that looks like this.

I am pasting the dput output of my df below:


Polish_panel_hist_06 <- structure(list(id = c(32, 32, 32, 32, 32, 32, 32, 12668031110,12668031110, 12668031110), survey_date = structure(c(17167, 17167,17167, 17167, 17167, 17167, 17167, 15034, 15034, 15034), class = "Date"),survey_year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017,2011, 2011, 2011), mom_dob = c(1991, 1991, 1991, 1991, 1991,1991, 1991, 1987, 1987, 1987), date = structure(c(10592,10957, 11323, 11688, 12053, 12418, 12784, 14304, 14669, 15034), class = "Date"), date_year = c(1999, 2000, 2001, 2002,2003, 2004, 2005, 2009, 2010, 2011), mom_age = c(7, 8, 9,10, 11, 12, 13, 21, 22, 23), newborn = c(0, 0, 0, 0, 0, 0,0, 0, 0, 0), stock = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), family500_year = c(1,1, 1, 1, 1, 1, 1, 0, 0, 0), nchild1 = c(2015, 2015, 2015,2015, 2015, 2015, 2015, NA, NA, NA), nchild2 = c(NA, NA,NA, NA, NA, NA, NA, 2010, 2010, 2010), nchild3 = c(NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_), nchild4 = c(NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_), nchild5 = c(NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_), nchild6 = c(NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), nchild7 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), nchild8 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA), nchild9 = c(NA, NA,NA, NA, NA, NA, NA, NA, NA, NA), nchild10 = c(NA, NA, NA,NA, NA, NA, NA, NA, NA, NA), educcat = c(2, 2, 2, 2, 2, 2,2, 3, 3, 3), educcat_college = c(0, 0, 0, 0, 0, 0, 0, 1,1, 1), hh_income_net = c(3410, 3410, 3410, 3410, 3410, 3410,3410, 7978.7001953125, 7978.7001953125, 7978.7001953125),hh_income_annual_usd = c(10912, 10912, 10912, 10912, 10912,10912, 10912, 25531.840625, 25531.840625, 25531.840625),hh_income_annual_log = c(9.29761838008324, 9.29761838008324,9.29761838008324, 9.29761838008324, 9.29761838008324, 9.29761838008324,9.29761838008324, 10.1476816041898, 10.1476816041898, 10.1476816041898), marital_stat = c(10, 10, 10, 10, 10, 10, 10, 20, 20, 20), maritalcat = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1), rural = c(1,1, 1, 1, 1, 1, 1, 0, 0, 0), age_sq = c(625, 625, 625, 625,625, 625, 625, 529, 529, 529), emp_stat = c(3, 3, 3, 3, 3,3, 3, 6, 6, 6), occupation = c("98", "98", "98", "98", "98","98", "98", "48", "48", "48"), disability_stat = c(2, 2,2, 2, 2, 2, 2, 2, 2, 2), weight = c(1039, 1039, 1039, 1039,1039, 1039, 1039, 1457, 1457, 1457), region = c(2, 2, 2,2, 2, 2, 2, 12, 12, 12), birth_country = c(1, 1, 1, 1, 1,1, 1, 1, 1, 1), birth_citizenship = c(1, 1, 1, 1, 1, 1, 1,1, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df", "tbl","data.frame"))

It's a df containing the fertility history of each woman. There are 18 rows for each woman because I'm reconstructing her births from the past 18 years.

Now, what I'm trying to do is to generate a dummy variable for newborn, where the value of nchild1, nchild2, nchild3, etc matches date_year. For example, for woman id #62, if nchild1 is 2004, I want newborn to be 1 in the same row as date_year=2004 AND 1 in the same row as date_year=2005 (and the rest to be 0s). Relatedly, I want stock (i.e. the number of children) to turn to 1 the year the first child is born (in this case, 2004) and remain 1 until the last year observed (2017).


## example df of what I have now:

`id   date_year  newborn   stock   nchild1  nchild2  nchchild3   ` `62       1996      0        2      2004      2005       NA     ` `62       1997      0        2      2004      2005       NA     ` `62       1998      0        2      2004      2005       NA      ` `62       1999      0        2      2004      2005       NA     ` `62       2000      0        2      2004      2005       NA     ` `62       2001      0        2      2004      2005       NA     ` `62       2002      0        2      2004      2005       NA      ` `62       2003      0        2      2004      2005       NA       ` `62       2004      0        2      2004      2005       NA     ` `62       2005      0        2      2004      2005       NA     ` `62       2006      0        2      2004      2005       NA     ` `62       2007      0        2      2004      2005       NA     ` `62       2008      0        2      2004      2005       NA     ` `62       2009      0        2      2004      2005       NA     ` `62       2010      0        2      2004      2005       NA      ` `62       2011      0        2      2004      2005       NA` `62       2012      0        2      2004      2005       NA     ` `62       2013      0        2      2004      2005       NA` `62       2014      0        2      2004      2005       NA`


## desired df

`id   date_year  newborn   stock   nchild1  nchild2  nchchild3   ` `62       1996      0        0      2004      2005       NA     ` `62       1997      0        0      2004      2005       NA     ` `62       1998      0        0      2004      2005       NA      ` `62       1999      0        0      2004      2005       NA     ` `62       2000      0        0      2004      2005       NA     ` `62       2001      0        0      2004      2005       NA     ` `62       2002      0        0      2004      2005       NA      ` `62       2003      0        0      2004      2005       NA       ` `62       2004      1        1      2004      2005       NA     ` `62       2005      1        2      2004      2005       NA     ` `62       2006      0        2      2004      2005       NA     ` `62       2007      0        2      2004      2005       NA     ` `62       2008      0        2      2004      2005       NA     ` `62       2009      0        2      2004      2005       NA     ` `62       2010      0        2      2004      2005       NA      ` `62       2011      0        2      2004      2005       NA` `62       2012      0        2      2004      2005       NA     ` `62       2013      0        2      2004      2005       NA` `62       2014      0        2      2004      2005       NA `

So far I've tried to write code using ifelse, but I'm having trouble figuring out how to denote the condition that date_year==nchild1 | nchild2 | nchild3, etc.


`df$newborn <- ifelse(df$newborn == 1, df$nchild1==df$date_year, df$newborn)` `````

Any and all help would be appreciated, thanks so much.

**Edited to make sure the dput matches the actual df.**

williaml · November 23, 2021, 3:30am

Hi, you talk of id #62, but that isn't in your dput. Can you fix that? Also can you better format the example and desired dfs? It isn't possible to read them.

system · December 14, 2021, 3:31am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.