I have a large data frame that looks like this.
I am pasting the dput output of my df below:
Polish_panel_hist_06 <- structure(list(id = c(32, 32, 32, 32, 32, 32, 32, 12668031110,12668031110, 12668031110), survey_date = structure(c(17167, 17167,17167, 17167, 17167, 17167, 17167, 15034, 15034, 15034), class = "Date"),survey_year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017,2011, 2011, 2011), mom_dob = c(1991, 1991, 1991, 1991, 1991,1991, 1991, 1987, 1987, 1987), date = structure(c(10592,10957, 11323, 11688, 12053, 12418, 12784, 14304, 14669, 15034), class = "Date"), date_year = c(1999, 2000, 2001, 2002,2003, 2004, 2005, 2009, 2010, 2011), mom_age = c(7, 8, 9,10, 11, 12, 13, 21, 22, 23), newborn = c(0, 0, 0, 0, 0, 0,0, 0, 0, 0), stock = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), family500_year = c(1,1, 1, 1, 1, 1, 1, 0, 0, 0), nchild1 = c(2015, 2015, 2015,2015, 2015, 2015, 2015, NA, NA, NA), nchild2 = c(NA, NA,NA, NA, NA, NA, NA, 2010, 2010, 2010), nchild3 = c(NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_), nchild4 = c(NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_), nchild5 = c(NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_), nchild6 = c(NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), nchild7 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), nchild8 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA), nchild9 = c(NA, NA,NA, NA, NA, NA, NA, NA, NA, NA), nchild10 = c(NA, NA, NA,NA, NA, NA, NA, NA, NA, NA), educcat = c(2, 2, 2, 2, 2, 2,2, 3, 3, 3), educcat_college = c(0, 0, 0, 0, 0, 0, 0, 1,1, 1), hh_income_net = c(3410, 3410, 3410, 3410, 3410, 3410,3410, 7978.7001953125, 7978.7001953125, 7978.7001953125),hh_income_annual_usd = c(10912, 10912, 10912, 10912, 10912,10912, 10912, 25531.840625, 25531.840625, 25531.840625),hh_income_annual_log = c(9.29761838008324, 9.29761838008324,9.29761838008324, 9.29761838008324, 9.29761838008324, 9.29761838008324,9.29761838008324, 10.1476816041898, 
10.1476816041898, 10.1476816041898), marital_stat = c(10, 10, 10, 10, 10, 10, 10, 20, 20, 20), maritalcat = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1), rural = c(1,1, 1, 1, 1, 1, 1, 0, 0, 0), age_sq = c(625, 625, 625, 625,625, 625, 625, 529, 529, 529), emp_stat = c(3, 3, 3, 3, 3,3, 3, 6, 6, 6), occupation = c("98", "98", "98", "98", "98","98", "98", "48", "48", "48"), disability_stat = c(2, 2,2, 2, 2, 2, 2, 2, 2, 2), weight = c(1039, 1039, 1039, 1039,1039, 1039, 1039, 1457, 1457, 1457), region = c(2, 2, 2,2, 2, 2, 2, 12, 12, 12), birth_country = c(1, 1, 1, 1, 1,1, 1, 1, 1, 1), birth_citizenship = c(1, 1, 1, 1, 1, 1, 1,1, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df", "tbl","data.frame"))
It's a df containing the fertility history of each woman. There are 18 rows for each woman because I'm reconstructing her births from the past 18 years.
What I'm trying to do is generate a dummy variable, newborn, that equals 1 in any row where date_year matches one of nchild1, nchild2, nchild3, etc. For example, for woman id #62, nchild1 is 2004 and nchild2 is 2005, so I want newborn to be 1 in the row where date_year=2004 AND 1 in the row where date_year=2005 (and 0 in all other rows). Relatedly, I want stock (i.e. the number of children born so far) to turn to 1 the year the first child is born (in this case, 2004), increase by 1 with each later birth (to 2 in 2005), and then hold that value until the last year observed (2017).
## example df of what I have now:
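| id | date_year | newborn | stock | nchild1 | nchild2 | nchild3 |
|----|-----------|---------|-------|---------|---------|---------|
| 62 | 1996 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 1997 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 1998 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 1999 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2000 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2001 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2002 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2003 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2004 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2005 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2006 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2007 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2008 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2009 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2010 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2011 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2012 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2013 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2014 | 0 | 2 | 2004 | 2005 | NA |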
## desired df
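| id | date_year | newborn | stock | nchild1 | nchild2 | nchild3 |
|----|-----------|---------|-------|---------|---------|---------|
| 62 | 1996 | 0 | 0 | 2004 | 2005 | NA |
| 62 | 1997 | 0 | 0 | 2004 | 2005 | NA |
| 62 | 1998 | 0 | 0 | 2004 | 2005 | NA |
| 62 | 1999 | 0 | 0 | 2004 | 2005 | NA |
| 62 | 2000 | 0 | 0 | 2004 | 2005 | NA |
| 62 | 2001 | 0 | 0 | 2004 | 2005 | NA |
| 62 | 2002 | 0 | 0 | 2004 | 2005 | NA |
| 62 | 2003 | 0 | 0 | 2004 | 2005 | NA |
| 62 | 2004 | 1 | 1 | 2004 | 2005 | NA |
| 62 | 2005 | 1 | 2 | 2004 | 2005 | NA |
| 62 | 2006 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2007 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2008 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2009 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2010 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2011 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2012 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2013 | 0 | 2 | 2004 | 2005 | NA |
| 62 | 2014 | 0 | 2 | 2004 | 2005 | NA |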
So far I've tried to write code using ifelse, but I'm having trouble figuring out how to express the condition that date_year equals any of nchild1, nchild2, nchild3, etc. (i.e. date_year==nchild1 | date_year==nchild2 | date_year==nchild3, and so on).
`df$newborn <- ifelse(df$newborn == 1, df$nchild1==df$date_year, df$newborn)`
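To make the intended rule concrete, here is a minimal base R sketch of one way the "matches any nchild column" condition might be expressed. It runs on a toy data frame that mirrors the woman #62 example (the toy `df`, `child_cols`, and `births` are names made up for illustration, and only three child columns are included). It assumes the birth years are constant within each woman, as in the tables above, and that `stock` should count children born in or before `date_year`.

```r
# Toy data mirroring the example for woman 62 (hypothetical, three child columns only)
df <- data.frame(
  id        = 62,
  date_year = 1996:2014,
  nchild1   = 2004,
  nchild2   = 2005,
  nchild3   = NA_real_
)

# Pick up every birth-year column: nchild1, nchild2, ...
child_cols <- grep("^nchild[0-9]+$", names(df), value = TRUE)
births <- as.matrix(df[child_cols])

# newborn: 1 if date_year equals any non-missing birth year in that row
# (the date_year vector is recycled down each column, so row i is compared with date_year[i])
df$newborn <- as.integer(rowSums(births == df$date_year, na.rm = TRUE) > 0)

# stock: number of children born in or before date_year
df$stock <- rowSums(births <= df$date_year, na.rm = TRUE)
```

On the real data, the same lines should apply with `df` replaced by `Polish_panel_hist_06`; the pattern `^nchild[0-9]+$` would then select nchild1 through nchild10. Because each row already carries the woman's own birth years, no grouping by id is needed under these assumptions.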
Any and all help would be appreciated, thanks so much.
**Edited to make sure the dput matches the actual df.**