Dear everyone,
I'm trying to solve a data problem but cannot seem to find a solution. I have tried many approaches and I'm stuck, so I thought I'd reach out for some help.
I'm working with a European Union database (available online at https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/horizon-dashboard). Below is a very short example of the data.
structure(list(project_nbr = c(740477, 653212, 833389, 101021274,
883371, 883441), general_pic = c(998709188, 998709188, 998709188,
998709188, 998709188, 998709188), signature_date = c("17/04/2017",
"23/07/2015", "29/04/2019", "26/04/2021", "29/04/2020", "22/04/2020"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
'project_nbr' is simply the project identification number
'general_pic' is the organization's identification number
'signature_date' is the date on which an organization signed up for a project (stored as text in dd/mm/yyyy format)
So one can see that the same organization can participate in many different projects and, although you cannot see it in the example data, different projects always contain different organizations.
Now, I would like to create a variable that flags newcomers: if an organization participated in 2014, 2015 or 2016, it should be counted as NOT a newcomer, but if it participated only after 2016 (2016 not included), it should be counted as a newcomer. I can do something like this:
df <- structure(list(project_nbr = c(740477, 653212, 833389, 101021274,
883371, 883441), general_pic = c(998709188, 998709188, 998709188,
998709188, 998709188, 998709188), signature_date = c("17/04/2017",
"23/07/2015", "29/04/2019", "26/04/2021", "29/04/2020", "22/04/2020"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
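library(dplyr)    # mutate(), if_else(), %>%
library(stringr)  # str_detect()

# Flags each row separately, based only on that row's own signature year
df %>%
  mutate(newcomer = if_else(str_detect(signature_date, "2014|2015|2016"), "No", "Yes"))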
But as you can see below, different rows get different results (e.g. row 2 has 'newcomer' == "No" while all other rows have "Yes") even though they all belong to the same organization.
What I would like to have instead is a 'newcomer' variable that says "No" on every row of an organization if that organization participated in any of the years 2014-2016. Like in the example below.
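One direction that might work is to evaluate the condition per organization rather than per row. The following is only a rough sketch, assuming that every 'signature_date' is a dd/mm/yyyy string and that grouping by 'general_pic' is the right level. The second organization (general_pic 111111111, projects 900001 and 900002) is hypothetical and added only so that a "Yes" case shows up; 'signature_year' and 'result' are names made up for this illustration.

```r
library(dplyr)

# Example data from above, plus one HYPOTHETICAL organization (111111111)
# that only signed after 2016, added purely for illustration.
df <- data.frame(
  project_nbr = c(740477, 653212, 833389, 101021274, 883371, 883441,
                  900001, 900002),
  general_pic = c(rep(998709188, 6), 111111111, 111111111),
  signature_date = c("17/04/2017", "23/07/2015", "29/04/2019", "26/04/2021",
                     "29/04/2020", "22/04/2020", "05/03/2018", "11/09/2020")
)

result <- df %>%
  # Pull the year out of the dd/mm/yyyy text
  mutate(signature_year = as.integer(
    format(as.Date(signature_date, format = "%d/%m/%Y"), "%Y")
  )) %>%
  # Evaluate the condition once per organization, not once per row
  group_by(general_pic) %>%
  mutate(newcomer = if_else(any(signature_year %in% 2014:2016), "No", "Yes")) %>%
  ungroup()

print(result)
```

Because any() collapses each group to a single TRUE/FALSE, the same label is recycled to all rows of that organization. Whether this holds up on the full database (for example with missing or differently formatted dates) is something I have not checked.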
If someone has an idea how to approach this problem, it would be very welcome!
Thank you!