Assigning values to the whole column and not just rows.

Dear everyone,

I'm trying to solve a data problem but cannot seem to find a solution. I have tried many approaches and I'm stuck, so I thought I'd reach out for some help.

I'm working with a European Union database (available online at https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/horizon-dashboard). Below is a very short example of the data.

structure(list(project_nbr = c(740477, 653212, 833389, 101021274, 
883371, 883441), general_pic = c(998709188, 998709188, 998709188, 
998709188, 998709188, 998709188), signature_date = c("17/04/2017", 
"23/07/2015", "29/04/2019", "26/04/2021", "29/04/2020", "22/04/2020"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

'project_nbr' is simply a project identification number
'general_pic' is the organization's identification number
'signature_date' is the date on which an organization signed up for a project

So one can see that the same organization can participate in many different projects and, although you cannot see it in the example data, different projects always contain different organizations.

Now, I would like to create a variable that flags newcomers: if an organization participated in 2014, 2015 or 2016, it should be counted as NOT a newcomer, but if it participated only after 2016 (2016 not included), it should be counted as a newcomer. I can do something like this:

library(tidyverse)

df <- structure(list(project_nbr = c(740477, 653212, 833389, 101021274, 
883371, 883441), general_pic = c(998709188, 998709188, 998709188, 
998709188, 998709188, 998709188), signature_date = c("17/04/2017", 
"23/07/2015", "29/04/2019", "26/04/2021", "29/04/2020", "22/04/2020"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

df %>%
  mutate(newcomer = if_else(str_detect(signature_date, "2014|2015|2016"), "No", "Yes"))

But as you can see below, different rows get different results (e.g. row 2 in 'newcomer' variable == "No" while all other rows == "Yes") even though it is the same organization.

[image: output of the code above, with newcomer "No" in row 2 and "Yes" in all other rows]

What I would like to have instead is a 'newcomer' variable that says "No" in all rows of an organization if that organization participated in any of the years 2014-2016. Like in the example below.

[image: desired output, with newcomer "No" in every row]

If anyone has an idea how to approach this problem, it would be very welcome!
Thank you!

Hi @Paulius,

I think this should do it:

library(tidyverse)

df <- structure(
  list(project_nbr = c(740477, 653212, 833389, 101021274, 883371, 883441),
       general_pic = c(998709188, 998709188, 998709188, 998709188, 
                       998709188, 998709188), 
       signature_date = c("17/04/2017", "23/07/2015", "29/04/2019", 
                          "26/04/2021", "29/04/2020", "22/04/2020")), 
  row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

df %>%
  group_by(general_pic) %>% 
  mutate(
    newcomer = !any(str_detect(signature_date, "2014|2015|2016"))
  ) %>% 
  ungroup()
#> # A tibble: 6 × 4
#>   project_nbr general_pic signature_date newcomer
#>         <dbl>       <dbl> <chr>          <lgl>   
#> 1      740477   998709188 17/04/2017     FALSE   
#> 2      653212   998709188 23/07/2015     FALSE   
#> 3      833389   998709188 29/04/2019     FALSE   
#> 4   101021274   998709188 26/04/2021     FALSE   
#> 5      883371   998709188 29/04/2020     FALSE   
#> 6      883441   998709188 22/04/2020     FALSE
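
The result above is a logical column, whereas the question asked for "Yes"/"No" labels. A possible variant is sketched below. It is not part of the original answer: the helper column name signature_year is made up for illustration, and the use of lubridate's dmy() and year() to parse the day/month/year strings is an assumption about how one might prefer to extract the year instead of matching it with a regular expression.

```r
library(dplyr)
library(lubridate)

# Same example data as in the question
df <- tibble(
  project_nbr = c(740477, 653212, 833389, 101021274, 883371, 883441),
  general_pic = rep(998709188, 6),
  signature_date = c("17/04/2017", "23/07/2015", "29/04/2019",
                     "26/04/2021", "29/04/2020", "22/04/2020")
)

result <- df %>%
  # Parse the "dd/mm/yyyy" strings into dates and keep only the year
  # (signature_year is a hypothetical helper column)
  mutate(signature_year = year(dmy(signature_date))) %>%
  group_by(general_pic) %>%
  # any() reduces each organization's rows to a single TRUE/FALSE,
  # and mutate() recycles the resulting label to every row of the group
  mutate(newcomer = if_else(any(signature_year %in% 2014:2016), "No", "Yes")) %>%
  ungroup()

result
```

With the example data, every row should get newcomer == "No", because the organization has one signature dated 2015. The key step is the same as in the answer: grouping by general_pic before calling any(), so the check runs once per organization rather than once per row.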