Fill the values of a column of a base with certain values of another column of another table.

I have this database:

    id    x       y   z         
   <int> <chr> <dbl> <date>    
     1    A       1    NA        
     2    C       3    NA        
     3    C       3    NA        
     4    C       2    NA        
     5    B       2    2019-08-04
     6    C       1    2019-09-18
     7    B       3    2019-12-17
     8    A       2    2019-11-02
     9    A       3    2020-03-16
    10    A       1    2020-01-31

And this data frame:

      id      date
      <dbl> <date>    
      1      2012-09-25
      1      2012-03-26
      1      2012-11-12
      2      2013-01-24
      2      2012-05-04
      2      2012-02-24
      3      2012-05-30
      3      2012-02-15
      4      2012-03-13
      4      2012-05-18

I need to fill the NA values ​​of the z column with the oldest dates of the data.frame ids. How can it be done? My desired result is this:

     id   x       y     z         
   <int> <chr> <dbl> <date>    
     1    A       1    2012-03-26        
     2    C       3    2012-02-24        
     3    C       3    2012-02-15        
     4    C       2    2012-03-13        
     5    B       2    2019-08-04
     6    C       1    2019-09-18
     7    B       3    2019-12-17
     8    A       2    2019-11-02
     9    A       3    2020-03-16
    10    A       1    2020-01-31

Hi Juan,

I used the group_by and summarize functions from the tidyverse package to calculate the earliest dates and then joined them to the table with missing data.


install.packages("tidyverse")

library(tidyverse)

# Create the first table

id1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

x <- c("A", "C", "C", "C", "B", "C", "B", "A", "A", "A")

y <- c(1, 3, 3, 2, 2, 1, 3, 2, 3, 1)

z <- c(NA, NA, NA, NA, "2019-08-04", "2019-09-18", "2019-12-17", "2019-11-02", "2020-03-16", "2020-01-31")

table1 <- data.frame(id = id1, x, y, z = as.Date(z))

# Create the second table

id2 <- c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4)

date <- c("2012-09-25", "2012-03-26", "2012-11-12", "2012-05-04", "2012-02-24", "2012-05-30", "2012-02-15", "2012-02-15", "2012-03-13", "2012-05-18")

table2 <- data.frame(id = id2, date = as.Date(date))

# Calculate earliest dates

firstDates <-
    table2 |> 
    group_by(id) |> 
    summarize(date = min(date))

# Replace NAs

table1 |> 
    left_join(firstDates, by = "id") |> 
    mutate(z = case_when(
        is.na(z) ~ date, 
        TRUE ~ z
    )) |> 
    select(-c(date))

Let me know if you have any questions about what I did.

Matt

Hi, thanks, that's a good answer. but what if the data frame ids are not in the same order as the database ids? In that case the answer does not work, how could I do?

The left_join() function will take care of aligning the id values in the two data sets. The order of the ids does not matter.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.