Writing an IF AND code based on 4 different columns with multiple identical rows.

Enigma909 · April 20, 2021, 9:20am

Hello, I am having a coding problem.

I have 4 different Columns. The first is ID’s, the second is Names, the third is Start_Date and the fourth is End_Date. For ID 1 there can be multiple rows with different names and dates. I am trying to write a code that starts out by examining for ID 1, if the name A comes before name B, what is the difference between End_Date for A and Start_Date for B. For example:
ID 1, A, 20-4-2021, 20-5-2021
ID 1, B, 21-4-2021, 22-5-2021
ID 1, C, 22-5-2021, 25-5-2021
So in this example a new column should say 1 day under ID 1.
I was thinking about using a IF AND function, but I am unsure of how to write in R, and whether R is the right program for the task.

Thank you so much in advance.

pieterjanvc · April 21, 2021, 12:18pm

Hi,

Welcome to the RStudio community!

Tip, when you post, try to generate a reprex: A reprex consists of the minimal code and data needed to recreate the issue/question you're having. You can find instructions how to build and share one here:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

If I understand your questions correctly, what you want can be solved by some functions from the tidyverse package (especially the lag / lead ones).

library(tidyverse)

myData = data.frame(
  ID = rep(1:2, each = 3),
  Names = LETTERS[1:3],
  Start_Date = as.Date(c("20-4-2021", "21-4-2021", "22-5-2021"), format = "%d-%m-%Y"),
  End_Date = as.Date(c("20-5-2021", "22-5-2021", "25-5-2021"), format = "%d-%m-%Y")
)

myData = myData %>% 
  group_by(ID) %>% 
  mutate(
    timeDiff = difftime(lag(Start_Date), End_Date, units = "days")
  )

myData
#> # A tibble: 6 x 5
#> # Groups:   ID [2]
#>      ID Names Start_Date End_Date   timeDiff
#>   <int> <chr> <date>     <date>     <drtn>  
#> 1     1 A     2021-04-20 2021-05-20  NA days
#> 2     1 B     2021-04-21 2021-05-22 -32 days
#> 3     1 C     2021-05-22 2021-05-25 -34 days
#> 4     2 A     2021-04-20 2021-05-20  NA days
#> 5     2 B     2021-04-21 2021-05-22 -32 days
#> 6     2 C     2021-05-22 2021-05-25 -34 days

^{Created on 2021-04-21 by the reprex package (v2.0.0)}

The group_by function makes sure we only evaluate the data per ID (I created an ID 2 to show this)
I assume you always want to make the difference between the current and previous row, so this can be done using the lag function which shifts data of certain columns by any amount (default 1). Not the NA values where there is no previous row.
The difftime function actually calculated the different in days

If you like to learn more about the Tidyverse, check this link

Hope this helps,
PJ

system · May 12, 2021, 12:18pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.