Calculate duration while counting overlalling years once only

Hi,

I have a large job-exposure database, and a subject can be exposed to an agent through different jobs. For each job, I have the start year and end year. But my problem is that there are overlapping periods between the jobs, and I want to find the total duration of exposure to the agent with R while counting for the overlapping years only once. I added an example with the subject 2 , who were exposed to agent A with his job 2, job 3 and job 5 and we have the the YEARIN and YEAROUT for each job. I wanted to use the function mutate but I'm not sure how to take into account the overlapping years.

Thank you,

M

See the FAQ: How to do a minimal reproducible example `reprex` for beginners. This is probably doable, but make it easier by posting some usable data?

I will read it thank you so much!

here is a usable data :

ID JOB AGENT YEARIN YEAROUT
2 1 A 1998 2009
2 2 A 1996 2000
2 7 B 1979 1979
2 8 B 1978 1982
7 1 B 1973 2007
7 1 A 1979 2007
15 1 A 1975 2009
18 1 D 1980 1981
18 2 D 1980 1985
18 4 D 1982 1987
18 2 A 1978 1979
18 3 A 1976 1982
18 2 B 1976 1979
20 3 C 1988 1993
20 4 C 1996 2002
20 6 C 1985 2000
21 3 A 1989 1992

For these data, each `AGENT` has complete overlap, so I haven't taken this any further.

``````# functions

find_overlaps <- function(x) {
# begin helper functions
# identify agents
id_agents <- function(x) unique(x\$AGENT)

# detect if a pair of rows overlap
is_overlap <- function(x,y) length(intersect(spans[x][[1]],spans[y][[1]])) > 0

# create subsets consisting of a single agent
get_agent <- function(x){
agent = dat[dat\$AGENT == x,]
agent = agent[order(agent\$YEARIN),]
as.matrix(agent[4:5])
}

# list of all the years included in each row
get_spans <- function(x) apply(x,1,make_span)

# for each row create a list of all years from YEARIN to YEAROUT
make_span <- function(x) x[1]:x[2]

# all combination of rows taken two at a time
find_tests <- function(x) combinat::combn(1:length(x),2)
# end helper functions

spans = get_spans(get_agent(x))
the_tests = find_tests(spans)
apply(the_tests,2,is_overlap)
}

# data

dat <- data.frame(
ID =
c(2, 2, 2, 2, 7, 7, 15, 18, 18, 18, 18, 18, 18, 20, 20, 20, 21),
JOB =
c(1, 2, 7, 8, 1, 1, 1, 1, 2, 4, 2, 3, 2, 3, 4, 6, 3),
AGENT =
c("A", "A", "B", "B", "B", "A", "A", "D", "D", "D", "A", "A", "B", "C", "C", "C", "A"),
YEARIN =
c(1998, 1996, 1979, 1978, 1973, 1979, 1975, 1980, 1980, 1982, 1978, 1976, 1976, 1988, 1996, 1985, 1989),
YEAROUT =
c(2009, 2000, 1979, 1982, 2007, 2007, 2009, 1981, 1985, 1987, 1979, 1982, 1979, 1993, 2002, 2000, 1992))

# main

lapply(unique(dat\$AGENT),find_overlaps)
#> [[1]]
#>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [16] TRUE TRUE TRUE TRUE TRUE TRUE
#>
#> [[2]]
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE
#>
#> [[3]]
#> [1] TRUE TRUE TRUE
#>
#> [[4]]
#> [1] TRUE TRUE TRUE
``````

Interpretation: The return value is a list of `AGENTS`, each of which has more than one record (row). For each agent, the Booleans return the result of comparison of each combination of each `AGENT` with each other agent. `TRUE` indicates overlap. `FALSE`, if there were any indicates the absence. Missing is a function to return the earliest and latest of the overlapping years.

Hi ,

Thank you so much for your help! It will help to find the overlaps, now I will check how to calculate the duration while counting for the overlapping years once only.

Have a good day,
M

To do this, simply find the number of distinct years.

Perfect, thank you so much for your help!!

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.