Calculate duration while counting overlalling years once only

Hi,

I have a large job-exposure database, and a subject can be exposed to an agent through different jobs. For each job, I have the start year and end year. But my problem is that there are overlapping periods between the jobs, and I want to find the total duration of exposure to the agent with R while counting for the overlapping years only once. I added an example with the subject 2 , who were exposed to agent A with his job 2, job 3 and job 5 and we have the the YEARIN and YEAROUT for each job. I wanted to use the function mutate but I'm not sure how to take into account the overlapping years.

Thank you,

M

See the FAQ: How to do a minimal reproducible example reprex for beginners. This is probably doable, but make it easier by posting some usable data?

I will read it thank you so much!

here is a usable data :

ID JOB AGENT YEARIN YEAROUT
2 1 A 1998 2009
2 2 A 1996 2000
2 7 B 1979 1979
2 8 B 1978 1982
7 1 B 1973 2007
7 1 A 1979 2007
15 1 A 1975 2009
18 1 D 1980 1981
18 2 D 1980 1985
18 4 D 1982 1987
18 2 A 1978 1979
18 3 A 1976 1982
18 2 B 1976 1979
20 3 C 1988 1993
20 4 C 1996 2002
20 6 C 1985 2000
21 3 A 1989 1992

For these data, each AGENT has complete overlap, so I haven't taken this any further.

# functions

find_overlaps <- function(x) {
# begin helper functions
# identify agents
  id_agents <- function(x) unique(x$AGENT)
  
  # detect if a pair of rows overlap
  is_overlap <- function(x,y) length(intersect(spans[x][[1]],spans[y][[1]])) > 0
  
  # create subsets consisting of a single agent
  get_agent <- function(x){
    agent = dat[dat$AGENT == x,]
    agent = agent[order(agent$YEARIN),]
    as.matrix(agent[4:5])
  }
  
  # list of all the years included in each row
  get_spans <- function(x) apply(x,1,make_span)
  
  # for each row create a list of all years from YEARIN to YEAROUT
  make_span <- function(x) x[1]:x[2]
  
  # all combination of rows taken two at a time
  find_tests <- function(x) combinat::combn(1:length(x),2)
# end helper functions  
  
  spans = get_spans(get_agent(x))
  the_tests = find_tests(spans)
  apply(the_tests,2,is_overlap)
}

# data

dat <- data.frame(
  ID =
    c(2, 2, 2, 2, 7, 7, 15, 18, 18, 18, 18, 18, 18, 20, 20, 20, 21),
  JOB =
    c(1, 2, 7, 8, 1, 1, 1, 1, 2, 4, 2, 3, 2, 3, 4, 6, 3),
  AGENT =
    c("A", "A", "B", "B", "B", "A", "A", "D", "D", "D", "A", "A", "B", "C", "C", "C", "A"),
  YEARIN =
    c(1998, 1996, 1979, 1978, 1973, 1979, 1975, 1980, 1980, 1982, 1978, 1976, 1976, 1988, 1996, 1985, 1989),
  YEAROUT =
    c(2009, 2000, 1979, 1982, 2007, 2007, 2009, 1981, 1985, 1987, 1979, 1982, 1979, 1993, 2002, 2000, 1992))

# main

lapply(unique(dat$AGENT),find_overlaps)
#> [[1]]
#>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [16] TRUE TRUE TRUE TRUE TRUE TRUE
#> 
#> [[2]]
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE
#> 
#> [[3]]
#> [1] TRUE TRUE TRUE
#> 
#> [[4]]
#> [1] TRUE TRUE TRUE

Interpretation: The return value is a list of AGENTS, each of which has more than one record (row). For each agent, the Booleans return the result of comparison of each combination of each AGENT with each other agent. TRUE indicates overlap. FALSE, if there were any indicates the absence. Missing is a function to return the earliest and latest of the overlapping years.

Hi ,

Thank you so much for your help! It will help to find the overlaps, now I will check how to calculate the duration while counting for the overlapping years once only.

Have a good day,
M

To do this, simply find the number of distinct years.

Perfect, thank you so much for your help!!

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.