Calculate duration while counting overlalling years once only

R_help · November 15, 2022, 10:05pm

Hi,

I have a large job-exposure database, and a subject can be exposed to an agent through different jobs. For each job, I have the start year and end year. But my problem is that there are overlapping periods between the jobs, and I want to find the total duration of exposure to the agent with R while counting for the overlapping years only once. I added an example with the subject 2 , who were exposed to agent A with his job 2, job 3 and job 5 and we have the the YEARIN and YEAROUT for each job. I wanted to use the function mutate but I'm not sure how to take into account the overlapping years.

Thank you,

M

technocrat · November 15, 2022, 10:50pm

See the FAQ: How to do a minimal reproducible example reprex for beginners. This is probably doable, but make it easier by posting some usable data?

R_help · November 15, 2022, 11:17pm

I will read it thank you so much!

here is a usable data :

ID	JOB	AGENT	YEARIN	YEAROUT
2	1	A	1998	2009
2	2	A	1996	2000
2	7	B	1979	1979
2	8	B	1978	1982
7	1	B	1973	2007
7	1	A	1979	2007
15	1	A	1975	2009
18	1	D	1980	1981
18	2	D	1980	1985
18	4	D	1982	1987
18	2	A	1978	1979
18	3	A	1976	1982
18	2	B	1976	1979
20	3	C	1988	1993
20	4	C	1996	2002
20	6	C	1985	2000
21	3	A	1989	1992

technocrat · November 16, 2022, 9:27pm

For these data, each AGENT has complete overlap, so I haven't taken this any further.

# functions

find_overlaps <- function(x) {
# begin helper functions
# identify agents
  id_agents <- function(x) unique(x$AGENT)
  
  # detect if a pair of rows overlap
  is_overlap <- function(x,y) length(intersect(spans[x][[1]],spans[y][[1]])) > 0
  
  # create subsets consisting of a single agent
  get_agent <- function(x){
    agent = dat[dat$AGENT == x,]
    agent = agent[order(agent$YEARIN),]
    as.matrix(agent[4:5])
  }
  
  # list of all the years included in each row
  get_spans <- function(x) apply(x,1,make_span)
  
  # for each row create a list of all years from YEARIN to YEAROUT
  make_span <- function(x) x[1]:x[2]
  
  # all combination of rows taken two at a time
  find_tests <- function(x) combinat::combn(1:length(x),2)
# end helper functions  
  
  spans = get_spans(get_agent(x))
  the_tests = find_tests(spans)
  apply(the_tests,2,is_overlap)
}

# data

dat <- data.frame(
  ID =
    c(2, 2, 2, 2, 7, 7, 15, 18, 18, 18, 18, 18, 18, 20, 20, 20, 21),
  JOB =
    c(1, 2, 7, 8, 1, 1, 1, 1, 2, 4, 2, 3, 2, 3, 4, 6, 3),
  AGENT =
    c("A", "A", "B", "B", "B", "A", "A", "D", "D", "D", "A", "A", "B", "C", "C", "C", "A"),
  YEARIN =
    c(1998, 1996, 1979, 1978, 1973, 1979, 1975, 1980, 1980, 1982, 1978, 1976, 1976, 1988, 1996, 1985, 1989),
  YEAROUT =
    c(2009, 2000, 1979, 1982, 2007, 2007, 2009, 1981, 1985, 1987, 1979, 1982, 1979, 1993, 2002, 2000, 1992))

# main

lapply(unique(dat$AGENT),find_overlaps)
#> [[1]]
#>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [16] TRUE TRUE TRUE TRUE TRUE TRUE
#> 
#> [[2]]
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE
#> 
#> [[3]]
#> [1] TRUE TRUE TRUE
#> 
#> [[4]]
#> [1] TRUE TRUE TRUE

Interpretation: The return value is a list of AGENTS, each of which has more than one record (row). For each agent, the Booleans return the result of comparison of each combination of each AGENT with each other agent. TRUE indicates overlap. FALSE, if there were any indicates the absence. Missing is a function to return the earliest and latest of the overlapping years.

R_help · November 17, 2022, 2:47pm

Hi ,

Thank you so much for your help! It will help to find the overlaps, now I will check how to calculate the duration while counting for the overlapping years once only.

Have a good day,
M

technocrat · November 17, 2022, 9:39pm

To do this, simply find the number of distinct years.

R_help · November 17, 2022, 9:55pm

Perfect, thank you so much for your help!!

system · December 8, 2022, 9:56pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.