Find closest non-overlapping ranges from start to end

I would like to find the closest ranges that do not overlap from the first start to the last end position.

#Example data
df <- data.frame(
  start = c(7,8,14,34,67,92,125,155,170,200),
  end = c(13,33,25,66,91,124,155,161,181,214)
)

   start end
1      7  13
2      8  33
3     14  25
4     34  66
5     67  91
6     92 124
7    125 155
8    155 161
9    170 181
10   200 214

#Overlapping rows
  start end
1     8  33
2   155 161

#Desired output where overlapping rows are filtered away
  start end
1     7  13
2    14  25
3    34  66
4    67  91
5    92 124
6   125 155
7   170 181
8   200 214

Just to be clear, in the original data row 2 overlaps with row 3 as well.

Are you assuming any particular sorting of the data?

What are your criteria for "overlapping" and for choosing which rows to keep? For instance, if your data were,

df <- data.frame(start = c(7,8,10,12,14),
                 end = c(16,9,11,13,15))
df
#>   start end
#> 1     7  16
#> 2     8   9
#> 3    10  11
#> 4    12  13
#> 5    14  15

Would you keep the first row or all the rest?

What about,

df <- data.frame(start = c(7,8,100),
                 end = c(8,100,101))
df
#>   start end
#> 1     7   8
#> 2     8 100
#> 3   100 101

Created on 2020-09-06 by the reprex package (v0.3.0)

I am trying to find the closest continuous ranges without overlap, starting from the lowest start value. So in your first example I would keep the first row and drop the rest, because they all overlap with the first range. In your second example only the first and last rows are kept.
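To make the rule concrete, here is a minimal sketch of the overlap test that the examples in this thread seem to imply: ranges are treated as closed, so two ranges that only share an endpoint (such as 7-8 and 8-100) still count as overlapping. The helper name `ranges_overlap` is hypothetical and is not part of any package.

```r
# Hypothetical helper: TRUE when two closed ranges share at least one value.
# Assumes start <= end for both ranges; vectorised over its arguments.
ranges_overlap <- function(start_a, end_a, start_b, end_b) {
  start_a <= end_b & start_b <= end_a
}

ranges_overlap(7, 8, 8, 100)    # TRUE: the ranges share the value 8
ranges_overlap(7, 8, 100, 101)  # FALSE: 100 lies after the end of 7-8
```

Each candidate row is then compared only against the last row that was kept, which is why 100-101 survives in the second example even though it touches the dropped row 8-100.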

Anyone have an idea?

You're almost certainly going to need to do it in a loop.

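find_nonover <- function(df) {
  # Assumes the rows are already sorted by `start`.
  to_drop <- logical(nrow(df))
  for (i in seq_len(nrow(df))) {
    # Skip rows already marked as overlapping an earlier kept row.
    if (to_drop[i]) next
    # Mark every later row that starts at or before this row's end.
    later_starts <- df[["start"]][-seq_len(i)]
    to_drop <- to_drop | c(logical(i), df[i, "end"] >= later_starts)
  }
  list(nonover = df[!to_drop, ],
       over    = df[to_drop, ])
}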
df <- data.frame(
  start = c(7,8,14,34,67,92,125,155,170,200),
  end = c(13,33,25,66,91,124,155,161,181,214)
)
df
#>    start end
#> 1      7  13
#> 2      8  33
#> 3     14  25
#> 4     34  66
#> 5     67  91
#> 6     92 124
#> 7    125 155
#> 8    155 161
#> 9    170 181
#> 10   200 214

find_nonover(df)
#> $nonover
#>    start end
#> 1      7  13
#> 3     14  25
#> 4     34  66
#> 5     67  91
#> 6     92 124
#> 7    125 155
#> 9    170 181
#> 10   200 214
#> 
#> $over
#>   start end
#> 2     8  33
#> 8   155 161
df <- data.frame(start = c(7,8,10,12,14),
                 end = c(16,9,11,13,15))
df
#>   start end
#> 1     7  16
#> 2     8   9
#> 3    10  11
#> 4    12  13
#> 5    14  15
find_nonover(df)
#> $nonover
#>   start end
#> 1     7  16
#> 
#> $over
#>   start end
#> 2     8   9
#> 3    10  11
#> 4    12  13
#> 5    14  15
df <- data.frame(start = c(7,8,100),
                 end = c(8,100,101))
df
#>   start end
#> 1     7   8
#> 2     8 100
#> 3   100 101
find_nonover(df)
#> $nonover
#>   start end
#> 1     7   8
#> 3   100 101
#> 
#> $over
#>   start end
#> 2     8 100

Created on 2020-09-07 by the reprex package (v0.3.0)
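One caveat: the loop in `find_nonover` compares each kept row only with the rows that come after it, so it relies on the data being sorted by `start`. Below is a minimal sketch of a variant that sorts first and then tracks the end of the last kept range in a single pass. The names `find_nonover_sorted` and `last_end` are made up for illustration, and the sketch assumes `start <= end` in every row and that touching endpoints count as overlap, as in the examples above.

```r
# Hypothetical single-pass variant; not the code posted in the answer.
find_nonover_sorted <- function(df) {
  # Sort by start (ties broken by end) so input order does not matter.
  df <- df[order(df[["start"]], df[["end"]]), , drop = FALSE]
  keep <- logical(nrow(df))
  last_end <- -Inf  # end of the most recently kept range
  for (i in seq_len(nrow(df))) {
    # Keep a row only if it starts strictly after the last kept range ends.
    if (df[["start"]][i] > last_end) {
      keep[i] <- TRUE
      last_end <- df[["end"]][i]
    }
  }
  list(nonover = df[keep, , drop = FALSE],
       over    = df[!keep, , drop = FALSE])
}

# Same ranges as the second example, deliberately out of order.
shuffled <- data.frame(start = c(100, 7, 8),
                       end   = c(101, 8, 100))
find_nonover_sorted(shuffled)
```

Because every kept range starts after the previous kept range ends, the kept ends only increase, so comparing against `last_end` alone should give the same result as checking against all earlier kept rows.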
