How to remove overlaps using R

I have some data on unique visits for each site. These overlap, since people visit multiple websites: a user who visited site1 could also have visited site3 and site4, so the true unique visits for site1 might be 20M rather than 23M. To remove the overlaps, we obtained the percentage overlap between each pair of websites. Based on this, how can I calculate the actual unique visits for each website?

tibble::tribble(
      ~X1, ~unique_visits, ~site1, ~site2, ~site3, ~site4, ~site5, ~site6, ~site7,
  "site1",       23873274,    100,   96.1,   95.6,     95,   91.6,   96.6,   92.9,
  "site2",        4249486,   54.1,    100,   46.2,   46.5,   56.9,   77.5,   43.2,
  "site3",         887786,   47.2,   40.5,    100,   41.5,   38.5,   55.3,     85,
  "site4",        3727497,   41.7,   36.3,   36.9,    100,   38.6,   56.3,   51.5,
  "site5",        1833995,   23.7,   26.2,   20.2,   22.8,    100,     28,   87.5,
  "site6",        1617476,    5.1,    7.3,    5.9,    6.8,    5.7,    100,    2.9,
  "site7",         760829,    0.2,    0.2,    0.4,    0.3,    0.8,    0.1,    100
  )
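Before any de-duplication, it may help to pin down what the percentages mean. The sketch below is base R (no packages). The helpers `implied_overlap` and `impossible_pairs` are hypothetical, not from any library. It turns the table into implied joint visitor counts under two possible readings and flags pairs whose joint count would exceed the smaller site's total, which cannot happen for genuine overlaps.

```r
# Figures from the question, re-entered in base R so the sketch needs no packages.
sites  <- paste0("site", 1:7)
visits <- setNames(c(23873274, 4249486, 887786, 3727497, 1833995, 1617476, 760829), sites)
pct <- matrix(c(
  100,  96.1, 95.6, 95,   91.6, 96.6, 92.9,
  54.1, 100,  46.2, 46.5, 56.9, 77.5, 43.2,
  47.2, 40.5, 100,  41.5, 38.5, 55.3, 85,
  41.7, 36.3, 36.9, 100,  38.6, 56.3, 51.5,
  23.7, 26.2, 20.2, 22.8, 100,  28,   87.5,
  5.1,  7.3,  5.9,  6.8,  5.7,  100,  2.9,
  0.2,  0.2,  0.4,  0.3,  0.8,  0.1,  100
), nrow = 7, byrow = TRUE, dimnames = list(sites, sites))

# Implied number of people on both site i and site j, under an assumed reading:
#   "row": pct[i, j] is the % of site i's visitors who also visited site j
#   "col": pct[i, j] is the % of site j's visitors who also visited site i
implied_overlap <- function(pct, visits, by = c("row", "col")) {
  by <- match.arg(by)
  margin <- if (by == "row") 1 else 2
  sweep(pct, margin, visits, `*`) / 100
}

# A joint count can never exceed the smaller of the two sites' totals.
impossible_pairs <- function(overlap, visits) {
  cap <- outer(visits, visits, pmin)
  which(overlap > cap, arr.ind = TRUE)
}

for (by in c("row", "col")) {
  bad <- impossible_pairs(implied_overlap(pct, visits, by), visits)
  cat(sprintf("%s reading: %d impossible pairs\n", by, nrow(bad)))
}
```

With the numbers above, both readings produce such pairs. Read by row, 96.1% of site1's 23,873,274 visitors is about 22.9M people, far more than site2's 4,249,486. Read by column, 54.1% of site1's visitors is about 12.9M, again more than site2 has. So the table may be measuring something other than plain pairwise overlap, which is worth confirming with whoever produced it.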

What do you mean by unique visits? From the data it looks like there were 23873274 people who visited site1, and 96.1% of them also visited site2, 95.6% of them also visited site3, etc. Do you want to know how many ONLY visited site1? Hm, tricky.
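Even with trustworthy pairwise counts, "how many ONLY visited site1" is not fully determined, because it depends on how site1's overlaps with site2, site3, ... overlap each other. Pairwise data does give a range. The number of site i visitors seen on at least one other site is at least the largest single overlap, and at most the sum of the overlaps (capped at site i's total). The following is a minimal sketch, assuming a symmetric count matrix `overlap` with site totals on the diagonal. `only_bounds` is a hypothetical helper, and the three toy sites are made-up data, used only so the bounds can be checked against the exact answer.

```r
# Bounds on "visited only site i" when only pairwise overlaps are known.
# overlap[i, i] = visitors of site i; overlap[i, j] = visitors of both i and j.
only_bounds <- function(overlap) {
  n <- diag(overlap)
  res <- t(sapply(seq_along(n), function(i) {
    others <- overlap[i, -i]
    # |site i AND (any other site)| lies between max(others) and min(n_i, sum(others))
    c(lower = n[[i]] - min(n[[i]], sum(others)),
      upper = n[[i]] - max(others))
  }))
  rownames(res) <- rownames(overlap)
  res
}

# Hypothetical toy data: 10 visitors, 3 sites, membership known exactly.
ids <- 1:10
A <- 1:6; B <- 4:7; C <- c(6, 8, 9)
M <- cbind(A = ids %in% A, B = ids %in% B, C = ids %in% C) * 1  # 0/1 visitor x site
overlap <- crossprod(M)                                  # pairwise co-visit counts
exact_only <- colSums(M == 1 & rowSums(M) == 1)          # visitors on exactly one site
b <- only_bounds(overlap)
print(cbind(b, exact = exact_only))
```

In this toy case the exact "only" counts (3, 1, 2) fall inside the computed ranges. With seven real sites the ranges could be wide, and only visitor-level data would narrow them down.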

You might ask on https://math.stackexchange.com/

I think it's related to this problem
http://www.gmatfree.com/module-999/venn-diagrams-and-the-overlapping-set-equation/
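The Venn-diagram formula in that kind of problem is the three-set case of inclusion-exclusion: Total = A + B + C - (A∩B + A∩C + B∩C) + A∩B∩C. For seven sites the full formula has 2^7 - 1 = 127 terms. Pairwise percentages supply only the first two layers, so they bound the total rather than fix it: (sum of site totals) - (sum of pairwise overlaps) <= Total <= (sum of site totals). The sketch below assumes visitor-level membership (a 0/1 visitor x site matrix), which the question's summary table does not contain. The helpers `incl_excl_union` and `pair_lower_bound` are hypothetical, and the toy sets are made-up data.

```r
# Full inclusion-exclusion: sum over nonempty subsets S of sites of
# (-1)^(|S| + 1) * (number of visitors present on every site in S).
incl_excl_union <- function(M) {
  k <- ncol(M)
  total <- 0
  for (size in seq_len(k)) {
    for (S in combn(k, size, simplify = FALSE)) {
      in_all <- sum(rowSums(M[, S, drop = FALSE]) == size)
      total <- total + (-1)^(size + 1) * in_all
    }
  }
  total
}

# Lower bound using only totals and pairwise overlaps (first two terms).
pair_lower_bound <- function(overlap) {
  sum(diag(overlap)) - sum(overlap[upper.tri(overlap)])
}

# Hypothetical toy data: 10 visitors, 3 sites.
ids <- 1:10
A <- 1:6; B <- 4:7; C <- c(6, 8, 9)
M <- cbind(A = ids %in% A, B = ids %in% B, C = ids %in% C) * 1

direct <- sum(rowSums(M) > 0)   # visitors seen on at least one site
cat("direct:", direct,
    " inclusion-exclusion:", incl_excl_union(M),
    " pairwise lower bound:", pair_lower_bound(crossprod(M)), "\n")
```

Here the direct count and full inclusion-exclusion agree (9), while the pairwise-only figure (8) is a lower bound that misses the visitor on all three sites.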
