I have this df where Annual.Visits is just a number
data.source <- data.frame(
stringsAsFactors = FALSE,
RegNo = c("1MAM","1MAM","1MAM","1MAM",
"1MAM","1MAM","1MAM","1MAM","1MAM","1MAM","1MAM",
"1MAM","1MAM","1MAM","181DL2","181DL2","181DL2",
"181DL2","181DL2","181DL2","3GRL","3GRL","3GRL","3GRL",
"3GRL","3GRL","3GRL","3GRL","3GRL","3GRL","3GRL",
"3GRL","3GRL","3GRL","3GRL","3GRL","3GRL","3GRL",
"3GRL","589UOO","589UOO","589UOO","589UOO","589UOO",
"589UOO","589UOO","589UOO","589UOO","589UOO",
"589UOO"),
Year = c(2022,2022,2022,2022,2022,
2022,2021,2021,2021,2020,2020,2019,2019,2019,2019,
2019,2019,2019,2019,2019,2022,2022,2021,2021,
2021,2021,2021,2021,2020,2020,2020,2020,2020,2019,
2019,2019,2019,2018,2018,2022,2022,2022,2020,
2020,2020,2020,2020,2020,2019,2019),
Annual.Visits = c(6,6,6,6,6,6,3,3,3,2,2,
3,3,3,6,6,6,6,6,6,2,2,6,6,6,6,6,6,5,5,
5,5,5,4,4,4,4,2,2,3,3,3,6,6,6,6,6,6,2,
2)
)
data.source
first stage is this
stage1 <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
...1 = c("181DL2", "1MAM", "3GRL", "589UOO"),
`2018` = c(NA, NA, 2, NA),
`2019` = c(6, 3, 4, 2),
`2020` = c(NA, 2, 5, 6),
`2021` = c(NA, 3, 6, NA),
`2022` = c(NA, 6, 2, 3)
)
stage1
as a result I think I need to get this:
result <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
...1 = c("181DL2", "1MAM", "3GRL", "589UOO"),
`2018` = c(NA, 0, 1, 0),
`2019` = c(NA, 1, 1, 1),
`2020` = c(NA, 1, 1, 1),
`2021` = c(NA, 1, 1, 0)
)
result
this is very simple in excel but complicated in R.
As a result I know that 1 of 2018 RegNo were repeated in 2022, 3 of 2018 RegNo were repeated in 2022, 3 of 2020 RegNo were repeated in 2022 and 2 of RegNo were repeated in 2022. Therefore 25% of 2018 RegNo were repeated in 2022, 75% of 2019 RegNo were repeated in 2022, 75% of 2020 RegNo were repeated in 2022 and 50% of 2021 RegNo were repeated in 2022..
Excel formula between stage1 and result is very simple in Excel:
=IF($G5>0,IF(AND(B5>0,$G5>0),1,0),"")