Percent of cases with same values in two columns

Hi,
I want to get the percent of cases with receivedt = collectdt.
I created the following sample df and tried the code below but it did not work.

LAB <- data.frame(name = c("Jon", "Bill", "Maria", "Ben", "Tina", "Sally", "George", "Simon"),
                  collectdt = c(06/01/2011, 06/05/2011, 09/08/2012, 11/05/2020, 06/01/2011, 09/08/2012, 02/09/2015, 10/14/2022),
                  receivedt = c(06/01/2011, 11/05/2020, 01/12/2019, NA, 08/03/2017, 05/07/2011, 12/08/2021, 10/14/2022)
                 )

LAB <- LAB %>%
  mutate(collectdt = if_else(is.na(collectdt), receivedt, as.Date(collectdt)))

Am I using the wrong code or missing something?

I appreciate your help.
Thank you

While it may not matter for this particular task (finding the cases where receivedt = collectdt), this reprex converts those two variables to Date format.

I do not understand what your line of code is supposed to accomplish. It does not affect anything if collectdt is never NA, as in your sample data.
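For reference, here is a minimal base-R sketch of what that `if_else()` line appears to be aiming for: filling a missing `collectdt` with the matching `receivedt`. The vectors `collect` and `received` are hypothetical toy data, assumed to be already converted to `Date`. (dplyr's `coalesce(collect, received)` does the same thing in one call.)

```r
# Hypothetical toy vectors, already converted to Date
collect  <- as.Date(c("2011-06-01", NA, "2022-10-14"))
received <- as.Date(c("2011-06-01", "2020-11-05", "2022-10-14"))

# Positions where the collection date is missing
miss <- is.na(collect)

# Fill only those positions; indexed assignment keeps the Date class
# (base ifelse() would return plain numbers instead)
filled <- collect
filled[miss] <- received[miss]

filled
#> [1] "2011-06-01" "2020-11-05" "2022-10-14"

# With no missing values, the same step changes nothing
no_na <- as.Date(c("2011-06-01", "2022-10-14"))
no_na_filled <- no_na
no_na_filled[is.na(no_na)] <- no_na[is.na(no_na)]
identical(no_na_filled, no_na)
#> [1] TRUE
```

This is why the line has no effect on the sample data: it only ever touches rows where `collectdt` is NA.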

library(tidyverse)

# You need the dates as characters (in quotes). R will evaluate 06/01/2011 as 6÷1÷2011:

06/01/2011
#> [1] 0.00298359

LAB <- data.frame(name = c("Jon", "Bill", "Maria", "Ben", "Tina", "Sally", "George", "Simon"),
                  collectdt = c("06/01/2011", "06/05/2011", "09/08/2012", "11/05/2020", "06/01/2011", "09/08/2012", "02/09/2015", "10/14/2022"),
                  receivedt = c("06/01/2011", "11/05/2020", "01/12/2019", NA, "08/03/2017", "05/07/2011", "12/08/2021", "10/14/2022")
)

# two ways to convert characters to dates:

LAB <- LAB |> 
  mutate(collectdt = as.Date(collectdt, "%m/%d/%Y"),  # base R's as.Date() function; %Y = four-digit year
         receivedt = mdy(receivedt)                   # lubridate's mdy() function
         )

LAB
#>     name  collectdt  receivedt
#> 1    Jon 2011-06-01 2011-06-01
#> 2   Bill 2011-06-05 2020-11-05
#> 3  Maria 2012-09-08 2019-01-12
#> 4    Ben 2020-11-05       <NA>
#> 5   Tina 2011-06-01 2017-08-03
#> 6  Sally 2012-09-08 2011-05-07
#> 7 George 2015-02-09 2021-12-08
#> 8  Simon 2022-10-14 2022-10-14

Created on 2023-07-04 with reprex v2.0.2
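One thing to watch with the base-R conversion: in `as.Date()`, the format code `%y` reads a two-digit year. For a string like "06/01/2011" it consumes only "20" (read as 2020) and silently ignores the trailing "11", so every date lands in 2020. A four-digit year needs `%Y`. lubridate's `mdy()` works out the year width on its own. A small base-R comparison, using two of the sample dates:

```r
dates <- c("06/01/2011", "10/14/2022")

# %y: two-digit year, so "2011" is read as "20" -> 2020 and the rest is ignored
wrong <- as.Date(dates, format = "%m/%d/%y")
wrong
#> [1] "2020-06-01" "2020-10-14"

# %Y: four-digit year, parsed as intended
right <- as.Date(dates, format = "%m/%d/%Y")
right
#> [1] "2011-06-01" "2022-10-14"
```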

library(dplyr)
LAB <- data.frame(name = c("Jon", "Bill", "Maria", "Ben", "Tina", "Sally", "George", "Simon"),
                  collectdt = c("06/01/2011", "06/05/2011", "09/08/2012", "11/05/2020", "06/01/2011", "09/08/2012", "02/09/2015", "10/14/2022"),
                  receivedt = c("06/01/2011", "11/05/2020", "01/12/2019", NA, "08/03/2017", "05/07/2011", "12/08/2021", "10/14/2022")
)
LAB <- LAB |> 
  mutate(collectdt = as.Date(collectdt, "%m/%d/%Y"),   # %Y = four-digit year
         receivedt = as.Date(receivedt, "%m/%d/%Y"))
     
LAB[which(LAB$collectdt == LAB$receivedt),] |> nrow() / nrow(LAB)
#> [1] 0.25

Created on 2023-07-04 with reprex v2.0.2
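The same proportion can also be computed without subsetting rows, by working on the logical vector from the comparison. Below is a minimal base-R sketch using the sample dates. Note that Ben's comparison is NA because his receivedt is missing, and how that row is counted changes the answer: dividing by all 8 rows gives 2/8, while `mean(..., na.rm = TRUE)` drops the NA row and gives 2/7. Which denominator is right depends on how you want to treat missing received dates.

```r
collectdt <- as.Date(c("06/01/2011", "06/05/2011", "09/08/2012", "11/05/2020",
                       "06/01/2011", "09/08/2012", "02/09/2015", "10/14/2022"),
                     format = "%m/%d/%Y")
receivedt <- as.Date(c("06/01/2011", "11/05/2020", "01/12/2019", NA,
                       "08/03/2017", "05/07/2011", "12/08/2021", "10/14/2022"),
                     format = "%m/%d/%Y")

# TRUE/FALSE per case; NA where either date is missing
same <- collectdt == receivedt

# Share of all cases (a missing receivedt counts as "not equal")
sum(same, na.rm = TRUE) / length(same)
#> [1] 0.25

# Share among cases where both dates are present
mean(same, na.rm = TRUE)
#> [1] 0.2857143

# As a percent of all cases, as asked in the question
100 * sum(same, na.rm = TRUE) / length(same)
#> [1] 25
```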
