Need help deciphering code inherited from someone else

Greetings! I have been assigned to carry on work that a recently resigned colleague had been doing. Her knowledge of R was fairly advanced, while mine remains rather limited. I'm struggling to fully understand what some of her code does, exactly. I'm trying to look up functions and packages as I go through her code but that is taking a long time and there is some time pressure here. The whole thing is a bit frustrating. Anywho, I'm hoping folks here can maybe help me understand what various chunks of code are doing and would appreciate any and all help!

First, the packages in use:

library(stringr)
library(reshape2)
library(lubridate)
library(zoo)
library(odbc)
library(DBI)
library(sqldf)
library(dplyr)
library(dbplyr)

Next, in the following bit my understanding is that a date range is set and two data objects (num & denom) are created from visits.YTD (which is created with a SQL query earlier), then combined in a data frame "score.2022.all", but there's a lot of nuances in there that I don't understand. Apologies if this is too much to ask or unclear.

attr.CMS.2022 <- dbGetQuery(con, "SELECT * FROM [PWA].[REDACTED].ACOMSSPassignedbeneficiaries2022PatID") 

  
# ---Num all snapshots---
begin.date <- base::as.Date('2022-01-01')
end.date   <- base::as.Date('2022-12-31') # can be date of snapshot; next update should be for CY 2023 
theDate <- begin.date
num <-NA
date.chain <-NA
excluded.num <- c()

for(i in 1:(end.date-begin.date+1)){
  zzz<-visits.YTD %>% 
       filter(CONTACT_DATE == theDate) %>%
       inner_join(attr.CMS.2022, by=c("PAT_ID" = "EpicPatientID")) %>%
       filter(!PAT_ID %in% excluded.num) %>%
       distinct(PAT_ID) %>%
       tally()

   yyy <-visits.YTD %>% 
       filter(CONTACT_DATE == theDate) %>% 
       inner_join(attr.CMS.2022, by=c("PAT_ID" = "EpicPatientID")) %>%
       select(PAT_ID) %>%
       distinct(PAT_ID)
   
    num[i] <- zzz
    date.chain[i] <- as.character(theDate)
    excluded.num <- unique(unlist(c(excluded.num, yyy), use.names = F))
    theDate <- theDate+1
}

num <-unlist(num, use.names = F)
date <-unlist(date.chain, use.names = F)



# ---Denom all snapshots---
begin.date <- base::as.Date("2022-01-01")
end.date   <- base::as.Date("2022-12-31") # next update should be for CY 2023
theDate <- begin.date
denom <-NA
date.chain <-NA
excluded.den <- c()

for(i in 1:(end.date-begin.date+1)){
  zzz<-visits.YTD %>% 
       filter(CONTACT_DATE == theDate) %>%
       filter(!PAT_ID %in% excluded.den) %>%
       distinct(PAT_ID) %>%
       tally()

     yyy <-visits.YTD %>% 
       filter(CONTACT_DATE == theDate) %>% 
       select(PAT_ID) %>%
       distinct(PAT_ID)
     
    denom[i] <- zzz
    excluded.den <- unique(unlist(c(excluded.den, yyy), use.names = F))
    date.chain[i] <- as.character(theDate)
    theDate <- theDate+1
}

denom <-unlist(denom, use.names = F)
date <-unlist(date.chain, use.names = F)


score.2022.all <- data.frame(num, date)
score.2022.all$denom <- denom

#---put the numerator and denom together

score.2022.all <- score.2022.all %>%
                  arrange(date) %>%
                  mutate(num     = ifelse(as.Date(date, "%Y-%m-%d") < Sys.Date(), num,           NA)) %>%
                  mutate(denom   = ifelse(as.Date(date, "%Y-%m-%d") < Sys.Date(), denom,         NA)) %>%
                  mutate(numcu   = ifelse(as.Date(date, "%Y-%m-%d") < Sys.Date(), cumsum(num),   NA)) %>%
                  mutate(denomcu = ifelse(as.Date(date, "%Y-%m-%d") < Sys.Date(), cumsum(denom), NA)) %>%
                  mutate(score   = ifelse(as.Date(date, "%Y-%m-%d") < Sys.Date(), numcu/denomcu, NA)) %>%
                  mutate(dailyscore = num/denom)
    
score.2022.all$rolling7 <- zoo::rollmean(score.2022.all$dailyscore, 7, align='right', fill=NA)
score.2022.all[score.2022.all=="NaN"] <- NA


dbWriteTable(con, "ACOProviders", score.2022.all)
dbGetQuery(con, "DROP TABLE [PWA].[REDACTED].ACOQPAttrEligibleVisits2022ScoreAll")
dbGetQuery(con, "SELECT * INTO [PWA].[REDACTED].ACOQPAttrEligibleVisits2022ScoreAll FROM ACOProviders")
dbGetQuery(con, "DROP TABLE ACOProviders")

Here is a commented version of your code. Since I don't have your data, I can't be sure I interpreted everything correctly. Does this help? The point seems to be the ratio of new patients that are in attr.CMS.2022 and not in attr.CMS.2022

attr.CMS.2022 <- dbGetQuery(con, "SELECT * FROM [PWA].[REDACTED].ACOMSSPassignedbeneficiaries2022PatID") 


# ---Num all snapshots---
begin.date <- base::as.Date('2022-01-01')
end.date   <- base::as.Date('2022-12-31') # can be date of snapshot; next update should be for CY 2023 
theDate <- begin.date
num <-NA
date.chain <-NA
excluded.num <- c()

for(i in 1:(end.date-begin.date+1)){ #step through each date of 2022
  zzz<-visits.YTD %>% 
    filter(CONTACT_DATE == theDate) %>% #keep data from the chosen date
    inner_join(attr.CMS.2022, by=c("PAT_ID" = "EpicPatientID")) %>% #keep only those PAT_ID that match EpciPatientID in attr.CMS.2022
    filter(!PAT_ID %in% excluded.num) %>% #keep only those PAT_ID that are not in excluded.num, i.e. have not been found earlier in the year
    distinct(PAT_ID) %>% #what are the unique PAT_IDs
    tally() #counts how many unique PAT_IDs there are
  
  yyy <-visits.YTD %>% 
    filter(CONTACT_DATE == theDate) %>% #keep data from the chosen date
    inner_join(attr.CMS.2022, by=c("PAT_ID" = "EpicPatientID")) %>% #keep only those PAT_ID that match EpciPatientID in attr.CMS.2022
    select(PAT_ID) %>% #look at only the PAT_ID column. I think this is not necessary
    distinct(PAT_ID) #what are the unique PAT_IDs
  
  num[i] <- zzz #store the count of unique PAT_IDs that where not filtered out
  date.chain[i] <- as.character(theDate) #store the chosen date
  excluded.num <- unique(unlist(c(excluded.num, yyy), use.names = F)) #store the PAT_IDs that have been used
  theDate <- theDate+1 #increment the chosen date
}

num <-unlist(num, use.names = F) #make vector of the PAT_IDs
date <-unlist(date.chain, use.names = F) #make vector of the dates



# ---Denom all snapshots---
begin.date <- base::as.Date("2022-01-01")
end.date   <- base::as.Date("2022-12-31") # next update should be for CY 2023
theDate <- begin.date
denom <-NA
date.chain <-NA
excluded.den <- c()

for(i in 1:(end.date-begin.date+1)){
  #The calculations are like above but being in attr.CMS.2022 is not necessary
  zzz<-visits.YTD %>% 
    filter(CONTACT_DATE == theDate) %>% #keep data from the chosen date
    filter(!PAT_ID %in% excluded.den) %>% #keep that have not been used earlier. List is filled below
    distinct(PAT_ID) %>% #What are the unique PAT_IDs
    tally() #counts how many unique PAT_IDs there are
  
  yyy <-visits.YTD %>% 
    filter(CONTACT_DATE == theDate) %>% #keep data from the chosen date
    select(PAT_ID) %>% #consider only the PAT_ID column
    distinct(PAT_ID) #What are the unique values of PAT_ID
  
  denom[i] <- zzz #store the count of PAT_IDs that passed the filtering
  excluded.den <- unique(unlist(c(excluded.den, yyy), use.names = F)) #store the PAT_IDs found so far
  date.chain[i] <- as.character(theDate) #store the chosen date
  theDate <- theDate+1 #increment the date
}

denom <-unlist(denom, use.names = F) #make vector of the PAT_IDs
date <-unlist(date.chain, use.names = F) #make a vector of the dates

#make a data frame of the counts of PAT_IDS from the first and second loop
score.2022.all <- data.frame(num, date)
score.2022.all$denom <- denom

#---put the numerator and denom together

score.2022.all <- score.2022.all %>%
  arrange(date) %>% #keep data only earlier than the Sys.Date()
  mutate(num     = ifelse(as.Date(date, "%Y-%m-%d") < Sys.Date(), num,           NA)) %>%
  mutate(denom   = ifelse(as.Date(date, "%Y-%m-%d") < Sys.Date(), denom,         NA)) %>%
  mutate(numcu   = ifelse(as.Date(date, "%Y-%m-%d") < Sys.Date(), cumsum(num),   NA)) %>% #calculate cumulative sum of num 
  mutate(denomcu = ifelse(as.Date(date, "%Y-%m-%d") < Sys.Date(), cumsum(denom), NA)) %>% #calculate cumulative sum of denom
  mutate(score   = ifelse(as.Date(date, "%Y-%m-%d") < Sys.Date(), numcu/denomcu, NA)) %>%
  mutate(dailyscore = num/denom)

score.2022.all$rolling7 <- zoo::rollmean(score.2022.all$dailyscore, 7, align='right', fill=NA)
score.2022.all[score.2022.all=="NaN"] <- NA

#Store the data in database table
dbWriteTable(con, "ACOProviders", score.2022.all)
dbGetQuery(con, "DROP TABLE [PWA].[REDACTED].ACOQPAttrEligibleVisits2022ScoreAll")
dbGetQuery(con, "SELECT * INTO [PWA].[REDACTED].ACOQPAttrEligibleVisits2022ScoreAll FROM ACOProviders")
dbGetQuery(con, "DROP TABLE ACOProviders")

1 Like

I have a broader take from @FJCC

A useful way to approach R is if it were nothing more than school algebra—f(x) = y where

x is what is to hand, y is what is desired and f transforms the one into the other. Each of these is an object (in R everything is an object) and objects can be composite. Take the simple case of wanting to know if an object is numeric.

x <- 1
y <- is.numeric(x)
y
#> [1] TRUE

Created on 2022-12-23 by the reprex package (v2.0.1)

Or we can illustrate the g(f(x) composite case

x <- 1
y <- is.numeric(as.character(x))
y 
#> [1] FALSE

With that in mind, let's look at your code. x is an sql table [PWA].[REDACTED] and y is another version of the same table as modified between the top and bottom. f in both cases. The first thing to note is that R and sql objects are not interchangeable and need to be translated back and forth. That's what dbGetQuery and dbWriteTable do, with the later doing double duty manipulating the sql tables completed from within sql. So far, we've read an sql table into an R data frame and written an R data frame back into the sql a database as an sql table.

In between is all R and the applicable f is a series of transformations of the beginning data frame into the ending data frame. I suggest you start with looking at attr.CMS.2022 to see what you dredged up. colnames(attr.CMS.2022) will give you the names of variables, str(attr.CMS.2022) will show you what type, and summary(attr.CMS.2022) will display the count of any missing pieces and a numerical summary of the numeric data. head(attr.CMS.2022) will show the layout.

Next come some constants and initialized values of variables to be populated. The initialized variables are how operations of the for loop escape. The first for loop takes some object called visits.YTD through a series of transformations and stuffs the results into num, date.chain and excluded_num. It's not obvious what the connection between visits.YTD and the source object attr.CMS.2022 is. Comments suggest they are separate, and perhaps another sql import is necessary.

The second for loop populates denom in addition.

The following code makes use of the three return values to create score.2022.all for written back to the sql database. You may want to ask your local sql resource whether

dbGetQuery(con, "SELECT * INTO [PWA].[REDACTED].ACOQPAttrEligibleVisits2022ScoreAll FROM ACOProviders")

works because it looks like it was just dropped, but I'm rusty there.

1 Like

One thing I have begun doing that I would commend to you and others, is to avoid attaching packages except for the most common (tidyverse), but rather call the packages explicitly. I really hate looking at someone's code, including my own from months ago, and wondering just what package a function came from.

2 Likes

@FJCC, Very helpful, thank you! I went through the code line by line and understood much of it, but your comments are filling in the blanks and helping me understand what's going on in much more detail and depth.

@technocrat, great stuff here! Helps me understand the overall approach and R in general.

@Ajackson, good idea! I was struggling with that my self when trying to understand what the script does.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.