Follow-up on a closed post: translating non-idiomatic R to the tidyverse

A recent post (Does anyone mind looking at my code? I am not sure if it does what I want it to do) involved a question about understanding some R code obviously written by a procedural programmer. It was full of for loops and slicing and almost impossible to follow.

The OP is a medical researcher who lacked confidence in the results because the code was hard to follow; see the linked post above. Besides rewriting the code in idiomatic R, a second problem was organizing the scores of csv files involved, each of which had multiple records for the same patient. For example, a patient with multiple diagnoses had a separate row for each. After much patient discussion on the OP's part, I was able to see that, for the information sought, no contortions to collapse rows were needed.
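As a rough illustration of why collapsing is unnecessary, here is a minimal sketch on made-up data (the three patients and their codes are hypothetical, not taken from the OP's files). Flagging a patient is a membership test on the key, so repeated keys in the codes table add no rows:

```r
library(dplyr)

# Hypothetical toy data: one row per patient, one row per procedure code
patients <- tibble(INC_KEY = c(1, 2, 3))
p_codes  <- tibble(INC_KEY = c(1, 1, 3), PCODE = c(57.6, 57.81, 45.1))

# Keys of patients with at least one code in the range of interest
hits <- p_codes %>%
  filter(PCODE >= 57.6 & PCODE <= 57.89) %>%
  pull(INC_KEY)

# Membership test: patient 1 appears twice in hits but stays a single row
flagged <- patients %>% mutate(PELVIC = INC_KEY %in% hits)
```

The result still has one row per patient, with the flag TRUE only where at least one matching code exists.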

To illustrate the contrast, my idiomatic script is included below:

# import libraries
library(tidyverse)
setwd("/Users/rc/projects/working directory/RDS AY 2015")

# set print display options
options(pillar.sigfig = 10) # control display of tibble

# Get patient demographics
patients <- as_tibble(read.csv("RDS_DEMO.csv", stringsAsFactors = FALSE))
patients <- patients %>%
  select(INC_KEY, AGE, GENDER) %>%
  mutate(ADULT = AGE > 17) # logical flag: TRUE for ages 18 and over
 
# eliminate duplicates
patients <- distinct(patients)

# Get discharge status
discharges <- as_tibble(read.csv("RDS_DISCHARGE.csv", stringsAsFactors = FALSE))
discharges <- discharges %>%
  select(INC_KEY, HOSPDISP) %>%
  filter(HOSPDISP != "Not Applicable BIU 1" & HOSPDISP != "Not Known/Not Recorded BIU 2") %>%
  mutate(Expired = HOSPDISP == "Deceased/Expired") %>%
  select(-HOSPDISP)

# Get intervention codes
p_codes <- as_tibble(read.csv("RDS_PCODE.csv", stringsAsFactors = FALSE))
p_codes <- p_codes %>% select(INC_KEY, PCODE)
# Keep procedure codes 57.6 through 57.89, plus 57.93, and flag them as TRUE
pelvic <- p_codes %>%
  filter((PCODE >= 57.6 & PCODE <= 57.89) | PCODE == 57.93) %>%
  mutate(PELVIC = TRUE) %>%
  select(INC_KEY, PELVIC)

# Diagnostic codes
d_codes <- as_tibble(read.csv("RDS_DCODE.csv", stringsAsFactors = FALSE))
d_codes <- d_codes %>% select(INC_KEY, DCODE)
# Keep diagnosis codes from 867.0 up to (but not including) 867.2
Rupture <- d_codes %>% filter(DCODE > 866.99 & DCODE < 867.2)

# Join patients to discharges; the pelvic flag, diagnosis codes and year are added in the steps that follow
patients <- inner_join(patients, discharges, by = "INC_KEY")

# Treatment codes: identify patients with no pelvic procedure
pelvic_false <- setdiff(patients$INC_KEY, pelvic$INC_KEY)

# Add field for PELVIC t/f
patients <- patients %>% mutate(PELVIC = !(INC_KEY %in% pelvic_false))

# Add fields for 867 series
patients <- inner_join(patients, Rupture, by = "INC_KEY")

# Add year
patients <- patients %>% mutate(YEAR = 2015)

# Rename columns
colnames(patients) <- c("INC_KEY", "Age", "Sex", "Adult", "Expired", "Intervention", "Rupture", "Year")

# Ensure Expired is logical
patients <- patients %>% mutate(Expired = as.logical(Expired))

# Censor anomalies: drop records with negative or missing ages
patients <- patients %>% filter(Age >= 0)

# SAVE
write.csv(patients, "../combined/patients_2015.csv")
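
One behavior of the script worth checking against the intended analysis: an inner join keeps only keys present in both tables and repeats a patient once per matching diagnosis row. The sketch below uses made-up data (hypothetical keys and codes) to show this and to count rows per patient after the join:

```r
library(dplyr)

# Hypothetical toy data: patient 1 has two 867-series rows, patient 2 has none
patients <- tibble(INC_KEY = c(1, 2, 3), AGE = c(30, 45, 12))
Rupture  <- tibble(INC_KEY = c(1, 1, 3), DCODE = c(867.0, 867.1, 867.0))

# Same kind of join as in the script above
joined <- inner_join(patients, Rupture, by = "INC_KEY")

# Rows per patient after the join
per_patient <- joined %>% count(INC_KEY, name = "n_rows")
```

Here patient 2 drops out and patient 1 appears twice. Whether that is what the analysis wants is for the researcher to decide; running a count like this on the real data is a quick way to see how many patients are affected.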

2 posts were merged into an existing topic: Does anyone mind looking at my code? I am not sure if it does what I want it to do