A recent post (Does anyone mind looking at my code? I am not sure if it does what I want it to do) involved a question about understanding some R code obviously written by a procedural programmer. It was full of for loops and slicing and almost impossible to follow.
The OP is a medical researcher who didn't have confidence in the results because the code wasn't easy to follow. Take a look at the link above. Another problem, besides rewriting the code in idiomatic R, was the organization of the scores of csv files needed, each of which had multiple records for the same patient. For example, if a patient had multiple diagnoses, there would be a separate row for each. After much patient discussion on the OP's part, I was able to see that, for the information sought, no contortions to collapse rows were needed.
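To see why collapsing may be unnecessary, here is a minimal sketch with made-up data (the keys and codes are invented for illustration and are not from the OP's files). A membership test with `%in%` only asks whether a patient's key appears anywhere in the filtered rows, so repeated rows for the same patient do not change the answer.

```r
library(dplyr)

# Hypothetical toy data: one row per patient
patients <- tibble(INC_KEY = c(1, 2, 3))

# Hypothetical procedure table: patient 1 has two rows, patient 3 has none
p_codes <- tibble(
  INC_KEY = c(1, 1, 2),
  PCODE   = c(57.6, 57.93, 12.0)
)

# Keep only the rows whose code falls in the range of interest
pelvic <- p_codes %>%
  filter((PCODE >= 57.6 & PCODE <= 57.89) | PCODE == 57.93)

# Flag each patient by membership; duplicate keys in `pelvic` are harmless
flagged <- patients %>%
  mutate(PELVIC = INC_KEY %in% pelvic$INC_KEY)

print(flagged)
```

The result still has one row per patient, even though patient 1 matched twice in the procedure table.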
To illustrate the contrast, my idiomatic script is included below:
    # import libraries
    library(tidyverse)
    setwd("/Users/rc/projects/working directory/RDS AY 2015")

    # set print display options
    options(pillar.sigfig = 10) # control display of tibble

    # Get patient demographics
    patients <- as_tibble(read.csv("RDS_DEMO.csv", stringsAsFactors = FALSE))
    patients <- patients %>%
      select(INC_KEY, AGE, GENDER) %>%
      mutate(ADULT = ifelse(AGE > 17, "TRUE", "FALSE"))

    # eliminate duplicates
    patients <- distinct(patients)

    # Get discharge status
    discharges <- as_tibble(read.csv("RDS_DISCHARGE.csv", stringsAsFactors = FALSE))
    discharges <- discharges %>%
      select(INC_KEY, HOSPDISP) %>%
      filter(HOSPDISP != "Not Applicable BIU 1" & HOSPDISP != "Not Known/Not Recorded BIU 2") %>%
      mutate(Expired = ifelse(HOSPDISP == "Deceased/Expired", "TRUE", "FALSE")) %>%
      select(-HOSPDISP)

    # Get intervention codes
    p_codes <- as_tibble(read.csv("RDS_PCODE.csv", stringsAsFactors = FALSE))
    p_codes <- p_codes %>% select(INC_KEY, PCODE)
    pelvic <- p_codes %>%
      filter(PCODE >= 57.6 & PCODE <= 57.89 | PCODE == 57.93) %>%
      mutate(PCODE = as.logical(PCODE))
    colnames(pelvic) <- c("INC_KEY", "PELVIC")
    # 57.93 56.6:57.89 & 57.93

    # Diagnostic codes
    d_codes <- as_tibble(read.csv("RDS_DCODE.csv", stringsAsFactors = FALSE))
    d_codes <- d_codes %>% select(INC_KEY, DCODE)
    Rupture <- d_codes %>% filter(DCODE > 866.99 & DCODE < 867.2)

    # Join patients, discharge, then patients, treatment, then T/F pelvic then DCODES, add year and reorder
    patients <- inner_join(patients, discharges, by = "INC_KEY")

    # Treatment codes
    # ID patients with no pelvic procedure
    pelvic_false <- setdiff(patients$INC_KEY, pelvic$INC_KEY)

    # Add field for PELVIC t/f
    patients <- patients %>%
      mutate(PELVIC = ifelse(INC_KEY %in% pelvic_false, "FALSE", "TRUE")) %>%
      mutate(PELVIC = as.logical(PELVIC))

    # Add fields for 867 series
    patients <- inner_join(patients, Rupture, by = "INC_KEY")

    # Add year
    patients <- patients %>% mutate(YEAR = 2015)

    # Rename columns
    colnames(patients) <-
      c("INC_KEY", "Age", "Sex", "Adult", "Expired", "Intervention", "Rupture", "Year")

    # Change Expired to logical
    patients <- patients %>% mutate(Expired = as.logical(Expired))

    # Censor anomalies
    patients <- patients %>% filter(Age >= 0)

    # SAVE
    write.csv(patients, "../combined/patients_2015.csv")
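Because the source files can hold several rows per patient, an `inner_join` can return more than one row for the same `INC_KEY` when the right-hand table repeats a key. The sketch below, using made-up data rather than the real files, shows one way to check whether that has happened. The table contents and the `n_rows` column name are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data: patient 1 has two diagnosis codes in range
patients <- tibble(INC_KEY = c(1, 2), AGE = c(40, 12))
rupture  <- tibble(INC_KEY = c(1, 1, 2), DCODE = c(867.0, 867.1, 867.0))

# One-to-many join: each matching right-hand row yields an output row
joined <- inner_join(patients, rupture, by = "INC_KEY")

# List any keys that appear more than once after the join
dupes <- joined %>%
  count(INC_KEY, name = "n_rows") %>%
  filter(n_rows > 1)

print(dupes)
```

If `dupes` comes back empty on the real data, each patient appears once. If not, whether the extra rows matter depends on the question being asked.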