I have a data sheet containing information on ~500 people but multiple rows for each person (containing one prescription for each row, some people have 10 or more each) so entire datasheet has ~5000 rows. I need to work out which people had a specific combination of drugs. The datasheet is called "Prescribing" and the column containing the drug names is "drug_name". For example, I would like to find out who is prescribed "morphine" and "paracetamol" in combination. Do I create a new column for each drug in question (with 0/1 entry to indicate if they had a prescription for the drug in question or not) and then aggregate the file so one row per person, and if so how do I do this? Or is there a different way to do it in R? I am using Tidyverse.
Thank you for this. I have created a dummy dataset because my dataset is in a secure vpn. I had this saved as a csv file but can't upload this to the site I'm afraid - but essentially there are two columns that are important, one with the identifier (each number means one person with several entries each) and one with drug name. I need to work out which people, for example, have morphine and paracetamol in their list of rows of prescriptions. Sorry, I'm afraid that I can't seem to work out how to reprex this (as my R programme is on vpn so can't copy and paste anything).
library(tidyverse)
set.seed(42)
example_df <- tibble(
id = sample.int(1000,1000,replace=TRUE),
drug = paste0(sample(letters,size = 1000,replace=TRUE),sample.int(2,1000,replace=TRUE))
)
# look at common drugs...
table(example_df$drug) %>% sort(decreasing = TRUE) %>% head
#composing the function we want to use
find_ids_with_2_drugs <- function(df,drug_1,drug_2){
df_1 <- filter(df,
drug==drug_1)
df_2 <- filter(df,
drug==drug_2)
inner_join(df_1,
df_2,
by="id")
}
#using it
find_ids_with_2_drugs(example_df,"j1","s1")
# result gives 138 as ameeting the condition
#verify 138
filter(example_df,
id==138)