Multiple rows per identifier, how to identify patterns of data within rows

Hi there,

I have a data sheet containing information on ~500 people but multiple rows for each person (containing one prescription for each row, some people have 10 or more each) so entire datasheet has ~5000 rows. I need to work out which people had a specific combination of drugs. The datasheet is called "Prescribing" and the column containing the drug names is "drug_name". For example, I would like to find out who is prescribed "morphine" and "paracetamol" in combination. Do I create a new column for each drug in question (with 0/1 entry to indicate if they had a prescription for the drug in question or not) and then aggregate the file so one row per person, and if so how do I do this? Or is there a different way to do it in R? I am using Tidyverse.

Thanks a lot for your help

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

  1. If you have stored the data set in some R object, dput function is very handy.

  2. In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.

Thank you for this. I have created a dummy dataset because my dataset is in a secure vpn. I had this saved as a csv file but can't upload this to the site I'm afraid - but essentially there are two columns that are important, one with the identifier (each number means one person with several entries each) and one with drug name. I need to work out which people, for example, have morphine and paracetamol in their list of rows of prescriptions. Sorry, I'm afraid that I can't seem to work out how to reprex this (as my R programme is on vpn so can't copy and paste anything).

Thanks again, really appreciate it

|ID|Drug_name|
|1|Aspirin|
|1|Paracetamol|
|1|Morphine|
|1|Simvastatin|
|1|Buprenorphine|
|1|Lisinopril|
|1|Clopidogrel|
|1|Doxazocin|
|1|Warfarin|
|1|Buscopan|
|2|Methylprednisolone|
|2|Prednisolone|
|2|Morphine|
|2|Simvastatin|
|2|Hydrocortisone|
|2|Naproxen|
|2|Aspirin|
|2|Paraffin Emollient|
|3|Hydrocortisone|
|3|Morphine|
|3|Paracetamol|
|3|Salicylate|
|3|Tacrolimus|
|3|Naseptin|
|3|Ceftriaxone|
|3|Levetiracetam|
|4|Nicorandil|
|4|Paracetamol|
|4|Beclomethasone|
|4|Chlorphenamine|
|4|Pregabalin|
|4|Valproic acid|
|4|Nortriptyline|
|4|Tryptophan|

You can do this sort of thing

library(tidyverse)
set.seed(42)
example_df <- tibble(
  id = sample.int(1000,1000,replace=TRUE),
  drug = paste0(sample(letters,size = 1000,replace=TRUE),sample.int(2,1000,replace=TRUE))
)

# look at common drugs...
table(example_df$drug) %>% sort(decreasing = TRUE) %>% head

#composing the function we want to use
find_ids_with_2_drugs <- function(df,drug_1,drug_2){
  
  df_1 <- filter(df,
                 drug==drug_1)
  
  df_2 <- filter(df,
                 drug==drug_2)
  
  inner_join(df_1,
             df_2,
             by="id")
}

#using it
find_ids_with_2_drugs(example_df,"j1","s1")
# result gives 138 as ameeting the condition

#verify 138
filter(example_df,
       id==138)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.