Multiple rows per identifier, how to identify patterns of data within rows

Hi there,

I have a data sheet containing information on ~500 people but multiple rows for each person (containing one prescription for each row, some people have 10 or more each) so entire datasheet has ~5000 rows. I need to work out which people had a specific combination of drugs. The datasheet is called "Prescribing" and the column containing the drug names is "drug_name". For example, I would like to find out who is prescribed "morphine" and "paracetamol" in combination. Do I create a new column for each drug in question (with 0/1 entry to indicate if they had a prescription for the drug in question or not) and then aggregate the file so one row per person, and if so how do I do this? Or is there a different way to do it in R? I am using Tidyverse.

Thanks a lot for your help

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

  1. If you have stored the data set in some R object, dput function is very handy.

  2. In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.

Thank you for this. I have created a dummy dataset because my dataset is in a secure vpn. I had this saved as a csv file but can't upload this to the site I'm afraid - but essentially there are two columns that are important, one with the identifier (each number means one person with several entries each) and one with drug name. I need to work out which people, for example, have morphine and paracetamol in their list of rows of prescriptions. Sorry, I'm afraid that I can't seem to work out how to reprex this (as my R programme is on vpn so can't copy and paste anything).

Thanks again, really appreciate it

|2|Paraffin Emollient|
|4|Valproic acid|

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

You can do this sort of thing

example_df <- tibble(
  id =,1000,replace=TRUE),
  drug = paste0(sample(letters,size = 1000,replace=TRUE),,1000,replace=TRUE))

# look at common drugs...
table(example_df$drug) %>% sort(decreasing = TRUE) %>% head

#composing the function we want to use
find_ids_with_2_drugs <- function(df,drug_1,drug_2){
  df_1 <- filter(df,
  df_2 <- filter(df,

#using it
# result gives 138 as ameeting the condition

#verify 138