Data Mining of Drug Safety Reports


I am working on a large dataset for a research project. The dataset contains safety case reports (adverse events), and I am trying to apply the following inclusion criterion:

The adverse event must come from two different sources. The sources are coded in an Excel sheet as 1 and 0. To enter the final analysis, an adverse event must have been reported by both sources (1 and 0). The data is available in Excel.

What R code would be appropriate to apply this criterion?


How do you identify an adverse event? Is there an ID for it?

The data has three variables:

  1. Case number
  2. Source: 0 & 1
  3. Adverse event: for example headache, infection etc. Some adverse events are repeated many times.

I am not interested in the case number. I need to identify the adverse events that came from both sources (0 and 1), regardless of frequency.


library(dplyr)

set.seed(42) # for reproducible random data
example_data <- data.frame(
  case_num = 1:100,
  source = c(rep(0, 50), rep(1, 50)),
  adverse_event = factor(sample(c(letters, LETTERS), size = 100, replace = TRUE))
)

# One row per adverse event; both_sources is TRUE when the event
# was reported at least once by source 0 and at least once by source 1
result_df <- group_by(example_data, adverse_event) %>%
  summarise(total_cases  = n(),
            both_sources = any(source == 0) & any(source == 1))
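
To go from the per-event flag to the final analysis set, the rows for events seen in both sources still have to be selected. Below is a minimal base R sketch of that step on a small hand-made data frame. The object names (`toy`, `n_sources`, `included_events`, `final_data`) and the toy values are made up for illustration, and it assumes `source` contains only 0 and 1 with no missing values.

```r
# Hypothetical toy data with the three columns described in the question
toy <- data.frame(
  case_num      = 1:7,
  source        = c(0, 1, 0, 0, 1, 1, 0),
  adverse_event = c("headache", "headache", "infection", "infection",
                    "nausea", "rash", "rash"),
  stringsAsFactors = FALSE
)

# Number of distinct sources behind each adverse event
# (assumes source is only ever 0 or 1, with no NA values)
n_sources <- tapply(toy$source, toy$adverse_event,
                    function(s) length(unique(s)))

# Adverse events reported by both sources, regardless of frequency
included_events <- names(n_sources)[n_sources == 2]

# Keep only the case reports for those events
final_data <- toy[toy$adverse_event %in% included_events, ]
```

Here "headache" and "rash" appear under both sources and are kept, while "infection" (source 0 only) and "nausea" (source 1 only) are dropped. With real data, the same steps would apply after loading the Excel sheet into a data frame with these three columns.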

