Create table with count of specific conditions

Hi everyone,

I have something I want to do but I am not sure where to even begin.

I have a large dataset which lists the number of drug overdoses for a given period. The way it is lain out, most of the columns are drug names, and under those columns, certain cells may have "C" or "P" in them, indicating whether the drug was the cause of the overdose or simply present in the system, respectively.

What I'd like to do is create some kind of a table where you have number of occurances of Fentanyl + 1 other drug, fentanyl + 2 other drugs (these other drugs can be C or P) etc. So, for example, one occurance of fentanyl + one other drug would be where in a given cell, you'd find "C" under the fentanyl column, then somewhere else in that row you'd find "C" or "P" in JUST ONE other drug. For Fentanyl + 2, it'd be Fentanyl = "C" and "C" or "P" in EXACTLY TWO other drugs.

So, Ideally I'd have something like this

"Fentanyl + 1: 13
Fentanyl + 2: 32
Fentanyl + 3: 34
"

etc, up until the maximum number of occurances.

Please let me know if I can clarify in any way.

The Final Table (either in the dataset or separately) would be a two column table. One column is the specific attribute (Fentanyl + 1 other drug (cause or present)) (Fentanyl + 2 other drugs (cause or present)) etc. The number of rows will correspond to the number of different drugs.
So it might look something like what is below:

[Attribute] [count]
Fentanyl +1 9
Fentanyl + 2 0
Fentanyl + 3 30
Fentanyl + 4 8

Here is the "DPUT" structure. This is NOT the full data frame, as the dataset is actually humungous. So, this is just to provide an idea of how the data look.

structure(list(X = 2859:2863, District = c(1L, 1L, 1L, 1L, 1L
), Year = c(2013L, 2013L, 2013L, 2013L, 2013L), Sex = c("F", 
"F", "F", "F", "F"), Manner.of.Death = c("A", "A", "A", "A", 
"A"), Cause.of.Death = c("Acute Cocaine Intoxication", "Blunt Impact of Head", 
"Blunt Impact of Head", "Blunt Impact of Head", "Blunt Impact of Head and Neck"
), Race = c("W", "W", "W", "W", "W"), ID = c(122L, 32L, 54L, 
141L, 86L), Age = c("33", "34", "23", "17", "19"), U47700 = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Oxymorphone = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Buprenorphine = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Zolpidem = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), County = c("57", 
"17", "17", "57", "17"), Nordiazepam = c("NO VALUE", "NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE"), Estazolam = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Flurazepam = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Oxazepam = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Triazolam = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Lorazepam = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Chlordiazepoxide = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Clonazepam = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Helium = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Codeine = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Temazepam = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Midazolam = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), HalogenatedInhalants = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), HallucinogenicPhenethylaminesPiperazine = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), SyntheticCannabinoids = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Cathinones = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), HallucinogenicTryptamines = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), HydrocarbonInhalants = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), SympathomimeticAmines = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Analytes = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), FentanylAnalogs = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), GHB = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Oxycodone = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Fentanyl = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Morphine = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "P"), Methadone = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Meperidine = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Month = c(4L, 
5L, 2L, 8L, 4L), Cannabinoids = c("NO VALUE", "NO VALUE", "NO VALUE", 
"NO VALUE", "P"), Tramadol = c("NO VALUE", "NO VALUE", "NO VALUE", 
"P", "NO VALUE"), Diazepam = c("NO VALUE", "NO VALUE", "NO VALUE", 
"NO VALUE", "NO VALUE"), Heroin = c("NO VALUE", "NO VALUE", "NO VALUE", 
"NO VALUE", "NO VALUE"), Ethanol = c("NO VALUE", "P", "P", "NO VALUE", 
"P"), Hydrocodone = c("NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE", 
"NO VALUE"), Methamphetamine = c("NO VALUE", "NO VALUE", "NO VALUE", 
"NO VALUE", "NO VALUE"), Alprazolam = c("NO VALUE", "NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE"), Hydromorphone = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Case.Number = c("MLA 13-194", 
"MLO 13-293", "MLO 13-094", "MLO 13-463", "MLA 13-253"), Amphetamine = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Ketamine = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Cocaine = c("C", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), PCP = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), PCPAnalogs = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), NitrousOxide = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Flunitrazepam = c("NO VALUE", 
"NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), CarisoprodolMeprobamate = c("", 
"", "", "", ""), RecID.x = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), RecID.y = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), Gabapentin.x = c("", "", "", "", ""), Mitragynine.x = c("", 
"", "", "", ""), PCPandPCPAnalogs.x = c("", "", "", "", ""), 
    IllicitFentanyl.x = c("", "", "", "", ""), RecID.x.1 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_), Gabapentin.y = c("", 
    "", "", "", ""), Mitragynine.y = c("", "", "", "", ""), PCPandPCPAnalogs.y = c("", 
    "", "", "", ""), IllicitFentanyl.y = c("", "", "", "", ""
    ), Table.Names = c("2013 Annual Drug Raw Data.xlsx/Raw_Data", 
    "2013 Annual Drug Raw Data.xlsx/Raw_Data", "2013 Annual Drug Raw Data.xlsx/Raw_Data", 
    "2013 Annual Drug Raw Data.xlsx/Raw_Data", "2013 Annual Drug Raw Data.xlsx/Raw_Data"
    ), RecID.y.1 = c(2013121, 201331, 201353, 2013140, 201385
    ), Chlorodifluoromethane = c("NO VALUE", "NO VALUE", "NO VALUE", 
    "NO VALUE", "NO VALUE"), Desalkyflurazepam = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), X7.Aminoclonazepam = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Phentermine = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Alpha.hydroxyalprazolam = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Toluene = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Alpha.hydroxytriazolam = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Specify.Other.Inhalants = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Other.Benzo = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Index = c(121, 
    31, 53, 140, 85), MDA = c("NO VALUE", "NO VALUE", "NO VALUE", 
    "NO VALUE", "NO VALUE"), MDMA = c("NO VALUE", "NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE"), Other.Inhalants = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Other.Opioids = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), MDEA = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Freon = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Specify.Other.Amphetamines = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Other.Amphetamines = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Multiple.Drugs = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Propoxyphene = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Specify.Other.Opioids = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Specify.Other.Benzo = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Difluoroethane = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Carisoprodol = c("NO VALUE", 
    "NO VALUE", "NO VALUE", "NO VALUE", "NO VALUE"), Day = c(1L, 
    1L, 1L, 1L, 1L), date = c("2013-04-01", "2013-05-01", "2013-02-01", 
    "2013-08-01", "2013-04-01"), quarter = c("2013Q2", "2013Q2", 
    "2013Q1", "2013Q3", "2013Q2")), row.names = c(NA, 5L), class = "data.frame")```
1 Like

Hi, in this forum you can get the solutions. Here exist an advanced R User.

Im recomend you put and example of final table that you want for better understand.

Here is a rough version of a solution. I had to guess at which columns represent drugs and you may want to drop some of the columns that I kept as data descriptors. I assumed that the ID column uniquely identifies a person. I also did the calculation using Ethanol instead of Fentanyl because the Fentanyl column is NO VALUE in all rows.
The idea is to keep only those rows where Ethanol is either C or P, reshape the data so one column lists all of the drugs, keep only the rows where a drug is C or P, and finally count the number of rows that survive for each ID. I returned the value of n() - 1 because every person will have Ethanol and n() - 1 is then the number of other drugs found.

library(tidyverse)
DFlong <- DF |> filter(Ethanol  %in% c("C","P")) |> 
   pivot_longer(cols = -c("X","District","Year","Sex","Manner.of.Death",
                          "Cause.of.Death","Race","ID","Age","U47700",
                          "Day","date","quarter","County","Month",
                          "Case.Number","RecID.x","RecID.y","RecID.x.1",
                          "Table.Names","RecID.y.1","Index"),
                names_to = "Drug")
Counts <- DFlong |> filter(value %in% c("C","P")) |> 
   group_by(ID) |> summarize(CountNotEth = n() - 1)
Counts
# A tibble: 3 × 2
     ID CountNotEth
  <int>       <dbl>
1    32           0
2    54           0
3    86           2

I would approach the issue in this way; p.s. ive abstracted away from your actual data to easier to consume example data. p.s. I think you may have data quality issues; drugnames ending .x, and .y make me think of merging data sources gone wrong... you may wish to investigate that.


library(tidyverse)
library(glue)
set.seed(42)
( example_start <- map_dfc(LETTERS[1:5],~tibble("drug_{.x}":=sample(x = c("NO VALUE",
                                                                    "C","P"),
                                                                    size=20,replace=TRUE,
                                                                    prob=c(.7,.25,.05)))))
(overdose_drug_names <- names(example_start))

list_of_frames_involving_each_drug <- map(overdose_drug_names,~
      {
        filter(example_start,
               !!sym(.x)  !="NO VALUE") 
      }) |> set_names(overdose_drug_names)


(frames_involv_each_drug_with_counts <- imap(
  list_of_frames_involving_each_drug,
  ~{
    mutate(rowwise(.x),
           c_count = sum(str_count(c_across(-all_of(.y) & 
                                              where(is.character)),
                                   pattern = "C"
           )),
           p_count = sum(str_count(c_across(-all_of(.y) & 
                                              where(is.character)),
                                   pattern = "P"
           ))
    ) |> ungroup() |> mutate(
      summary_info = glue("{.y} c:{c_count} p:{p_count}")
    )|> relocate(summary_info,c_count,p_count) |> arrange(c_count,p_count) 
  }
))
#lets look at what there is for 'drug_C'
frames_involv_each_drug_with_counts$drug_C

# A tibble: 6 x 8
  summary_info   c_count p_count drug_A   drug_B   drug_C drug_D   drug_E  
  <glue>           <int>   <int> <chr>    <chr>    <chr>  <chr>    <chr>   
1 drug_C c:0 p:0       0       0 NO VALUE NO VALUE P      NO VALUE NO VALUE
2 drug_C c:0 p:0       0       0 NO VALUE NO VALUE P      NO VALUE NO VALUE
3 drug_C c:1 p:0       1       0 C        NO VALUE C      NO VALUE NO VALUE
4 drug_C c:1 p:0       1       0 NO VALUE NO VALUE C      NO VALUE C       
5 drug_C c:2 p:0       2       0 C        C        P      NO VALUE NO VALUE
6 drug_C c:4 p:0       4       0 C        C        C      C        C

Thank you for this! So, just to make sure I understand, in the 3rd row, for ID 86, there were 2 additional cases of "C" and /or "P"?

Yes.
Do note @nirgrahamuk's comments about data quality. I agree that some of the column names suggest problems merging data.

Sorry, one other quick question: What does the above symbol do exactly? I am familiar with the %>%, but I've never seen the one you just posted.

That is the native "pipe" operator which is a relatively new R feature. %>% is magrittr's pipe operator, they are mostly interchangeable with some exceptions because of minor syntax differences

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.