Counting combinations of variables in a data set

Hi there,

I have included a sample of the first 20 people in my data set:

head(Dataset2, n=20)
data.frame(
                                     Id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
                                            10L, 11L, 12L, 13L, 14L, 15L, 16L,
                                            17L, 18L, 19L, 20L),
                                    Age = c(47L, 47L, 51L, 50L, 38L, 22L, 37L,
                                            53L, 23L, 38L, 59L, 24L, 33L, 44L,
                                            49L, 51L, 31L, 54L, 32L, 57L),
                                  CPAMs = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
                                            0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L,
                                            1L, 0L),
                 vaccinator = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                                            1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L,
                                            0L, 1L),
   Pharmacist= c(1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L,
                                            1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                                            1L, 1L),
              MUR = c(0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L,
                                            0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L,
                                            0L, 1L),
       MTA = c(0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L,
                                            0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
                                            1L, 0L),
                  Prescriber = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                                            0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                                            0L, 0L),
                                 Others = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
                                            0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
                                            0L, 0L),
                             None.apply = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
                                            0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                                            0L, 0L),
               Number.of.years.practice = c(25, 24, 29, 26, 17, 1, 13, 31, 1,
                                            15, 30, 2, 10, 22, 28, 30, 1.5, 32,
                                            10, 34),               
                     Chronic.conditions = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                                            1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L,
                                            1L, 1L),
     Elderly.patients.in.their.own.home = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L,
                                            1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L,
                                            1L, 1L),
    Elderly.residential.care.facilities = c(0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
                                            0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
                                            1L, 1L),
      Other.residential.care.facilities = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                                            0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L,
                                            1L, 1L),
               Low.socioeconomic.status = c(1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L,
                                            0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
                                            0L, 1L),
                         Young.families = c(1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
                                            1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L,
                                            1L, 1L),
                                 Gender = as.factor(c("Female", "Male",
                                                      "Female", "Female",
                                                      "Female", "Male", "Male",
                                                      "Female", "Male", "Female",
                                                      "Male", "Male", "Female",
                                                      "Female", "Female", "Female",
                                                      "Male", "Male", "Female",
                                                      "Male"))
)

I am trying to work out the following in RStudio, please:
1). The following 8 variables: CPAMs, vaccinator, Pharmacist, MUR, MTA, Prescriber, Others, None.apply
record the qualifications that a respondent could have (0 = no, 1 = yes): 7 types of qualification, plus None.apply for respondents with none of them.
I know how to use the count function to work out how many people have CPAMs, how many have vaccinator qualifications, etc.
How do I use RStudio to work out how many people in the data set have each combination of the 7 types of qualification? For example, I might find that most people in my data set have both vaccinator and Pharmacist qualifications.

2). The following 6 variables: Chronic.conditions, Elderly.patients.in.their.own.home, Elderly.residential.care.facilities, Other.residential.care.facilities, Low.socioeconomic.status, Young.families
are 6 different client types that a respondent may encounter.
Again, I know how to use the count function to work out how many respondents encounter Chronic.conditions, how many encounter Young.families, etc.
How do I use RStudio to work out how many respondents in the data set have encountered each combination of the 6 client types? For example, I might find that most respondents in my data set have encountered both Chronic.conditions and Young.families.

Hopefully I have explained this well enough.
Any help is appreciated.

Thanks:-)

If I understand you correctly, I can only think of doing this with nested for loops.

DF <- data.frame(
  Id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
         10L, 11L, 12L, 13L, 14L, 15L, 16L,
         17L, 18L, 19L, 20L),
  Age = c(47L, 47L, 51L, 50L, 38L, 22L, 37L,
          53L, 23L, 38L, 59L, 24L, 33L, 44L,
          49L, 51L, 31L, 54L, 32L, 57L),
  CPAMs = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
            0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L,
            1L, 0L),
  vaccinator = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L,
                 0L, 1L),
  Pharmacist= c(1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L,
                1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                1L, 1L),
  MUR = c(0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L,
          0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L,
          0L, 1L),
  MTA = c(0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L,
          0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
          1L, 0L),
  Prescriber = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                 0L, 0L),
  Others = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
             0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
             0L, 0L),
  None.apply = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
                 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                 0L, 0L),
  Number.of.years.practice = c(25, 24, 29, 26, 17, 1, 13, 31, 1,
                               15, 30, 2, 10, 22, 28, 30, 1.5, 32,
                               10, 34),               
  Chronic.conditions = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                         1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L,
                         1L, 1L),
  Elderly.patients.in.their.own.home = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L,
                                         1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L,
                                         1L, 1L),
  Elderly.residential.care.facilities = c(0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
                                          0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
                                          1L, 1L),
  Other.residential.care.facilities = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                                        0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L,
                                        1L, 1L),
  Low.socioeconomic.status = c(1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L,
                               0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
                               0L, 1L),
  Young.families = c(1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
                     1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L,
                     1L, 1L),
  Gender = as.factor(c("Female", "Male",
                       "Female", "Female",
                       "Female", "Male", "Male",
                       "Female", "Male", "Female",
                       "Male", "Male", "Female",
                       "Female", "Female", "Female",
                       "Male", "Male", "Female",
                       "Male"))
)

ColNames <- colnames(DF)[3:10]
ColNames
#> [1] "CPAMs"      "vaccinator" "Pharmacist" "MUR"        "MTA"       
#> [6] "Prescriber" "Others"     "None.apply"

PairsList <- vector(mode = "list")
# Visit each unordered pair of columns once (i < j)
for (i in 1:7) {
  for (j in (i+1):8) {
    Col1 <- ColNames[i]
    Col2 <- ColNames[j]
    CombName <- paste(Col1, Col2, sep = "_")
    # Count respondents for whom both columns are 1
    PairsList[[CombName]] <- sum(DF[[Col1]] & DF[[Col2]])
  }
}
PairsList
#> $CPAMs_vaccinator
#> [1] 1
#> 
#> $CPAMs_Pharmacist
#> [1] 5
#> 
#> $CPAMs_MUR
#> [1] 3
#> 
#> $CPAMs_MTA
#> [1] 3
#> 
#> $CPAMs_Prescriber
#> [1] 0
#> 
#> $CPAMs_Others
#> [1] 0
#> 
#> $CPAMs_None.apply
#> [1] 0
#> 
#> $vaccinator_Pharmacist
#> [1] 6
#> 
#> $vaccinator_MUR
#> [1] 2
#> 
#> $vaccinator_MTA
#> [1] 1
#> 
#> $vaccinator_Prescriber
#> [1] 0
#> 
#> $vaccinator_Others
#> [1] 1
#> 
#> $vaccinator_None.apply
#> [1] 0
#> 
#> $Pharmacist_MUR
#> [1] 8
#> 
#> $Pharmacist_MTA
#> [1] 6
#> 
#> $Pharmacist_Prescriber
#> [1] 1
#> 
#> $Pharmacist_Others
#> [1] 2
#> 
#> $Pharmacist_None.apply
#> [1] 0
#> 
#> $MUR_MTA
#> [1] 5
#> 
#> $MUR_Prescriber
#> [1] 1
#> 
#> $MUR_Others
#> [1] 1
#> 
#> $MUR_None.apply
#> [1] 0
#> 
#> $MTA_Prescriber
#> [1] 1
#> 
#> $MTA_Others
#> [1] 1
#> 
#> $MTA_None.apply
#> [1] 0
#> 
#> $Prescriber_Others
#> [1] 0
#> 
#> $Prescriber_None.apply
#> [1] 0
#> 
#> $Others_None.apply
#> [1] 0

Created on 2019-11-03 by the reprex package (v0.3.0.9000)
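
For what it's worth, the pairwise counts from the nested loops above can probably also be obtained in one step with a matrix cross-product. The following is only a sketch on a small made-up data frame (the `toy` values and the names `co_counts` and `pair_counts` are assumptions for illustration, not part of the thread's data); on the real data one would presumably pass `DF[ColNames]` instead.

```r
# Toy 0/1 indicator data (hypothetical values, not the thread's data set)
toy <- data.frame(
  CPAMs      = c(1L, 0L, 1L, 0L, 1L),
  vaccinator = c(1L, 1L, 0L, 0L, 0L),
  Pharmacist = c(1L, 1L, 1L, 1L, 0L)
)

# crossprod(m) computes t(m) %*% m: entry [i, j] is the number of rows in
# which column i and column j are both 1; the diagonal holds column totals
m <- as.matrix(toy)
co_counts <- crossprod(m)

# Keep each unordered pair once by reading only the upper triangle
idx <- which(upper.tri(co_counts), arr.ind = TRUE)
pair_counts <- data.frame(
  V1    = rownames(co_counts)[idx[, "row"]],
  V2    = colnames(co_counts)[idx[, "col"]],
  Count = co_counts[idx]
)
pair_counts
```

This relies on the columns being strictly 0/1; with other values the products would no longer be counts.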

Assuming FJCC has interpreted your question correctly, this would be an alternative purrr-based solution.

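library(tidyverse)

# DF is the data frame defined in FJCC's answer above
combn(colnames(DF)[3:10], 2) %>%   # every pair of the 8 columns, one pair per matrix column
    t() %>%                         # transpose so that each row holds one pair
    as_tibble() %>%
    mutate(Count = map2_int(V1, V2, ~sum(DF[[.x]] & DF[[.y]])))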
#> Warning: `as_tibble.matrix()` requires a matrix with column names or a `.name_repair` argument. Using compatibility `.name_repair`.
#> This warning is displayed once per session.
#> # A tibble: 28 x 3
#>    V1         V2         Count
#>    <chr>      <chr>      <int>
#>  1 CPAMs      vaccinator     1
#>  2 CPAMs      Pharmacist     5
#>  3 CPAMs      MUR            3
#>  4 CPAMs      MTA            3
#>  5 CPAMs      Prescriber     0
#>  6 CPAMs      Others         0
#>  7 CPAMs      None.apply     0
#>  8 vaccinator Pharmacist     6
#>  9 vaccinator MUR            2
#> 10 vaccinator MTA            1
#> # … with 18 more rows

This is really cool, but it's a bit unclear to me how map2_int is working. Can you help me understand what this step is doing with the tibble from line 3 and the data frame?
thanks!

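Taking advantage of the fact that the data frame is one-hot encoded (0/1 columns), I'm using map2_int() to sum the logical AND (the intersection, not the union) of each pairwise combination: the expression inside sum() is TRUE only for rows where both columns are 1, and sum() counts those rows. It is basically the same as FJCC's for-loop solution, but using purrr instead.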
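
To make that step concrete, here is a minimal base R sketch of the same idea on made-up vectors (the names `a`, `b`, `toy`, `pairs` and `counts` are assumptions for illustration). mapply() stands in for map2_int() here so that the example needs no packages; both walk over two vectors in parallel.

```r
# Two hypothetical 0/1 indicator columns for five respondents
a <- c(1L, 0L, 1L, 1L, 0L)
b <- c(1L, 1L, 0L, 1L, 0L)

# `&` treats 0 as FALSE and non-zero as TRUE, element by element
both <- a & b
both
#> [1]  TRUE FALSE FALSE  TRUE FALSE

# sum() counts the TRUE values, i.e. respondents with both indicators set
sum(both)
#> [1] 2

# The same computation for every pair of columns in a small data frame
toy <- data.frame(x = a, y = b, z = c(0L, 1L, 0L, 0L, 1L))
pairs <- t(combn(names(toy), 2))   # one pair of column names per row
counts <- mapply(
  function(p, q) sum(toy[[p]] & toy[[q]]),
  pairs[, 1], pairs[, 2]
)
data.frame(V1 = pairs[, 1], V2 = pairs[, 2], Count = unname(counts))
```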


Thank you FJCC and andresrcs for those solutions. This works for all 2-item combinations of the variables.

1). For the 7 variables: CPAMs, vaccinator, Pharmacist, MUR, MTA, Prescriber, Others
I would like to work out the count of respondents for every possible combination of these variables, where order does not matter (i.e. every combination of 2, 3, 4, 5, 6, and 7 variables).
The output should look something like:

V1       V2          V3          V4       V5       V6         V7        Count
 <chr>   <chr>       <chr>       <chr>   <chr>    <chr>      <chr>      <int>  
CPAMs    vaccinator      
CPAMs                Pharmacist    
CPAMs                             MUR      
CPAMs                                     MTA     
CPAMs                                             Prescriber  
CPAMs                                                         Others
         vaccinator  Pharmacist   
         vaccinator               MUR     
         vaccinator                       MTA     
         vaccinator                               Prescriber  
         vaccinator                                           Others
                     Pharmacist   MUR     
                     Pharmacist           MTA     
                     Pharmacist                   Prescriber  
                     Pharmacist                               Others
                                  MUR     MTA     
                                  MUR             Prescriber  
                                  MUR                         Others
                                          MTA     Prescriber  
                                          MTA                 Others
CPAMs    vaccinator  Pharmacist   
CPAMs    vaccinator               MUR     
CPAMs    vaccinator                       MTA     
CPAMs    vaccinator                               Prescriber  
CPAMs    vaccinator                                           Others
         vaccinator  Pharmacist   MUR    
         vaccinator  Pharmacist           MTA     
         vaccinator  Pharmacist                   Prescriber  
         vaccinator  Pharmacist                               Others

Please note that there are a lot of combinations, and I have not included all of them in the intended output above as the list would be very long. Hopefully you can see what I am trying to do here.

2). For the following 6 variables: Chronic.conditions, Elderly.patients.in.their.own.home, Elderly.residential.care.facilities, Other.residential.care.facilities, Low.socioeconomic.status, Young.families
I would like to work out the count of respondents for every possible combination of the 6 client types, where order does not matter (i.e. every combination of 2, 3, 4, 5, and 6 variables).
The output should look something like what was required in question 1 above.

Thanks:-)

This is becoming too broad; it feels like you are giving us homework. I recommend that you try to solve this yourself by applying what you have learned from our answers. If you get stuck, come back here with specific coding questions: we are more inclined to help with specific coding problems than to do your work for you.

Thank you andresrcs,

It's not actually homework, but I do realise that it is asking a lot from me.
Being a novice, it's just that I don't know where to start to be able to get to the required output with my data.

Thanks anyway, I'll try to work this out in R.
I know it will take me ages, but in the worst case I'll try to work out the counts for all possible combinations using Excel.

Thank you.

Hi andresrcs.

I have worked out a solution for this problem using RStudio.

Again, apologies if it sounded like I was asking for help with homework—I really wasn't.

Thanks for the help and advice:-)


Please post the solution if you can, and mark it as the solution. It could help others in the future.
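
The original poster's solution was not posted, so the following is not it. It is only a sketch of one way the pairwise approach from the answers above might be extended to combinations of every size, shown on a small made-up data frame. The `toy` values and the helper `count_combos()` are hypothetical names introduced for illustration.

```r
# Toy 0/1 indicator data (hypothetical values, not the thread's data set)
toy <- data.frame(
  CPAMs      = c(1L, 0L, 1L, 0L, 1L),
  vaccinator = c(1L, 1L, 0L, 0L, 0L),
  Pharmacist = c(1L, 1L, 1L, 1L, 0L)
)
vars <- names(toy)

# Hypothetical helper: for a given size k, list every k-column combination
# and count the rows in which all of those columns equal 1
count_combos <- function(data, vars, k) {
  combos <- combn(vars, k, simplify = FALSE)
  data.frame(
    Size = k,
    Combination = vapply(combos, paste, character(1), collapse = " + "),
    Count = vapply(combos, function(cols) {
      sum(rowSums(data[cols] == 1) == length(cols))
    }, integer(1))
  )
}

# Stack the results for sizes 2 up to the number of variables
all_counts <- do.call(
  rbind,
  lapply(2:length(vars), count_combos, data = toy, vars = vars)
)
all_counts
```

Like the pairwise answers, this counts respondents who have at least the listed columns set, not exactly those and no others. With 7 variables there are 120 combinations of size 2 to 7, so the result stays manageable.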