Analysis of multiple-response questions

Hello everyone,
I need some help, please.
I would like to analyze multiple-response questions in a large database that contains several types of these questions, but I am having some trouble: the scripts I have do not seem to address my problem.
First, I want to calculate the frequency of each multiple-response variable (shown in different colors).
Second, I want to cross each of the two multiple-response variables (shown in different colors) with the market variable.
Here is an extract from the data file.
data_market.pdf (374.6 KB)

It would be even better to have a script that detects which questions are select-one and which are select-multiple, and that can produce frequencies for those variables when working with a large database.

What have you tried so far? What is your specific problem? We are more inclined towards helping you with specific coding problems than doing your work for you.

Could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

Thank you @nirgrahamuk for your quick feedback, and sorry for my delay.

I have a database with many variables, some select-one and some select-multiple. First, I want to calculate the frequency of each multiple-response variable.
Second, I want to cross each multiple-response variable with a single-choice variable.
Here is a small example with a script. However, this script does not give the results I want.

  1. For the multiple-response variables: the frequencies of the options of the multiple-response variables (X and Y) seem incorrect. The script treats them as a single variable with 6 options.
  2. For the crosstab: I would like to have an output like this:
    sex     x1    x2    x3    y1    y2    y3
    man     0.25  0.17  0.17  0.25  0.17  0.17
    woman   0.17  0.42  0.42  0.25  0.42  0.25
###Loading data###
## Two multiple response variables: X and Y (with three options each) 
## Two single-choice variables: sex and market

data <- tibble::tribble(
     ~sex, ~market, ~x1, ~x2, ~x3, ~y1, ~y2, ~y3,
  "woman",    "mk",  0L,  0L,  1L,  0L,  0L,  0L,
  "woman",    "mk",  1L,  1L,  1L,  0L,  0L,  1L,
  "woman",    "mk",  1L,  1L,  0L,  0L,  1L,  1L,
    "man",    "mk",  0L,  0L,  0L,  1L,  0L,  0L,
  "woman",    "mk",  1L,  1L,  1L,  0L,  0L,  1L,
    "man",    "st",  1L,  1L,  0L,  1L,  1L,  0L,
  "woman",    "mk",  1L,  1L,  0L,  0L,  1L,  0L,
    "man",    "st",  1L,  1L,  0L,  0L,  1L,  0L,
  "woman",    "st",  0L,  0L,  0L,  0L,  1L,  1L,
    "man",    "st",  0L,  0L,  1L,  0L,  1L,  0L,
    "man",    "st",  0L,  0L,  1L,  1L,  1L,  0L,
  "woman",    "st",  1L,  0L,  0L,  0L,  0L,  0L
  )
### Frequency of multiple response variables ###

library(dplyr)  # %>%, mutate(), group_by(), summarise()
library(tidyr)  # gather()

mult_resp = function(data, v1 = c("x1", "x2", "x3", "y1", "y2", "y3")) {

  data2 = data %>%
    mutate(id = rownames(.)) %>%  # row id for counting n_cases
    select(id, everything()) %>%
    mutate_at(v1, ~ ifelse(. != 0, 1, 0)) %>%  # recode any non-zero answer to 1
    gather(question, resp, -id, -market, -sex)  # long format: one row per id and option

  # count number of cases excluding "all zeros" cases
  n_cases = data2 %>%
    group_by(id) %>%
    summarise(n = sum(resp)) %>%
    summarise(sum(n > 0))

  # output table
  res = data2 %>%
    group_by(question) %>%
    summarise(freq = sum(resp)) %>%
    mutate(
      percent = freq / sum(freq) * 100,
      percent_of_cases = freq / as.numeric(n_cases) * 100
    )
  res
}

mult_resp(data, v1 = c("x1", "x2", "x3", "y1", "y2", "y3"))

### Relationship between variables ###

# Cross tabulation: select-one variable against one option of a select-multiple variable
tcd <- function(x, y){
  # Make a frequency table, dropping NA values
  tab <- table(x, y, useNA = "no")
  # Convert counts to proportions of the grand total
  pt <- prop.table(tab)
  pt
}

tcd(data$sex, data$x1)
tcd(data$market, data$x1)
tcd(data$sex, data$y1)
tcd(data$market, data$y1)

Created on 2023-05-05 with reprex v2.0.2

You might do the following for the multiple-response variables:

library(tidyverse)
data <- tibble::tribble(
  ~sex, ~market, ~x1, ~x2, ~x3, ~y1, ~y2, ~y3,
  "woman",    "mk",  0L,  0L,  1L,  0L,  0L,  0L,
  "woman",    "mk",  1L,  1L,  1L,  0L,  0L,  1L,
  "woman",    "mk",  1L,  1L,  0L,  0L,  1L,  1L,
  "man",    "mk",  0L,  0L,  0L,  1L,  0L,  0L,
  "woman",    "mk",  1L,  1L,  1L,  0L,  0L,  1L,
  "man",    "st",  1L,  1L,  0L,  1L,  1L,  0L,
  "woman",    "mk",  1L,  1L,  0L,  0L,  1L,  0L,
  "man",    "st",  1L,  1L,  0L,  0L,  1L,  0L,
  "woman",    "st",  0L,  0L,  0L,  0L,  1L,  1L,
  "man",    "st",  0L,  0L,  1L,  0L,  1L,  0L,
  "man",    "st",  0L,  0L,  1L,  1L,  1L,  0L,
  "woman",    "st",  1L,  0L,  0L,  0L,  0L,  0L
)
### Frequency of multiple response variables ###

# Count each combined answer pattern (e.g. "1;1;0") across the columns starting with x
mrep_single <- function(data, x) {
  unite(data, col = "value", starts_with(x), sep = ";") |>
    group_by(value) |>
    count() |>
    ungroup() |>
    mutate(varname = x) |>
    select(varname, value, n)
}
mrep_single(data, "x")
mrep_single(data, "y")

# Stack the pattern counts for several multiple-response variables
multi_rep <- function(data, vars) {
  map_dfr(vars, \(x) mrep_single(data, x))
}

multi_rep(data, c("x", "y"))
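
For the crosstab asked about earlier in the thread, one possible sketch is below. It assumes each cell should be the share of respondents within a group (for example, within each sex) who ticked that option; the thread does not say which denominator is wanted, so treat that as an assumption. The helper name `cross_multi` and its `prefixes` argument are hypothetical, and the toy data are retyped column-wise so the block runs on its own.

```r
library(dplyr)

# Same toy data as in the thread, written column-wise
data <- tibble::tibble(
  sex    = c("woman", "woman", "woman", "man", "woman", "man",
             "woman", "man", "woman", "man", "man", "woman"),
  market = c("mk", "mk", "mk", "mk", "mk", "st",
             "mk", "st", "st", "st", "st", "st"),
  x1 = c(0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1),
  x2 = c(0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0),
  x3 = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0),
  y1 = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0),
  y2 = c(0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0),
  y3 = c(0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0)
)

# Hypothetical helper: for each level of `group`, the share of respondents
# who ticked each option. Options are assumed to be 0/1 columns named
# <prefix><number>, e.g. x1, x2, y3.
cross_multi <- function(data, group, prefixes) {
  pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")[0-9]+$")
  data |>
    group_by({{ group }}) |>
    summarise(across(matches(pattern), mean), .groups = "drop")
}

by_sex    <- cross_multi(data, sex, c("x", "y"))
by_market <- cross_multi(data, market, c("x", "y"))
by_sex
```

This returns one row per group and one column per option, which is the layout of the table requested above. If the shares should be relative to the whole sample rather than to each group, the mean would need to be replaced by a sum divided by `nrow(data)`.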

Hello @nirgrahamuk, thank you for your support.
The result already looks good, but I have two questions:
1. Why does the result show 5 modalities for the multiple-response variable X, when the table has only three (x1, x2, x3)? The same goes for variable Y, which has only three options (y1, y2, y3), yet I see 6 in the result.
2. In the value column, why are there three values separated by semicolons?
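
The `value` column looks that way because `unite()` pastes x1, x2 and x3 into a single string, so each row of the result is a combination of answers (for example, `1;1;0` means x1 and x2 ticked, x3 not), and `count()` then counts respondents per combination. That is why X shows 5 rows and Y shows 6: they are distinct answer patterns, not options. If per-option frequencies are what is wanted, a minimal sketch could reshape the data to long format instead. The names `long` and `option_freq`, the output column names, and the choice of denominator for "percent of cases" (respondents who ticked at least one option of that question) are assumptions, not something confirmed in the thread.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the thread, written column-wise
data <- tibble::tibble(
  sex    = c("woman", "woman", "woman", "man", "woman", "man",
             "woman", "man", "woman", "man", "man", "woman"),
  market = c("mk", "mk", "mk", "mk", "mk", "st",
             "mk", "st", "st", "st", "st", "st"),
  x1 = c(0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1),
  x2 = c(0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0),
  x3 = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0),
  y1 = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0),
  y2 = c(0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0),
  y3 = c(0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0)
)

# One row per respondent and option; the question name is the option
# name without its trailing number (x1 -> x, y3 -> y)
long <- data |>
  mutate(id = row_number()) |>
  pivot_longer(matches("^[xy][0-9]+$"),
               names_to = "option", values_to = "resp") |>
  mutate(question = sub("[0-9]+$", "", option))

option_freq <- long |>
  group_by(question) |>
  # respondents with at least one ticked option in this question
  mutate(n_valid = n_distinct(id[resp == 1])) |>
  group_by(question, option, n_valid) |>
  summarise(freq = sum(resp), .groups = "drop") |>
  group_by(question) |>
  mutate(
    pct_responses = 100 * freq / sum(freq),  # share of all ticks in the question
    pct_cases     = 100 * freq / n_valid     # share of respondents with any tick
  ) |>
  ungroup()

option_freq
```

Because percentages are computed within each `question`, X and Y are treated as two separate variables with three options each rather than one variable with six.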
