multiResponse in ufs or other easy ways of creating multiresponse sets

Hi,
I am trying to create some multiresponse sets containing 0s and 1s using the ufs package, but its documentation is really poor: https://www.rdocumentation.org/packages/userfriendlyscience/versions/0.7.2/topics/multiResponse

Do you have any experience in creating multiple response sets? Can you suggest anything?

This is example data which may be used:

library(readxl)
data.source <- read_excel("I:/Departments/Insight Department/7 R&D/R Training/Multi response data.xlsx")

data.source
#> # A tibble: 12 x 6
#>    URN   Blank   Oil   MOT   Rep Other
#>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 aaa       0     0     0     0     1
#>  2 bbb       0     0     0     0     1
#>  3 ccc       0     1     1     0     0
#>  4 ddd       0     1     1     1     0
#>  5 eee       0     0     1     0     0
#>  6 fff       0     0     0     1     0
#>  7 ggg       0     0     0     0     1
#>  8 hhh       0     0     0     0     1
#>  9 iii       0     1     0     1     0
#> 10 jjj       0     0     0     0     1
#> 11 kkk       1     0     0     0     0
#> 12 lll       0     0     0     0     1

Created on 2019-10-11 by the reprex package (v0.3.0)

I need to create one variable (set) containing all the 0/1 variables above. What I have done so far is create some data frames manually:

categories <- data.frame(Freq=colSums(data.source[2:6]),
                         Pct.of.Resp=(colSums(data.source[2:6])/sum(data.source[2:6]))*100,
                         Pct.of.Cases=(colSums(data.source[2:6])/nrow(data.source[2:6]))*100)

no.blank.categories <- data.frame(Freq=colSums(data.source[3:6]),
                                  Pct.of.Resp=(colSums(data.source[3:6])/sum(data.source[3:6]))*100,
                                  Pct.of.Cases=(colSums(data.source[3:6])/nrow(data.source[3:6]))*100)

What do you think?

Please provide a proper reproducible example

I am sorry. I am new to reprex, which has always been an issue for me.

Here you go. I hope this is a proper reprex :clap::

URN <- c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii", "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp")
Blank <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0)
Oil <- c(0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1)
MOT <- c(0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1)
Rep <- c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1)
Other <- c(1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0)
Gender <- c("Male", "Male", "Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male")

response.data <- data.frame(URN, Blank, Oil, MOT, Rep, Other, Gender)

response.data
#>    URN Blank Oil MOT Rep Other Gender
#> 1  aaa     0   0   0   0     1   Male
#> 2  bbb     0   0   0   0     1   Male
#> 3  ccc     0   1   1   0     0   Male
#> 4  ddd     0   1   1   1     0   Male
#> 5  eee     0   0   1   0     0   Male
#> 6  fff     0   0   0   1     0   Male
#> 7  ggg     0   0   0   0     1 Female
#> 8  hhh     0   0   0   0     1 Female
#> 9  iii     0   1   0   1     0 Female
#> 10 jjj     0   0   0   0     1 Female
#> 11 kkk     1   0   0   0     0 Female
#> 12 lll     0   0   0   0     1 Female
#> 13 mmm     1   0   0   0     0   Male
#> 14 nnn     0   1   1   1     0   Male
#> 15 ooo     1   0   0   0     0   Male
#> 16 ppp     0   1   1   1     0   Male



categories <- data.frame(Freq=colSums(response.data[2:6]),
                         Pct.of.Resp=(colSums(response.data[2:6])/sum(response.data[2:6]))*100,
                         Pct.of.Cases=(colSums(response.data[2:6])/nrow(response.data[2:6]))*100)
categories
#>       Freq Pct.of.Resp Pct.of.Cases
#> Blank    3    12.50000       18.75
#> Oil      5    20.83333       31.25
#> MOT      5    20.83333       31.25
#> Rep      5    20.83333       31.25
#> Other    6    25.00000       37.50

no.blank.categories <- data.frame(Freq=colSums(response.data[3:6]),
                                  Pct.of.Resp=(colSums(response.data[3:6])/sum(response.data[3:6]))*100,
                                  Pct.of.Cases=(colSums(response.data[3:6])/nrow(response.data[3:6]))*100)
no.blank.categories
#>       Freq Pct.of.Resp Pct.of.Cases
#> Oil      5    23.80952       31.25
#> MOT      5    23.80952       31.25
#> Rep      5    23.80952       31.25
#> Other    6    28.57143       37.50

Created on 2019-10-14 by the reprex package (v0.3.0)

I need to create one variable containing all the 0/1 responses (and a version of it without blanks) that can be used in further analysis, tabulations, etc. The key figure is Pct.of.Cases, which usually sums to more than 100% across categories because a respondent can select several options. Multiresponse sets are very common in market research: every question where more than one response can be selected is treated as a multiresponse set. Some of these questions contain more than 30 variables, so it's important to find a way to do this in R that is as simple as it is in SPSS or other stats tools.
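
To make the difference between the two percentages concrete, here is a minimal base R sketch. The helper name multi_response_table() and the toy data are made up for illustration and do not come from any package; the point is only that percent of responses adds up to 100 while percent of cases can go beyond it.

```r
# Hypothetical helper (not from any package): summarise a set of 0/1 columns.
# `cols` may be column names or positions, e.g. 2:6.
multi_response_table <- function(data, cols) {
  set  <- data[cols]
  freq <- colSums(set, na.rm = TRUE)
  data.frame(
    Category     = names(freq),
    Freq         = unname(freq),
    Pct.of.Resp  = unname(freq) / sum(freq) * 100,  # share of all selections
    Pct.of.Cases = unname(freq) / nrow(set) * 100,  # share of respondents
    stringsAsFactors = FALSE
  )
}

# Made-up example: 4 respondents, 3 options
toy <- data.frame(A = c(1, 1, 0, 1), B = c(1, 0, 0, 1), C = c(0, 1, 1, 1))
tab <- multi_response_table(toy, c("A", "B", "C"))
tab

sum(tab$Pct.of.Resp)   # 100
sum(tab$Pct.of.Cases)  # 200, because respondents can select several options
```

Passing a different set of columns (for example, leaving out a "Blank" column) gives the version without blanks.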

One example of the required tables is this SPSS table, prepared in a couple of minutes:

categories	Total_Count	Total_Pct.of.Cases	Female_Count	Female_Pct.of.Cases	Male_Count	Male_Pct.of.Cases
Blank	3	18.8%	1	16.7%	2	20.0%
Oil	5	31.3%	1	16.7%	4	40.0%
MOT	5	31.3%	0	0.0%	5	50.0%
Rep	5	31.3%	1	16.7%	4	40.0%
Other	6	37.5%	4	66.7%	2	20.0%
Total	16	100.0%	6	100.0%	10	100.0%



I also have an issue with exporting. When I export my two data frames to Excel, the category names go missing and I can only see Freq, Pct.of.Resp and Pct.of.Cases in the exported files.
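
A likely cause is that the category names are stored as row names rather than in a column, and many export functions skip row names by default. A minimal sketch of one workaround, assuming a table shaped like "categories" above: copy the row names into a regular column before exporting. write.csv() is used here only as a stand-in for whichever Excel writer is actually in use.

```r
# Toy frequency table with the categories stored as row names,
# the same way data.frame(Freq = colSums(...)) stores them
categories <- data.frame(Freq = c(Blank = 3, Oil = 5, MOT = 5))

# Move the row names into a regular column so they survive the export
export <- cbind(Category = rownames(categories), categories)
rownames(export) <- NULL

# write.csv() stands in for the Excel export step
out <- tempfile(fileext = ".csv")
write.csv(export, out, row.names = FALSE)
read.csv(out)
```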

Creating data frames was a temporary solution, but the main request (a multiresponse variable) still remains unresolved...

Perhaps I could do the analysis by generating this type of table:

library(plyr)

idea <- response.data %>% 
  group_by(Gender) %>% 
  summarise(Blank = mean(Blank),
            Oil = mean(Oil),
            MOT = mean(MOT),
            Rep = mean(Rep),
            Other = mean(Other),
            count = n())
idea

But maybe we could specify a range of variables in the data frame rather than listing them one by one?
The code above needs modification, as it doesn't give the results I expect...

You can use summarise_at() or summarise_if()
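
As a rough illustration of the summarise_at() route, here is a small sketch with made-up data laid out like the example in the thread. vars(Blank:Other) picks every column from Blank through Other, so the columns do not have to be listed one by one.

```r
library(dplyr)

# Made-up data with the same layout as the thread's example
df <- data.frame(
  Gender = c("Male", "Male", "Female", "Female"),
  Blank  = c(0, 0, 1, 0),
  Oil    = c(1, 1, 0, 0),
  Other  = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)

by_gender <- df %>%
  group_by(Gender) %>%
  # Blank:Other selects the whole range of 0/1 columns by position
  summarise_at(vars(Blank:Other), mean)
by_gender
```

In dplyr 1.0.0 and later, summarise(across(Blank:Other, mean)) should express the same thing.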

I'm not clear on exactly what output you're after, but some summarise_all() (or summarise_at() if you want to explicitly include rather than exclude variables) could be useful:

response.data %>% 
    select(-URN, -Gender) %>% 
    summarise_all(list(sum = sum, mean = mean))

 Blank_sum Oil_sum MOT_sum Rep_sum Other_sum Blank_mean Oil_mean MOT_mean Rep_mean Other_mean
1         3       5       5       5         6     0.1875   0.3125   0.3125   0.3125      0.375

Combine that with a pivot_longer()/gather() followed by pivot_wider()/spread() from tidyr and you're getting close to what you're after:

response.data %>% 
    select(-URN, -Gender) %>% 
    summarise_all(list(Count = sum, Proportion = mean)) %>% 
    pivot_longer(everything(), names_to = c("Category", "summary"), names_sep = "_", values_to = "value") %>%
    pivot_wider(names_from = summary, values_from = value)

# A tibble: 5 x 3
  Category Count Proportion
  <chr>    <dbl>      <dbl>
1 Blank        3      0.188
2 Oil          5      0.312
3 MOT          5      0.312
4 Rep          5      0.312
5 Other        6      0.375

You could also throw in a group_by(Gender) if you're interested in these values by your Gender variable:

response.data %>% 
    select(-URN) %>% 
    group_by(Gender) %>% 
    summarise_all(list(Count = sum, Proportion = mean)) %>% 
    pivot_longer(-Gender, names_to = c("Category", "summary"), names_sep = "_", values_to = "value") %>%
    pivot_wider(names_from = summary, values_from = value)

# A tibble: 10 x 4
   Gender Category Count Proportion
   <fct>  <chr>    <dbl>      <dbl>
 1 Female Blank        1      0.167
 2 Female Oil          1      0.167
 3 Female MOT          0      0    
 4 Female Rep          1      0.167
 5 Female Other        4      0.667
 6 Male   Blank        2      0.2  
 7 Male   Oil          4      0.4  
 8 Male   MOT          5      0.5  
 9 Male   Rep          4      0.4  
10 Male   Other        2      0.2

I'm not sure if this completely solves what you're trying to do, but hopefully it points you in the right direction.

Very interesting solutions. Thank you.

I think I did not clearly highlight what the main point is though.
In this small sample I have only five 0/1 variables but in many cases I have sets of 20+ variables which are part of multiresponse questions.
I need to find a way of creating a separate variable containing all of these 0/1 questions, or data frames that can be easily exported to Excel tables, without typing the names of all 20+ questions (5 in this sample file).

The perfect option would be something like this: https://www.rdocumentation.org/packages/userfriendlyscience/versions/0.7.2/topics/multiResponse

I couldn't find any responses about possible packages, so my idea was to create data frames using a range of variables.
Finally, I thought my "categories" and "no.blank.categories" were really basic, so I was looking for cleverer ways to create them using range statements (like [2:6]) to get counts and Pct.of.Cases.
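
One possible way to get "one variable" for the whole set is to reshape the 0/1 columns into long format, so that a single Category column holds every selected option. This is only a sketch with made-up data, assuming tidyr 1.0.0 or later for pivot_longer(); the column names Selected, Freq and Pct.of.Cases are my own choices.

```r
library(dplyr)
library(tidyr)

# Made-up data: 4 respondents, 3 options
df <- data.frame(
  URN    = c("aaa", "bbb", "ccc", "ddd"),
  Gender = c("Male", "Male", "Female", "Female"),
  Oil    = c(1, 1, 0, 1),
  MOT    = c(1, 0, 0, 1),
  Other  = c(0, 0, 1, 1),
  stringsAsFactors = FALSE
)

n_cases <- nrow(df)  # base for percent of cases

# One row per respondent and selected option; Oil:Other is a column range
long <- df %>%
  pivot_longer(cols = Oil:Other, names_to = "Category", values_to = "Selected") %>%
  filter(Selected == 1)

# Frequency table built from the single Category variable
set_table <- long %>%
  count(Category, name = "Freq") %>%
  mutate(Pct.of.Cases = Freq / n_cases * 100)
set_table
```

Note that an option nobody selected drops out of the long table, so it would not appear in the counts unless it is added back.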

Thank you very much for your help. I think I found a solution.
I can rename all variables related to a multiresponse question by adding a prefix and then selecting the variables with this prefix:


library(plyr)
rename(response.data, c("Blank"="TMC.Blank", "Oil"="TMC.Oil", "MOT"="TMC.MOT", "Rep"="TMC.Rep", "Other"="TMC.Other"))

library(tidyr)
idea <- response.data %>% 
  select(starts_with("TMC.")) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(everything(), names_to = c("Category", "summary"), names_sep = "_", "value") %>% 
  pivot_wider(names_from = summary, values_from = value)

idea2 <- response.data2 %>% 
  select(Gender, starts_with("TMC.")) %>% 
  group_by(Gender) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(-Gender, names_to = c("Category", "summary"), names_sep = "_", "value") %>% 
  pivot_wider(names_from = summary, values_from = value)

What do you think?

Hi, I'm impressed with the solution, and I've started applying it to my real data.

I have a problem with adding one more level to the group_by. Let's say I have one additional variable, "Type" (A or B), to add to your solution. How can I add it as the first level?

URN <- c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii", "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp")
Blank <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0)
Oil <- c(0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1)
MOT <- c(0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1)
Rep <- c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1)
Other <- c(1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0)
Gender <- c("Male", "Male", "Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male")
Type <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "A", "A", "A", "A")

response.data <- data.frame(URN, Blank, Oil, MOT, Rep, Other, Gender, Type)

library(plyr)
rename(response.data, c("Blank"="TMC.Blank", "Oil"="TMC.Oil", "MOT"="TMC.MOT", "Rep"="TMC.Rep", "Other"="TMC.Other"))

response.data %>% 
  select(Type, Gender, starts_with("TMC.")) %>% 
    group_by(Type, Gender) %>% 
    summarise_all(list(Count = sum, Proportion = mean)) %>% 
    pivot_longer(-Gender, -Type, names_to = c("Category", "summary"), names_sep = "_", "value") %>% 
    pivot_wider(names_from = summary, values_from = value)

Sorry for asking further questions about the solution, but I find it really useful...

There are several problems with your code:

library(plyr): You should be using dplyr for this, not plyr, and if you want to use the pivot_ functions you also have to load tidyr.

rename(response.data, ...): You have forgotten that these functions don't perform in-place modifications. The result is never assigned, so this line essentially does nothing, and I think it is unnecessary anyway.

select(Type, Gender, starts_with("TMC.")): Obviously, without the previous renaming taking effect, this is not going to work.

pivot_longer(-Gender, -Type, ...): Here you are using the wrong syntax. If you want to pass multiple values to the cols argument, you have to use a vector: c(-Gender, -Type)


Solving all the syntax and logic problems, the code works:

library(dplyr)
library(tidyr)

response.data <- data.frame(stringsAsFactors = FALSE,
    Blank = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0),
    Oil = c(0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1),
    MOT = c(0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1),
    Rep = c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1),
    Other = c(1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0),
    URN = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg",
                      "hhh", "iii", "jjj", "kkk", "lll", "mmm", "nnn",
                      "ooo", "ppp"),
    Gender = c("Male", "Male", "Male", "Male", "Male", "Male",
                         "Female", "Female", "Female", "Female", "Female",
                         "Female", "Male", "Male", "Male", "Male"),
    Type = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
                       "B", "B", "A", "A", "A", "A")
)

response.data %>% 
    select(Type, Gender, everything(), -URN) %>% 
    group_by(Type, Gender) %>% 
    summarise_all(list(Count = sum, Proportion = mean)) %>% 
    pivot_longer(cols = c(-Gender, -Type),
                 names_to = c("Category", "summary"),
                 names_sep = "_", values_to = "value") %>%
    pivot_wider(names_from = summary, values_from = value)
#> # A tibble: 15 x 5
#> # Groups:   Type [2]
#>    Type  Gender Category Count Proportion
#>    <chr> <chr>  <chr>    <dbl>      <dbl>
#>  1 A     Male   Blank        2      0.222
#>  2 A     Male   Oil          4      0.444
#>  3 A     Male   MOT          5      0.556
#>  4 A     Male   Rep          3      0.333
#>  5 A     Male   Other        2      0.222
#>  6 B     Female Blank        1      0.167
#>  7 B     Female Oil          1      0.167
#>  8 B     Female MOT          0      0    
#>  9 B     Female Rep          1      0.167
#> 10 B     Female Other        4      0.667
#> 11 B     Male   Blank        0      0    
#> 12 B     Male   Oil          0      0    
#> 13 B     Male   MOT          0      0    
#> 14 B     Male   Rep          1      1    
#> 15 B     Male   Other        0      0
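
If the "TMC." prefix approach from earlier in the thread is still wanted, a sketch along these lines might work. The data is made up, and rename_at() with paste0() is just one way to add the prefix; the important part is assigning the renamed data frame back so that starts_with("TMC.") has something to match.

```r
library(dplyr)
library(tidyr)

# Made-up data with two grouping variables and two 0/1 columns
df <- data.frame(
  Type   = c("A", "A", "B", "B"),
  Gender = c("Male", "Female", "Male", "Female"),
  Oil    = c(1, 0, 1, 1),
  MOT    = c(0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Assign the result back: rename_at() returns a modified copy
df <- df %>% rename_at(vars(Oil:MOT), ~ paste0("TMC.", .x))

result <- df %>%
  select(Type, Gender, starts_with("TMC.")) %>%
  group_by(Type, Gender) %>%
  summarise_all(list(Count = sum, Proportion = mean)) %>%
  ungroup() %>%
  pivot_longer(cols = c(-Type, -Gender),
               names_to = c("Category", "summary"),
               names_sep = "_", values_to = "value") %>%
  pivot_wider(names_from = summary, values_from = value)
result
```

The prefix must not contain an underscore here, because names_sep = "_" is what splits the category name from the summary name.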