Recoding values of one variable into multiple variables

Hi,
I have this simple df:


source <- data.frame(
  stringsAsFactors = FALSE,
               URN = c("21GB0421520600348376",
                       "21GB0418447730308396","21GB0424598350378907","21GB0418447730308382","21GB0420497680334731",
                       "21GB0416416410294317","21GB0424598350378906",
                       "21GB0412336390238885","21GB0424594550377080",
                       "21GB0418447730308379","21GB0414375210263918","21GB0424594550377079",
                       "21GB0414375210263920","21GB0418452110311403",
                       "21GB0423572270366131"),
               QF0 = c(2, 6, 33, 35, 32, 4, 33, 1, 32, 64, 2, 64, 2, 8, 33)
)

...where QF0 should be recoded into seven 0/1 variables using the following logic.
Values and the names of the new variables:
1 - Routine service / maintenance
2 - Repair
4 - Recall campaign
8 - Tire change
16 - Windscreen, windows, other work related to glass
32 - MOT related visit
64 - Other

Other values are combinations (sums) of the above, for example:

The first df record has value 2, so the values of the 7 new variables should be:
0 for Routine service / maintenance
1 for Repair
0 for Recall campaign
0 for Tire change
0 for Windscreen, windows, other work related to glass
0 for MOT related visit
0 for Other

The second df record has value 6 (2+4), so the values of the 7 new variables should be:
0 for Routine service / maintenance
1 for Repair
1 for Recall campaign
0 for Tire change
0 for Windscreen, windows, other work related to glass
0 for MOT related visit
0 for Other

The third df record has value 33 (32+1), so the values of the 7 new variables should be:
1 for Routine service / maintenance
0 for Repair
0 for Recall campaign
0 for Tire change
0 for Windscreen, windows, other work related to glass
1 for MOT related visit
0 for Other
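Because every category code is a distinct power of two (1, 2, 4, ..., 64), each QF0 value works as a bit mask: a category is present exactly when its bit is set. A minimal base R sketch of that idea, using bitwAnd() on the three examples above; the helper name decode() is only illustrative:

```r
# Category codes from the list above: each one is a distinct power of two
codes <- c(1, 2, 4, 8, 16, 32, 64)

# A category is present (1) when its bit is set in the combined value, else 0
decode <- function(value) as.integer(bitwAnd(value, codes) > 0)

decode(2)   # 0 1 0 0 0 0 0
decode(6)   # 0 1 1 0 0 0 0
decode(33)  # 1 0 0 0 0 1 0
```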

I can do this recoding easily in Excel, but is it possible to do it in R?

Once the coding is complete, is it possible to create a single multiple-response variable in R, as in other statistical packages? This variable could then be used in all tabulations.

I would say: use whichever tool you are more fluent with.

Yes, it is. Do you have any specific coding question about it? As I said to you before, we are more inclined towards helping you with specific coding problems rather than doing your work for you.

source <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("21GB0421520600348376",
          "21GB0418447730308396","21GB0424598350378907","21GB0418447730308382","21GB0420497680334731",
          "21GB0416416410294317","21GB0424598350378906",
          "21GB0412336390238885","21GB0424594550377080",
          "21GB0418447730308379","21GB0414375210263918","21GB0424594550377079",
          "21GB0414375210263920","21GB0418452110311403",
          "21GB0423572270366131"),
  QF0 = c(2, 6, 33, 35, 32, 4, 33, 1, 32, 64, 2, 64, 2, 8, 33)
)

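# Needs purrr (for map()) and dplyr (for %>%, left_join(), mutate(), across()),
# e.g. via library(tidyverse)

# All 2^7 = 128 TRUE/FALSE combinations of the seven categories (columns Var1..Var7)
tf <- c(TRUE,FALSE)
combs <- expand.grid(map(1:7,~tf))

# Multiply each column by its code: 1, 2, 4, 8, 16, 32, 64
(c2 <- sweep(combs,MARGIN=2,2^(0:6),`*`))

# The row sum is the combined QF0 value that each combination represents
c2$rsum <- rowSums(c2)

head(c2)

# Join on QF0 and turn the Var columns into 0/1 integers
source %>% left_join(c2,by=c("QF0"="rsum")) %>%
  mutate(across(.cols=c(-URN,-QF0),~as.integer(.>0)))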

Thank you but:

 tf <- c(TRUE,FALSE)
> combs <- expand.grid(map(1:7,~tf))
Error in map(1:7, ~tf) : could not find function "map"
> 
> (c2 <- sweep(combs,MARGIN=2,2^(0:6),`*`))
Error in sweep(combs, MARGIN = 2, 2^(0:6), `*`) : 
  object 'combs' not found
> 
> c2$rsum <- rowSums(c2)
Error in is.data.frame(x) : object 'c2' not found
> 
> head(c2)
Error in head(c2) : object 'c2' not found
> 
> source %>% left_join(c2,by=c("QF0"="rsum")) %>%
+   mutate(across(.cols=c(-URN,-QF0),~as.integer(.>0)))
Error in is.data.frame(y) : object 'c2' not found
> 

library(tidyverse)
attaches purrr (among other packages), which contains purrr::map()

Hi Slavek,

The map function is not found because you have not loaded the purrr package or the tidyverse. Loading either one will work, since purrr is part of the tidyverse; alternatively, load only what you specifically need (the join and mutate steps also need dplyr).

After installing:
library(tidyverse)
or
library(purrr)
library(dplyr)

Hope this helps
-Q
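If installing purrr is not an option, the same lookup-table idea can be sketched in base R: rep(list(c(TRUE, FALSE)), 7) stands in for map(1:7, ~tf), a matrix product computes the row sums, and match() stands in for left_join(). This is only an illustration on the first five rows of the data; the object names are illustrative:

```r
# First five rows of the example data from the question
source <- data.frame(
  URN = c("21GB0421520600348376", "21GB0418447730308396", "21GB0424598350378907",
          "21GB0418447730308382", "21GB0420497680334731"),
  QF0 = c(2, 6, 33, 35, 32),
  stringsAsFactors = FALSE
)

# All 128 TRUE/FALSE combinations of the seven categories (columns Var1..Var7);
# rep(list(...), 7) plays the role of map(1:7, ~tf)
combs <- expand.grid(rep(list(c(TRUE, FALSE)), 7))

# Combined code of each combination: the sum of the powers of two switched on
rsum <- as.vector(as.matrix(combs) %*% 2^(0:6))

# match() plays the role of left_join(): find each QF0 value's combination
hits <- combs[match(source$QF0, rsum), ]
hits[] <- lapply(hits, as.integer)   # TRUE/FALSE -> 1/0
rownames(hits) <- NULL

result <- cbind(source, hits)
result
```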

Thank you, this is almost resolved, but I need 1 and 0 in each new variable instead of the original values.

result <- source %>% left_join(c2,by=c("QF0"="rsum")) %>%
  mutate(across(.cols=c(-URN,-QF0),~as.integer(.>0)))
result

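library(dplyr)
# Recode only the columns whose names start with "Var": values > 0 become 1,
# zeros stay 0, and QF0 (or any other numeric column) is left unchanged
recoding <- result %>% 
  mutate(across(starts_with("Var"), ~ as.integer(.x > 0)))
recoding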

Also, the second part of my initial question:
Is there any way of merging the new variables into one multiple-response variable (let's call it "MultiVar") that could be used in tabulations?
All I can do is summarise all the variables starting with "Var":

library(tidyr)
idea <- recoding %>% 
  select(starts_with("Var")) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(everything(), names_to = c("Category", "summary"), names_sep = "_", values_to = "value") %>% 
  pivot_wider(names_from = summary, values_from = value)
idea

But this is a data frame rather than one variable, so I cannot use it in my tables.

almost resolved but I need 1 and 0 in each new variable instead of values.

I find this mystifying, because my code produces integer columns with values 0 and 1. What are the 'values' you refer to?

but this is not one variable but a df so I cannot use it in my tables
My best guess is that you want tidyr::unite(); have a look and give us some feedback?
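A minimal sketch of what unite() does here, on three hand-typed rows taken from the examples in the question; "MultiVar" is the name proposed in the thread. The result is one string per record, so each distinct combination becomes its own category, which is not quite the same as a multiple-response set that counts each category separately:

```r
library(tidyr)

# Three records from the question (QF0 = 2, 6, 33), already recoded to 0/1
recoded <- data.frame(
  QF0  = c(2, 6, 33),
  Var1 = c(0L, 0L, 1L), Var2 = c(1L, 1L, 0L), Var3 = c(0L, 1L, 0L),
  Var4 = c(0L, 0L, 0L), Var5 = c(0L, 0L, 0L), Var6 = c(0L, 0L, 1L),
  Var7 = c(0L, 0L, 0L)
)

# Paste Var1..Var7 into one string column; remove = FALSE keeps the originals
combined <- unite(recoded, "MultiVar", starts_with("Var"), sep = "", remove = FALSE)
combined$MultiVar   # "0100000" "0110000" "1000010"

# Each distinct pattern then behaves like an ordinary categorical variable
table(combined$MultiVar)
```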

Thank you. Your solution generates new variables with values:
Var1 = 0 or 1
Var2 = 0 or 2
Var3 = 0 or 4 etc.

I need:
Var1 = 0 or 1
Var2 = 0 or 1
Var3 = 0 or 1 etc.

I used the recoding step, but it would be better to have 0s and 1s from the beginning, to avoid recoding other numeric variables in my original, bigger df. Alternatively, I could apply mutate_if only to the variables starting with "Var", but I am having issues with my code.
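Since the codes are powers of two, one way to get 0/1 columns from the beginning is to test each bit of QF0 directly, with no lookup table and no second recoding pass. A minimal base R sketch under that assumption, on the first three rows of the data (with Gender as in the later post); the column names Var1..Var7 just follow the thread:

```r
# First three rows of the question's data, with Gender as in the later post
source <- data.frame(
  URN    = c("21GB0421520600348376", "21GB0418447730308396", "21GB0424598350378907"),
  QF0    = c(2, 6, 33),
  Gender = c(2, 1, 2),
  stringsAsFactors = FALSE
)

# Var k is TRUE when bit k (value 2^(k-1)) is set in QF0
flags <- outer(source$QF0, 2^(0:6), function(value, code) bitwAnd(value, code) > 0)
colnames(flags) <- paste0("Var", 1:7)

# Append the indicators as 0/1 integers; URN, QF0 and Gender are not modified
result <- cbind(source, as.data.frame(flags * 1L))
result
```

Because only new columns are created, nothing else in a bigger data frame needs to be recoded afterwards.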

In terms of my second request: statistical packages such as SPSS let you define a single multiple-response question (by drag-and-drop or in syntax), which you can then use in your tabulations.
Have a look at these examples:
Tableau: Survey Data Multiple Response Questions Frequency Analysis in Tableau - YouTube
or IBM SPSS: SPSS Tables - Frequency or Cross table of a Multiple Answers question (using Multiple Responses) - YouTube

This is not accurate. My solution does produce those values, but only as a necessary interim step (the c2 data.frame); the join and mutate I provided (which you named result) deal with this and produce the 0/1 view.

So are you asking, in general, how to create summary statistics?

Thank you for your prompt response.
So, given this:

                 URN QF0 Var1 Var2 Var3 Var4 Var5 Var6 Var7
1  21GB0421520600348376   2    0    2    0    0    0    0    0
2  21GB0418447730308396   6    0    2    4    0    0    0    0
3  21GB0424598350378907  33    1    0    0    0    0   32    0
4  21GB0418447730308382  35    1    2    0    0    0   32    0
5  21GB0420497680334731  32    0    0    0    0    0   32    0
6  21GB0416416410294317   4    0    0    4    0    0    0    0
7  21GB0424598350378906  33    1    0    0    0    0   32    0
8  21GB0412336390238885   1    1    0    0    0    0    0    0
9  21GB0424594550377080  32    0    0    0    0    0   32    0
10 21GB0418447730308379  64    0    0    0    0    0    0   64
11 21GB0414375210263918   2    0    2    0    0    0    0    0
12 21GB0424594550377079  64    0    0    0    0    0    0   64
13 21GB0414375210263920   2    0    2    0    0    0    0    0
14 21GB0418452110311403   8    0    0    0    8    0    0    0
15 21GB0423572270366131  33    1    0    0    0    0   32    0

Can I recode all values > 0 in the variables starting with "Var" into 1s?
I believe I should somehow use mutate_if.

My second question is about R's capabilities. I cannot find anything in the R documentation about multiple-response variables. Is my summary-statistics code the only way of displaying the results of this one question? For other, typical tabulations we can just use the name of a variable and cross it with anything else, check frequencies, proportions, etc. Can we do the same with one multiple-response variable (let's call it "MultiVar"), or is summarising all the Vars (Var1, Var2, Var3, ...) the only way of checking their values?
I have added gender:

source <- data.frame(
  stringsAsFactors = FALSE,
               URN = c("21GB0421520600348376",
                       "21GB0418447730308396","21GB0424598350378907","21GB0418447730308382","21GB0420497680334731",
                       "21GB0416416410294317","21GB0424598350378906",
                       "21GB0412336390238885","21GB0424594550377080",
                       "21GB0418447730308379","21GB0414375210263918","21GB0424594550377079",
                       "21GB0414375210263920","21GB0418452110311403",
                       "21GB0423572270366131"),
               QF0 = c(2, 6, 33, 35, 32, 4, 33, 1, 32, 64, 2, 64, 2, 8, 33),
              Gender = c(2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1)
)

source

Once I have recoded all the Var columns into 0/1, I would like to see the results for each gender.
Is this the only way?


idea2 <- recoding %>% 
  select(Gender, starts_with("Var")) %>% 
  group_by(Gender) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(-Gender, names_to = c("Category", "summary"), names_sep = "_", values_to = "value") %>% 
  pivot_wider(names_from = summary, values_from = value)

idea2
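Another way to get something close to a multiple-response frequency table, sketched here in base R as an alternative rather than a built-in multiple-response type: reshape the 0/1 indicators into one row per (respondent, mentioned category) and use table(). Counts are then mentions per category, and dividing by the number of respondents per gender gives "percent of cases". The three hand-typed rows and the object names are illustrative:

```r
# Three records from the thread, already recoded to 0/1 (QF0 = 2, 6, 33)
recoded <- data.frame(
  Gender = c(2, 1, 2),
  Var1 = c(0L, 0L, 1L), Var2 = c(1L, 1L, 0L), Var3 = c(0L, 1L, 0L),
  Var4 = c(0L, 0L, 0L), Var5 = c(0L, 0L, 0L), Var6 = c(0L, 0L, 1L),
  Var7 = c(0L, 0L, 0L)
)

# Category labels from the question, in Var1..Var7 order
labels <- c("Routine service / maintenance", "Repair", "Recall campaign",
            "Tire change", "Windscreen, windows, other work related to glass",
            "MOT related visit", "Other")
var_cols <- paste0("Var", 1:7)

# Long format: one row per (respondent, mentioned category)
ind <- as.matrix(recoded[var_cols]) == 1
mentions <- data.frame(
  Gender   = factor(recoded$Gender)[row(ind)[ind]],
  Category = factor(labels[col(ind)[ind]], levels = labels)
)

# Multiple-response table: number of mentions of each category, by gender
tab <- table(mentions$Category, mentions$Gender)
tab

# Share of respondents of each gender who mentioned each category
n_per_gender <- as.vector(table(factor(recoded$Gender)))
shares <- sweep(tab, 2, n_per_gender, "/")
shares
```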

Hi,

Try using the binary representation of each number.

For example, if you convert 33 to binary you get '100001'. Add as many leading zeros as you need to get a string of 7 characters (one per status you use) and you get what you need: '0100001'. Now, if you read it from right to left, you get exactly the encoding you need.

Another example:
7 (dec) -> '111' (bin) -> '0000111' -> 1 for "Routine service", 1 for "Repair", 1 for "Recall campaign", 0 for "Tire change" and so on...
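Base R can produce these digits directly: intToBits() returns the 32 bits of an integer as raw values, least significant bit first, which is exactly the right-to-left reading described above. A minimal sketch, keeping only the first 7 bits (one per status):

```r
# Bits of 33, least significant first; keep the first 7 (one per category)
as.integer(intToBits(33L))[1:7]   # 1 0 0 0 0 1 0

# Same for 7: Routine service, Repair and Recall campaign are switched on
as.integer(intToBits(7L))[1:7]    # 1 1 1 0 0 0 0

# One row per QF0 value, one column per category
qf0 <- c(2, 6, 33, 35)
bits <- t(sapply(qf0, function(v) as.integer(intToBits(as.integer(v)))[1:7]))
bits
```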

Greets,
Adam.

Thank you, but I don't know how to convert numbers to binary. Can you help?

Here is a function that already gives you the binary digits in the right order. The output is already reversed, so now you only need to add trailing zeros and split the string to obtain separate zeros and ones.

int2bin <- function(int) {
  if (int > 1) {
    return(paste0(int %% 2, int2bin(as.integer(int / 2))))
  } else {
    return(as.character(int %% 2))
  }
}

# EXAMPLE:
# int2bin(33)
# result: "100001"

# Vectorised version of the functions
int2bin.vect <- Vectorize(int2bin)

# EXAMPLE:
# int2bin.vect(c(33, 7))
# result: c("100001", "111")

Thank you, but now comes the difficult part: converting all results into 7-character strings...
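One way to finish this, sketched under the assumption that int2bin() is used exactly as posted above: since its digits are already least-significant first, padding means appending zeros on the right, and strsplit() then separates the digits into one column per category. The names bin7 and digits are illustrative:

```r
# int2bin() and int2bin.vect() as posted above: binary digits, least significant first
int2bin <- function(int) {
  if (int > 1) {
    return(paste0(int %% 2, int2bin(as.integer(int / 2))))
  } else {
    return(as.character(int %% 2))
  }
}
int2bin.vect <- Vectorize(int2bin)

qf0 <- c(2, 6, 33, 35)

# Pad on the right with zeros and cut to 7 characters (digits are already reversed)
bin7 <- substr(paste0(int2bin.vect(qf0), "0000000"), 1, 7)
bin7   # "0100000" "0110000" "1000010" "1100010"

# Split each string into single digits: one row per value, one column per category
digits <- do.call(rbind, strsplit(bin7, ""))
storage.mode(digits) <- "integer"
colnames(digits) <- paste0("Var", 1:7)
digits
```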
