# Shapley value regression for brand driver analysis

Hi all,

I'm trying my hand at a Shapley regression model for brand driver importance, and I have a quick (possibly basic) question. My dependent variable is multi-coded: the question is "Which brands would you consider buying?" and the answer can be X, or X,Y, or X,Y,Z (where X, Y and Z are brand names). How should I transform it? Should I create a single categorical variable coded X=1, Y=2, Z=3, or just use the first brand each customer mentioned? Can anyone help? Many thanks.

Is there any limit to how many brands can be in the list?
I'd probably one-hot encode it, with one 0/1 column per brand.

Thanks. Yes, there is a limit: 3 brands at most.

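```r
library(tidyverse)
library(caret)

# Example data: one row per respondent, chosen brands comma-separated
(ex1 <- tibble(
  id = 1:7,
  brandsliked = c("x,y,z", "x,y", "z", "y,z", "a", "a,x", "b,y")
))
# # A tibble: 7 x 2
#      id brandsliked
#   <int> <chr>
#       1 x,y,z
#       2 x,y
#       3 z
#       4 y,z
#       5 a
#       6 a,x
#       7 b,y

# Split into at most 3 columns (the survey allows 3 brands max);
# fill = "right" pads shorter answers with NA instead of warning
(ex2 <- ex1 %>%
  separate(col = brandsliked, into = paste0("brnd", 1:3),
           sep = ",", fill = "right"))

# Long format: one row per (respondent, brand); drop the NA padding
(ex3 <- pivot_longer(ex2, cols = brnd1:brnd3, values_drop_na = TRUE))

# Dummy-code the brand column; keep id so rows can be regrouped
ex4 <- dummyVars(~ value + id, data = ex3)
(ex5 <- data.frame(predict(ex4, newdata = ex3)))

# Collapse back to one row per respondent: 1 if the brand was chosen
(ex6 <- ex5 %>%
  group_by(id) %>%
  summarise(across(starts_with("value"), ~ max(.x, na.rm = TRUE))))
# # A tibble: 7 x 6
#      id valuea valueb valuex valuey valuez
#   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#       1      0      0      1      1      1
#       2      0      0      1      1      0
#       3      0      0      0      0      1
#       4      0      0      0      1      1
#       5      1      0      0      0      0
#       6      1      0      1      0      0
#       7      0      1      0      1      0
```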
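If you'd rather avoid the caret and tidyr dependencies, here is a minimal base-R sketch of the same encoding. It assumes brands are separated by commas with no surrounding spaces; if the raw survey export contains spaces, they would need trimming first (for example with `trimws()`). Each resulting 0/1 column could then serve as its own binary dependent variable, one model per brand; that per-brand setup is a suggestion, not something settled in this thread.

```r
# Same example data as above: one row per respondent,
# brands comma-separated (assumed: no surrounding spaces)
ex1 <- data.frame(
  id = 1:7,
  brandsliked = c("x,y,z", "x,y", "z", "y,z", "a", "a,x", "b,y")
)

# Split each answer into a character vector of brands
brand_lists <- strsplit(ex1$brandsliked, ",", fixed = TRUE)

# All distinct brands, sorted so the columns come out in a stable order
brands <- sort(unique(unlist(brand_lists)))

# One 0/1 column per brand: 1 if the respondent mentioned that brand.
# vapply() returns brands x respondents, so transpose to respondents x brands.
onehot <- t(vapply(brand_lists,
                   function(b) as.integer(brands %in% b),
                   integer(length(brands))))
colnames(onehot) <- brands

ex_onehot <- data.frame(id = ex1$id, onehot)
ex_onehot
#   id a b x y z
# 1  1 0 0 1 1 1
# 2  2 0 0 1 1 0
# 3  3 0 0 0 0 1
# 4  4 0 0 0 1 1
# 5  5 1 0 0 0 0
# 6  6 1 0 1 0 0
# 7  7 0 1 0 1 0
```

Using `vapply()` rather than `sapply()` guarantees that every respondent contributes exactly one integer per brand, so the result is always a predictable integer matrix.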
