Separate a column into many columns, one per unique category, and create a binary matrix

Hello,

I have a data frame that looks something like this:

```r
person <- c('x', 'y', 'z', 'a', 'b')
col <- c('AF, FMCC CUS, FMCC DEAL, HYUN', NA, 'CHR C, AFG, FMCC CUS', NA, 'AF')
df <- data.frame(person, col)
```

The categories are separated by commas. I want to split them so that every category (AF, FMCC CUS, etc.) becomes its own column containing either 1 or 0. My first step was the separation:

I used `cSplit()` from the splitstackshape package:

```r
library(splitstackshape)
cSplit(df, 'col', sep = ',', stripWhite = TRUE)
```

OK, that worked! But FMCC CUS ends up in the second column (col_2) for person x and in the third column (col_3) for person z. How can I tell R that col_1 is reserved for skill AF, col_2 for skill FMCC CUS, and so on through my last skill?
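One way to get a fixed column per skill is to stop relying on the position produced by the split and key each column by the skill name instead. A minimal base-R sketch, assuming the separator is always a comma followed by optional spaces (the names `pieces`, `skills` and `wide` are illustrative, not from any package):

```r
df <- data.frame(
  person = c('x', 'y', 'z', 'a', 'b'),
  col    = c('AF, FMCC CUS, FMCC DEAL, HYUN', NA, 'CHR C, AFG, FMCC CUS', NA, 'AF'),
  stringsAsFactors = FALSE
)

# Split on commas, dropping the spaces after each comma; NA entries stay NA
pieces <- strsplit(df$col, ",\\s*")

# Every distinct skill across all people, alphabetically
all_skills <- unlist(pieces)
skills <- sort(unique(all_skills[!is.na(all_skills)]))

# One column per skill: the cell holds the skill name if the person has it, else NA
wide <- sapply(skills, function(s) {
  has_it <- vapply(pieces, function(p) s %in% p, logical(1))
  ifelse(has_it, s, NA_character_)
})
wide <- data.frame(person = df$person, wide,
                   check.names = FALSE, stringsAsFactors = FALSE)
```

In `wide`, FMCC CUS now sits in the same column for persons x and z. Replacing `ifelse(has_it, s, NA_character_)` with `as.integer(has_it)` would give the 0/1 version directly.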

Also, once that works, what is a good way to make each skill a column and convert the data into a binary matrix, with 1 if the person has the skill and 0 otherwise?

I have read about a function called `model.matrix()`, but any other suggestions would be great!
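Since `model.matrix()` came up: it one-hot encodes a factor, so if the data are first put into long format (one row per person and skill), summing the dummy rows per person with base R's `rowsum()` yields the binary matrix. A sketch under the same comma-separator assumption; `long`, `counts` and `binary` are illustrative names, and people with no skills (NA) are added back as all-zero rows:

```r
df <- data.frame(
  person = c('x', 'y', 'z', 'a', 'b'),
  col    = c('AF, FMCC CUS, FMCC DEAL, HYUN', NA, 'CHR C, AFG, FMCC CUS', NA, 'AF'),
  stringsAsFactors = FALSE
)

# Long format: one row per (person, skill) pair, NA rows dropped
pieces <- strsplit(df$col, ",\\s*")
long <- data.frame(
  person = rep(df$person, lengths(pieces)),
  skill  = unlist(pieces),
  stringsAsFactors = FALSE
)
long <- long[!is.na(long$skill), ]
long$skill <- factor(long$skill)

# '~ skill - 1' drops the intercept, so every skill level gets its own dummy column
dummies <- model.matrix(~ skill - 1, data = long)

# Add up the dummy rows for each person (rowsum groups by its second argument)
counts <- rowsum(dummies, group = long$person)

# Restore people with no skills as zero rows, keep the original person order,
# and cap at 1 in case a skill is listed twice for the same person
binary <- matrix(0, nrow = nrow(df), ncol = ncol(counts),
                 dimnames = list(df$person, sub("^skill", "", colnames(counts))))
binary[rownames(counts), ] <- pmin(counts, 1)
```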

Thank you!

I didn't end up using the split columns.

Instead, I created a separate column for each of my categories with the following code:

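```r
# Anchor each pattern to the comma delimiters; a bare grepl('AF', ...) would
# also flag 'AFG' (person z) as AF
df$AF       <- ifelse(grepl('(^|,\\s*)AF(\\s*,|$)', df$col), 1, 0)
df$FMCC_CUS <- ifelse(grepl('(^|,\\s*)FMCC CUS(\\s*,|$)', df$col), 1, 0)
# ...and so on, one line for each category
```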

This is not efficient, since I had to write 21 separate lines of code, one per category. If anyone knows a better way to get a binary column for each of the categories in col, that would be great!
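The 21 `ifelse()` lines can be generated in one pass by looping over the distinct categories. A minimal sketch, assuming categories are separated by a comma plus optional spaces; it tests exact membership in each person's split list rather than a substring match (so AF does not also match AFG), and replacing spaces with underscores to build names like `FMCC_CUS` is only a naming choice:

```r
df <- data.frame(
  person = c('x', 'y', 'z', 'a', 'b'),
  col    = c('AF, FMCC CUS, FMCC DEAL, HYUN', NA, 'CHR C, AFG, FMCC CUS', NA, 'AF'),
  stringsAsFactors = FALSE
)

# Split once; NA rows become a single NA element and match no category
pieces <- strsplit(df$col, ",\\s*")
all_skills <- unlist(pieces)
categories <- sort(unique(all_skills[!is.na(all_skills)]))

# One 0/1 column per category instead of one hand-written line each
for (cat_name in categories) {
  has_it <- vapply(pieces, function(p) cat_name %in% p, logical(1))
  df[[gsub(" ", "_", cat_name, fixed = TRUE)]] <- as.integer(has_it)
}
```

New categories in the data are picked up automatically, so the code does not need to change when a 22nd skill appears.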
