I have a data set where each row is a transaction carrying a customer ID number, and the same customer can appear on several rows. There are four separate status columns on each row. I am trying to figure out how to report the unique status values that occur for each customer ID.
I used the unite() function from tidyr to combine the four status columns into one. This returns a single long string containing all the statuses present on each row. I need each status as a separate element of a vector so that I can get the unique values. I used strsplit() to break the long character string apart, but it returns a list instead of a series of vectors. I tried unlist(), but it throws an error message. As far as I can tell, the way I am calling it makes it try to combine the entire list into a single vector.
What I want is to unlist row by row, so that I'd have a vector of statuses for each customer number and could then remove the duplicate statuses within each customer number. Reprex below.
library(tidyr)
set.seed(1)  # fix the random draws so the reprex gives the same data every run
customer <- rep(1:5, times = 4)
order.num <- 1:20
stat.cols <- paste("status", 1:4, sep = "")
mydf <- data.frame(customer, order.num)
mydf[stat.cols] <- NA
# Fill each status column with 20 random draws from A1, B2, C3, D4
mydf$status1 <- sample(paste(LETTERS[1:4], 1:4, sep = ""), 20, replace = TRUE, prob = c(0.10, 0.20, 0.65, 0.05))
mydf$status2 <- sample(paste(LETTERS[1:4], 1:4, sep = ""), 20, replace = TRUE, prob = c(0.10, 0.20, 0.65, 0.05))
mydf$status3 <- sample(paste(LETTERS[1:4], 1:4, sep = ""), 20, replace = TRUE, prob = c(0.10, 0.20, 0.65, 0.05))
mydf$status4 <- sample(paste(LETTERS[1:4], 1:4, sep = ""), 20, replace = TRUE, prob = c(0.10, 0.20, 0.65, 0.05))
# Combine the four status columns into one comma-separated string per row
mydf <- unite(mydf, col = "status.all", sep = ",", c(status1, status2, status3, status4))
# Split each string back into its statuses; this makes status.all a list column
mydf$status.all <- strsplit(mydf$status.all, ",")
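To make the goal concrete, here is a minimal sketch of the kind of per-customer result I am after. It uses a small made-up data frame with only two status columns rather than mydf, and base R only. The names toy, all.status, all.customer and by.customer are purely illustrative, and this may well not be the best way to do it.

```r
# Hypothetical toy data: two customers, two status columns (not the real mydf)
toy <- data.frame(
  customer = c(1, 1, 2, 2),
  status1  = c("A1", "C3", "B2", "B2"),
  status2  = c("C3", "C3", "B2", "D4"),
  stringsAsFactors = FALSE
)

# Stack the status columns into one long vector (status1 values, then status2),
# and repeat the customer IDs once per status column so the two vectors line up
all.status   <- unlist(toy[c("status1", "status2")], use.names = FALSE)
all.customer <- rep(toy$customer, times = 2)

# Group the statuses by customer, then keep only the unique values in each group
by.customer <- lapply(split(all.status, all.customer), unique)
by.customer
```

Here by.customer is a list with one character vector per customer, named by the customer ID. Because unlist() flattens everything into a single vector, the grouping has to come from a parallel customer vector passed to split().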