I have a couple of things that need the community's input.
I want to create a plot from a table of thousands of distinct customers, for which I have all of the purchase information. The plot should classify these customers into different "Private Label purchase percentage" ranges.
For instance, take 100 distinct customers. Customer 1 purchases 100% Private Label (i.e., 0% in the other two categories), customer 2 purchases only 10% PL, 60% handpieces and 30% Sundry, etc.
The plot will then summarize these percentages based on Private Label only.
Out of 100 customers, how many of them purchase between, say, 80% and 100% Private Label out of their total purchases?
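To make the goal concrete, here is a rough sketch of the kind of binning I have in mind, using base R's `cut()` and `table()`. The percentages in `pl_pct` and the 20-point range width are made up for illustration; they are not taken from my data.

```r
# Hypothetical Private Label percentages, one value per customer
pl_pct <- c(100, 10, 85, 45, 0, 80)

# Five equal-width ranges; the 20-point width is an assumption
breaks <- seq(0, 100, by = 20)
labels <- c("0-20%", "20-40%", "40-60%", "60-80%", "80-100%")

# right = FALSE makes each range include its lower bound;
# include.lowest = TRUE lets the last range also include 100
pl_range <- cut(pl_pct, breaks = breaks, labels = labels,
                include.lowest = TRUE, right = FALSE)

# Number of customers falling in each range
range_counts <- table(pl_range)
print(range_counts)
```

With these made-up values, three of the six customers land in the 80-100% range.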
Here is a capture of my output.
- First things first, I want to create another column in this format:
PrivateLabel counts / Rows_sum
-> I failed.
> df2 = data %>%
    group_by(CUSTOMER_NAME, PRODUCT_SUB_LINE_DESCR) %>%
    summarise(count = n()) %>%
    ungroup() %>%
    spread(PRODUCT_SUB_LINE_DESCR, count, fill = 0) %>%
    mutate(Row_sum = rowSums(.[3:4]),
           Percentage = df2[4,] / Row_sum)
Error in mutate_impl(.data, dots) :
  Column `Percentage` is of unsupported class data.frame
In addition: Warning message:
In Ops.factor(left, right) : ‘/’ not meaningful for factors
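For reference, here is a minimal sketch of what the failing step seems to be aiming at, run on made-up data. My guess is that the error comes from `df2[4,]`, which selects a whole row of an earlier `df2` as a data frame instead of a column of the table being built, so the sketch refers to the category column by name. The toy rows and the category labels `Private Label`, `Handpieces` and `Sundry` are assumptions, not my real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; the real table has thousands of customers
data <- data.frame(
  CUSTOMER_NAME = c("A", "A", "B", "B", "B", "B"),
  PRODUCT_SUB_LINE_DESCR = c("Private Label", "Private Label",
                             "Private Label", "Handpieces",
                             "Sundry", "Sundry"),
  stringsAsFactors = FALSE
)

df2 <- data %>%
  group_by(CUSTOMER_NAME, PRODUCT_SUB_LINE_DESCR) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  spread(PRODUCT_SUB_LINE_DESCR, count, fill = 0) %>%
  mutate(
    # Sum every count column, i.e. everything except the customer name
    Row_sum = rowSums(select(., -CUSTOMER_NAME)),
    # Refer to the column by name; backticks are needed because of the space
    Percentage = `Private Label` / Row_sum
  )

print(df2)
```

Selecting the count columns by exclusion rather than by position avoids depending on the order in which `spread()` lays out the categories.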
I also need help understanding some of the syntax in the code:
mutate(Row_sum = rowSums(.[3:4]), ...)
Does it mean "for every row, sum the values in the 3rd through 4th columns"?
I could be wrong, but the output makes sense.
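A small check of that reading in base R, on a hypothetical table standing in for the `spread()` output (the column names and values are made up). As far as I understand, inside the pipe `.` is the table coming from the previous step, so `.[3:4]` behaves like `toy[3:4]` here.

```r
# Hypothetical stand-in for the spread() output
toy <- data.frame(
  CUSTOMER_NAME = c("A", "B"),
  Handpieces    = c(0, 6),
  PrivateLabel  = c(5, 1),
  Sundry        = c(0, 3)
)

# Single-bracket indexing with 3:4 keeps columns 3 and 4 as a data frame
cols_34 <- toy[3:4]
print(names(cols_34))

# rowSums() adds the selected columns within each row
row_sum_34  <- rowSums(cols_34)
# For comparison: the sum over all three count columns (positions 2 to 4)
row_sum_all <- rowSums(toy[2:4])
```

If the count columns start at position 2, as in this toy table, `3:4` leaves the first category out of the total, so the range may need to be `2:4` depending on the actual column layout.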
- Once this is done, we can move on to the next step, which is visualization.