Density plot using summary data frame

I have one quarter of data from a bicycle rental service in NYC. There are two user types, Customers and Subscribers, and I have created a variable called trip.type which can be "long", "avg" or "short". I also created a summary data frame that has the counts for trip.type by user.type. However, we have far more subscribers than customers, so I would like to look at a density / percentage plot (the share of trips within each user type) rather than raw counts. How do I do this? I am using ggplot2. Also, I am a newbie, so sorry if this is too elementary a question.

I have left in outliers and blanks for now for a reason, so we have blanks in both User Type and trip type. I figured I could subset in ggplot once I figure out how to make the plot in the first place. Inserting my summary data frame here. Thank you so much.

Here is an example where I recoded your missing values as "Unknown" and then used functions from the dplyr package to compute, for each user type, the fraction of trips in each trip type, which is what I think you want.

library(dplyr)

# Summary counts, with blank user types and trip types recoded as "Unknown"
DF <- data.frame(User.Type = rep(c("Unknown", "Customer", "Subscriber"), each = 4),
                 trip.type = rep(c("Avg", "long", "Short", "Unknown"), 3),
                 num_trips = c(891, 3817, 31, 77, 4123, 20888, 1378, 388, 266299, 
                               225671, 203048, 734))

# Total number of trips for each user type
Summary <- DF %>% group_by(User.Type) %>% 
  summarize(Total = sum(num_trips))
Summary
#> # A tibble: 3 x 2
#>   User.Type   Total
#>   <fct>       <dbl>
#> 1 Customer    26777
#> 2 Subscriber 695752
#> 3 Unknown      4816
# Attach each user type's total, then compute the within-user-type fraction
DF <- left_join(DF, Summary, by = "User.Type") %>% 
  mutate(Fraction = num_trips/Total)
DF
#>     User.Type trip.type num_trips  Total    Fraction
#> 1     Unknown       Avg       891   4816 0.185008306
#> 2     Unknown      long      3817   4816 0.792566445
#> 3     Unknown     Short        31   4816 0.006436877
#> 4     Unknown   Unknown        77   4816 0.015988372
#> 5    Customer       Avg      4123  26777 0.153975427
#> 6    Customer      long     20888  26777 0.780072450
#> 7    Customer     Short      1378  26777 0.051462076
#> 8    Customer   Unknown       388  26777 0.014490047
#> 9  Subscriber       Avg    266299 695752 0.382749888
#> 10 Subscriber      long    225671 695752 0.324355517
#> 11 Subscriber     Short    203048 695752 0.291839621
#> 12 Subscriber   Unknown       734 695752 0.001054974

Created on 2020-04-21 by the reprex package (v0.3.0)
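
Since the question was about a plot, here is a minimal sketch of how the fractions could be drawn with ggplot2. It rebuilds the same DF so it runs on its own, and it computes Fraction with group_by() and mutate() instead of the join, which should give the same values. The dodged bars, the percent axis and the label text are my own choices, not something from the original post.

```r
library(dplyr)
library(ggplot2)

# Same summary counts as in the example above
DF <- data.frame(
  User.Type = rep(c("Unknown", "Customer", "Subscriber"), each = 4),
  trip.type = rep(c("Avg", "long", "Short", "Unknown"), 3),
  num_trips = c(891, 3817, 31, 77, 4123, 20888, 1378, 388,
                266299, 225671, 203048, 734)
)

# Fraction of each user type's trips that falls in each trip type
DF <- DF %>%
  group_by(User.Type) %>%
  mutate(Fraction = num_trips / sum(num_trips)) %>%
  ungroup()

# Side-by-side bars: one bar per user type within each trip type
p <- ggplot(DF, aes(x = trip.type, y = Fraction, fill = User.Type)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Trip type", y = "Share of trips within user type",
       fill = "User type")
p
```

To leave out the "Unknown" rows, one option is to filter them from DF before the group_by() step, so that the remaining fractions still add up to 1 within each user type.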

In the future, please provide a reproducible example so people do not have to manually reproduce your data.
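
One common way to share a small data frame is dput(), which prints R code that recreates the object so others can paste it straight into their session. A minimal sketch, where the two-row df is a made-up stand-in for your real summary table:

```r
# Small hypothetical data frame standing in for the real summary table
df <- data.frame(User.Type = c("Customer", "Subscriber"),
                 num_trips = c(4123, 266299))

# dput() prints R code that recreates the object; paste that output into the post
dput(df)

# Round trip: evaluating the printed code rebuilds the data frame
rebuilt <- eval(parse(text = capture.output(dput(df))))
all.equal(df, rebuilt)
```

For a large table, something like dput(head(df, 20)) keeps the post short while still giving people real data to work with.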

Thank you so much. I will definitely provide sample reproducible data next time. Much appreciated.
