Density plot using summary data frame

I have one quarter of data from a bicycle rental service in NYC. There are two user types, Customers and Subscribers, and I have created a variable called trip.type which can be "long", "avg" or "short". I created a summary data frame that has the counts for trip.type by user.type. However, we have far more subscribers than customers, so I would like to look at a density / percentage plot rather than raw counts. How do I do this? I am using ggplot2.

I am a newbie, so sorry if this is too elementary a question. I have left in outliers and blanks for now for a reason, so there are blanks in both user type and trip type. I figured I could subset in ggplot once I work out how to make the plot in the first place. Thank you so much. Inserting my summary data frame here.

Here is an example where I recoded your missing values as "Unknown" and then used functions from the dplyr package to get what I think you want.

DF <- data.frame(User.Type = rep(c("Unknown", "Customer", "Subscriber"), each = 4),
                 trip.type = rep(c("Avg", "long", "Short", "Unknown"), 3),
                 num_trips = c(891, 3817, 31, 77, 4123, 20888, 1378, 388, 266299, 
                               225671, 203048, 734))
library(dplyr)

Summary <- DF %>% group_by(User.Type) %>% 
  summarize(Total = sum(num_trips))
Summary
#> # A tibble: 3 x 2
#>   User.Type   Total
#>   <fct>       <dbl>
#> 1 Customer    26777
#> 2 Subscriber 695752
#> 3 Unknown      4816
DF <- left_join(DF, Summary, by = "User.Type") %>% 
  mutate(Fraction = num_trips/Total)
DF
#>     User.Type trip.type num_trips  Total    Fraction
#> 1     Unknown       Avg       891   4816 0.185008306
#> 2     Unknown      long      3817   4816 0.792566445
#> 3     Unknown     Short        31   4816 0.006436877
#> 4     Unknown   Unknown        77   4816 0.015988372
#> 5    Customer       Avg      4123  26777 0.153975427
#> 6    Customer      long     20888  26777 0.780072450
#> 7    Customer     Short      1378  26777 0.051462076
#> 8    Customer   Unknown       388  26777 0.014490047
#> 9  Subscriber       Avg    266299 695752 0.382749888
#> 10 Subscriber      long    225671 695752 0.324355517
#> 11 Subscriber     Short    203048 695752 0.291839621
#> 12 Subscriber   Unknown       734 695752 0.001054974

Created on 2020-04-21 by the reprex package (v0.3.0)
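
Since the question asked for a ggplot2 plot, here is a minimal sketch of one way the fractions could be plotted. It is not part of the reprex above: the axis labels and the choice of a dodged bar chart are my own assumptions, and the fractions are recomputed with a grouped mutate() instead of the join so that the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# Same summary counts as in the reprex above
DF <- data.frame(User.Type = rep(c("Unknown", "Customer", "Subscriber"), each = 4),
                 trip.type = rep(c("Avg", "long", "Short", "Unknown"), 3),
                 num_trips = c(891, 3817, 31, 77, 4123, 20888, 1378, 388, 266299,
                               225671, 203048, 734))

# Share of each user type's trips that falls in each trip type
PlotDF <- DF %>%
  group_by(User.Type) %>%
  mutate(Fraction = num_trips / sum(num_trips)) %>%
  ungroup()

# Side-by-side bars: one bar per user type within each trip type
p1 <- ggplot(PlotDF, aes(x = trip.type, y = Fraction, fill = User.Type)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Trip type", y = "Share of trips within user type", fill = "User type")
p1

# Alternative: let ggplot2 normalise the raw counts with position = "fill",
# giving one stacked 100% bar per user type
p2 <- ggplot(DF, aes(x = User.Type, y = num_trips, fill = trip.type)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "User type", y = "Share of trips", fill = "Trip type")
p2
```

To drop the "Unknown" rows, something like filter(User.Type != "Unknown", trip.type != "Unknown") could be applied before computing the fractions, so that the shares are taken over the remaining rows only.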

In the future, please provide a reproducible example so people do not have to manually reproduce your data.

Thank you so much. I will definitely provide sample reproducible data next time. Much appreciated.