For an introduction to probability, I am experimenting with using dplyr (well, the tidyverse) to connect programming concepts to the idea of conditional probability. In my code below, I use mutate to store the two counts that I need later (the “numerator” and the “denominator”). My question is this: does anyone have a cleaner way of doing this calculation?
Example: Compute the probability that a randomly selected passenger on the Titanic was female given that the passenger was at least 35 years old.
library("tidyverse")  # for data wrangling tools
library("titanic")

tdf <- titanic_train  # training set of Titanic data

conditional_probability <- tdf %>%
  filter(Age >= 35) %>%
  mutate(denominator = n()) %>%
  filter(Sex == "female") %>%
  mutate(numerator = n()) %>%
  summarize(unique(numerator/denominator))
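For reference, this is how I read the correspondence between the code and the definition of conditional probability. The symbols F (the passenger is female) and A (the passenger is at least 35 years old) are my own labels for illustration. I am also assuming that "randomly selected" means uniformly at random from the rows of titanic_train with a recorded age, since filter() drops rows where Age is NA.

```latex
\[
P(F \mid A)
  = \frac{P(F \cap A)}{P(A)}
  = \frac{\#\{\text{female passengers with Age} \ge 35\}}
         {\#\{\text{passengers with Age} \ge 35\}}
\]
```

The first filter restricts the data to A, so the n() stored as denominator is the count in the bottom of the fraction. The second filter then keeps only the rows also in F, so the n() stored as numerator is the count in the top.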