To elaborate (maybe unnecessarily!) on the same point @cderv made:
When you use dplyr “verbs” (mutate(), summarise(), etc.) in a pipeline, the pipe operator (%>%) supplies the data argument for you. The pipe doesn’t know it’s the data argument; it just passes whatever is on its left-hand side in as the first argument to the next function, and these functions all take the data frame as their first argument (dplyr names it .data).
So this line:
exam.data %>%
  mutate(diffs = Final - Exam2)
…means the same thing as writing:
mutate(.data = exam.data, diffs = Final - Exam2)
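If you want to convince yourself the two forms really are equivalent, here is a minimal sketch you can run. The toy data frame is a made-up stand-in for your exam.data (only the column names Final and Exam2 come from your code; the scores are invented), and the object names piped and explicit are just for illustration.

```r
library(dplyr)

# Hypothetical stand-in for exam.data; the scores are invented for illustration.
exam.data <- data.frame(
  Exam2 = c(70, 82, 91),
  Final = c(75, 80, 95)
)

# Piped form: %>% passes exam.data in as the first argument of mutate().
piped <- exam.data %>%
  mutate(diffs = Final - Exam2)

# Explicit form: the data frame is supplied by name as .data.
explicit <- mutate(.data = exam.data, diffs = Final - Exam2)

# Both calls should produce the same data frame.
identical(piped, explicit)
```

Note that neither call changes exam.data itself; each one returns a new data frame, which is why the results are assigned to names here.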
At the end of your code, you’re using summarise() on its own, without a data frame fed into it by a pipe, so you need to supply the data argument yourself (dplyr calls it .data). You can do that like this:
summarise(
  .data = exam.data,
  min = min(diffs),
  max = max(diffs),
  mean = mean(diffs),
  median = median(diffs),
  sd = sd(diffs),
  q1 = quantile(diffs, probs = 0.25),
  q3 = quantile(diffs, probs = 0.75)
)
Or, for the sake of consistency, you can start another pipeline:
exam.data %>%
  summarise(
    min = min(diffs),
    max = max(diffs),
    mean = mean(diffs),
    median = median(diffs),
    sd = sd(diffs),
    q1 = quantile(diffs, probs = 0.25),
    q3 = quantile(diffs, probs = 0.75)
  )
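One caveat for both versions: they only work if exam.data already contains a diffs column. mutate() returns a new data frame rather than modifying its input, so the column only sticks around if you assigned the result back. I can't see whether your code did that, so treat this as a sketch with invented scores and hypothetical helper names (has_diffs_before, has_diffs_after):

```r
library(dplyr)

# Hypothetical stand-in for exam.data; the scores are invented for illustration.
exam.data <- data.frame(Exam2 = c(70, 82, 91), Final = c(75, 80, 95))

# The result of this pipeline is computed (and printed) but never stored.
exam.data %>%
  mutate(diffs = Final - Exam2)

# exam.data itself is unchanged, so summarise() would not find diffs in it yet.
has_diffs_before <- "diffs" %in% names(exam.data)

# Assigning the result back is what makes diffs available to later calls.
exam.data <- exam.data %>%
  mutate(diffs = Final - Exam2)

has_diffs_after <- "diffs" %in% names(exam.data)
```

If has_diffs_before comes out FALSE for your real data, a missing assignment is a likely reason summarise() can't find diffs.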
ggplot() has its own data argument, where you supplied exam.data, so that’s how it knows where to find diffs. The pipe-style way of calling that would have been:
exam.data %>% ggplot(aes(y = diffs)) + geom_boxplot()
Single-step pipelines aren’t very impressive; the benefits accrue when you start chaining several steps together.
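As a rough illustration of a multi-step pipeline, the mutate() and summarise() steps from above can be chained so that diffs never needs to be stored in exam.data at all. Again, the data frame is an invented stand-in and diff.summary is just a name I picked; only a few of the summary columns are shown to keep it short.

```r
library(dplyr)

# Hypothetical stand-in for exam.data; the scores are invented for illustration.
exam.data <- data.frame(Exam2 = c(70, 82, 91), Final = c(75, 80, 95))

# Each step receives the previous step's result as its first argument.
diff.summary <- exam.data %>%
  mutate(diffs = Final - Exam2) %>%   # add the difference column
  summarise(                          # then collapse it to one row of statistics
    min = min(diffs),
    max = max(diffs),
    mean = mean(diffs)
  )

diff.summary
```

Here the data frame with the diffs column only exists inside the pipeline; if you also need it for the boxplot, assign the output of the mutate() step to a name first and start both the summary and the plot from that.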