Error in creating stats table of differences

Anyone able to give me any assistance with the following piece of code? I am just trying to create a stats table for the differences in scores in an exam but I keep getting the following error:

Error in summarise(min = min(diffs), max = max(diffs), mean = mean(diffs), :
object 'diffs' not found

My code is:

exam.data = read.csv(file="Exam.csv")

exam.data = exam.data %>%
mutate(diffs=Final-Exam2)

ggplot(data=exam.data, aes(y=diffs)) + geom_boxplot()

summarise(min = min(diffs), max = max(diffs),
            mean=mean(diffs),
            median=median(diffs),
            sd = sd(diffs), 
            q1 = quantile(diffs, probs = 0.25),
            q3 = quantile(diffs, probs = 0.75))

I don't get why it gives this error as the boxplot is generated ok. Any help or if someone could point me in the right direction would be greatly appreciated.

summarise() should have a data argument (called .data in dplyr). It seems that exam.data is missing from your call. Is that on purpose?

To elaborate (maybe unnecessarily! :sweat_smile:) on the same point @cderv made:

When you use dplyr “verbs” (mutate(), summarise(), etc.) in a pipeline, the pipe operator (%>%) does the job of supplying the data argument for you. The pipe doesn’t know it’s the data argument; it just passes the previous thing in as the first argument to the next function, and these functions all take the data frame as their first argument (named .data).

So this line:

exam.data %>%
  mutate(diffs = Final-Exam2)

…means the same thing as writing:

mutate(.data = exam.data, diffs = Final - Exam2)
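If you want to convince yourself that the two forms really are equivalent, here is a minimal sketch. The data frame is a made-up stand-in for Exam.csv (the real file isn't shown in this thread), so the numbers are purely illustrative:

```r
library(dplyr)

# Hypothetical stand-in for Exam.csv; only the column names come from the thread
exam.data <- data.frame(
  Exam2 = c(70, 82, 65, 90),
  Final = c(75, 80, 72, 95)
)

# Piped form: %>% passes exam.data in as the first argument of mutate()
piped <- exam.data %>%
  mutate(diffs = Final - Exam2)

# Direct form: the data frame is supplied explicitly as the first argument
direct <- mutate(exam.data, diffs = Final - Exam2)

# Both calls should produce the same data frame
identical(piped, direct)
```

Note that the first argument of the dplyr verbs is named .data (with a leading dot), so if you name it explicitly it has to be .data = exam.data.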

At the end of your code, you’re using summarise() on its own, without a data frame fed into it with a pipe, so you need to supply the data argument. You can do that like this:

summarise(.data = exam.data,
  min = min(diffs), 
  max = max(diffs),
  mean = mean(diffs),
  median = median(diffs),
  sd = sd(diffs), 
  q1 = quantile(diffs, probs = 0.25),
  q3 = quantile(diffs, probs = 0.75)
)

Or, for the sake of consistency, you can start another pipeline:

exam.data %>%
  summarise(
    min = min(diffs), 
    max = max(diffs),
    mean = mean(diffs),
    median = median(diffs),
    sd = sd(diffs), 
    q1 = quantile(diffs, probs = 0.25),
    q3 = quantile(diffs, probs = 0.75)
  )
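To see why the original call failed, it may help to check where diffs actually lives. This is a rough sketch on made-up data (a hypothetical stand-in for Exam.csv), assuming dplyr is installed: diffs is a column inside exam.data, not an object of its own, so summarise() can only find it once it has been given the data frame.

```r
library(dplyr)

# Hypothetical stand-in for Exam.csv, with diffs added as in the question
exam.data <- data.frame(
  Exam2 = c(70, 82, 65, 90),
  Final = c(75, 80, 72, 95)
) %>%
  mutate(diffs = Final - Exam2)

# diffs exists as a column of the data frame...
has.column <- "diffs" %in% names(exam.data)
has.column

# ...so summarise() can find it once exam.data is passed in
stats <- exam.data %>%
  summarise(
    min = min(diffs),
    max = max(diffs),
    mean = mean(diffs),
    median = median(diffs)
  )
stats
```

Without the data frame, R goes looking for a standalone object called diffs, which is what the "object 'diffs' not found" message is complaining about.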

ggplot() has its own data argument, where you supplied exam.data, so that’s how it knows where to find diffs. The pipe-style way of calling that would have been:

exam.data %>% ggplot(aes(y = diffs)) + geom_boxplot()

Single-step pipelines aren’t very impressive on their own; the benefits accrue when you start constructing pipelines with several steps chained together.
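As an illustration of a longer pipeline, the mutate() and summarise() steps from this thread could be chained into one. This is only a sketch on invented data (the contents of Exam.csv aren't shown here), and diff.stats is just a name picked for the example:

```r
library(dplyr)

# Hypothetical stand-in for Exam.csv
exam.data <- data.frame(
  Exam2 = c(70, 82, 65, 90),
  Final = c(75, 80, 72, 95)
)

# Each step receives the result of the previous one as its first argument
diff.stats <- exam.data %>%
  mutate(diffs = Final - Exam2) %>%
  summarise(
    min = min(diffs),
    max = max(diffs),
    mean = mean(diffs),
    median = median(diffs),
    sd = sd(diffs),
    q1 = quantile(diffs, probs = 0.25),
    q3 = quantile(diffs, probs = 0.75)
  )

diff.stats
```

One difference from the original code: here exam.data itself is left unchanged, because the result of mutate() flows straight into summarise() instead of being assigned back.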

Thanks so much for the help. I was racking my brain over such a simple problem.