Need to compute Graph of Averages

Can't get out of my own way here.

I'm trying to describe what regression is by using (what I always called) the Graph of Averages: find the average y-value for each unique x-value, and connect those points.

No tidyverse allowed, unfortunately.

This seems like a task for aggregate(). But let me get a reprex going.

Here's the data table you need:

# Import Reaction data set
theURL <- "http://lib.stat.cmu.edu/datasets/Andrews/T30.1"
theNames <- c("Table", "Number", "Row", "Experiment", "Temperature", "Concentration", "Time", "Unchanged", "Converted", "Unwanted")
Reaction <- read.table(theURL, header = FALSE, col.names = theNames)
View(Reaction)

# Remove the first four useless columns
Reaction <- Reaction[-c(1:4)]

I'm looking at the bivariate relationship Temperature v Converted. I can get unique values of Temperature...

> unique(Reaction$Temperature)
[1] 162 172 167 177 157 160

...which I think should be my by-variable. So I aggregate thusly:
aggregate(Reaction$Converted, by=unique(Reaction$Temperature), mean)

...but am told that the by-variable must be a list. No problem!

> aggregate(Reaction$Converted, as.list(unique(Reaction$Concentration)), mean)
Error in aggregate.data.frame(as.data.frame(x), ...) : 
  arguments must have same length

I haven't the foggiest what that's trying to tell me, but the list looks quite complicated:

> as.list(unique(Reaction$Concentration))
[[1]]
[1] 23

[[2]]
[1] 30

[[3]]
[1] 25

[[4]]
[1] 27.5

[[5]]
[1] 32.5

[[6]]
[1] 22.5

[[7]]
[1] 20

[[8]]
[1] 34

So I try converting to a factor instead, which does no better.

> aggregate(Reaction$Converted, as.factor(unique(Reaction$Concentration)), mean)
Error in aggregate.data.frame(as.data.frame(x), ...) : 
  arguments must have same length

I'm not sure why I thought a factor would be a better idea:

> as.factor(unique(Reaction$Concentration))
[1] 23   30   25   27.5 32.5 22.5 20   34  
Levels: 20 22.5 23 25 27.5 30 32.5 34

So now I'm losing the will to live. Sure could use some help.

I'm not sure if I understand you correctly, but I think this is what you want:

theURL <- "http://lib.stat.cmu.edu/datasets/Andrews/T30.1"
theNames <- c("Table", "Number", "Row", "Experiment", "Temperature", "Concentration", "Time", "Unchanged", "Converted", "Unwanted")
Reaction <- read.table(theURL, header = FALSE, col.names = theNames)
Reaction <- Reaction[-c(1:4)]
aggregate(Converted ~ Temperature, data = Reaction, mean)
#>   Temperature Converted
#> 1         157    46.900
#> 2         160    60.300
#> 3         162    53.875
#> 4         167    55.680
#> 5         172    57.400
#> 6         177    59.800

Created on 2019-03-07 by the reprex package (v0.2.1)
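
For the Graph of Averages itself, the aggregated means can then be drawn over the raw points and connected. This is only a sketch: the toy data frame below is made up so the example runs without downloading anything, and it borrows the column names Temperature and Converted from the Reaction data.

```r
# Toy data standing in for Reaction (values are made up for illustration)
toy <- data.frame(
  Temperature = c(157, 157, 162, 162, 162, 167),
  Converted   = c(40, 50, 52, 54, 56, 60)
)

# One mean of Converted per distinct Temperature, sorted by Temperature
avgs <- aggregate(Converted ~ Temperature, data = toy, FUN = mean)

# Raw data as grey points, with the group means connected on top
plot(Converted ~ Temperature, data = toy, col = "grey50")
lines(avgs$Temperature, avgs$Converted, type = "b", pch = 19)
```

With the real data, replacing toy with Reaction should give the same kind of picture.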

That's exactly what I want. Thanks!

Where did I go wrong in explaining what I wanted? I never seem to hit the Goldilocks zone with stating what I want and what I've tried.

To demystify the use of by in aggregate():

# Import Reaction data set
theURL <- "http://lib.stat.cmu.edu/datasets/Andrews/T30.1"
theNames <- c("Table", "Number", "Row", "Experiment", "Temperature", "Concentration", "Time", "Unchanged", "Converted", "Unwanted")
Reaction <- read.table(theURL, header = FALSE, col.names = theNames)
View(Reaction)

# Remove the first four useless columns
Reaction <- Reaction[-c(1:4)]

aggregate(Reaction, by = list(Reaction$Temperature), FUN = mean)
#>   Group.1 Temperature Concentration Time Unchanged Converted Unwanted
#> 1     157         157          27.5  6.5    37.600    46.900 14.70000
#> 2     160         160          34.0  7.5    17.750    60.300 20.70000
#> 3     162         162          26.5  6.0    31.175    53.875 12.77500
#> 4     167         167          27.5  6.5    19.700    55.680 22.38000
#> 5     172         172          27.5  6.5    12.850    57.400 25.32500
#> 6     177         177          22.5  6.5    11.900    59.800 24.83333
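
The reason list() works where as.list() does not comes down to the shape of the result. Here is a small sketch with made-up vectors (temps and converted are hypothetical stand-ins, not the Reaction columns) that compares the two and also names the list element so the output column is not called Group.1.

```r
# Made-up values, one temperature per observation
temps     <- c(157, 157, 162, 162, 162, 167)
converted <- c(40, 50, 52, 54, 56, 60)

# list() wraps the whole vector: one grouping variable holding 6 values
wrapped <- list(Temperature = temps)
length(wrapped)    # number of grouping variables
lengths(wrapped)   # values per grouping variable

# as.list() splits the vector: every value becomes its own list element,
# and aggregate() reads each element as a separate grouping variable
split_up <- as.list(unique(temps))
length(split_up)
lengths(split_up)

# Naming the list element replaces the default "Group.1" column name
means <- aggregate(converted, by = wrapped, FUN = mean)
means
```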

Thank you, kindly.

I guess I would have hit it myself had I tried list() instead of as.list().

No, I'm oversimplifying this. How did you know that list() would work for the by-variable? All the help I read seemed to say you needed a list of unique values, which led me to unique(), then to as.list(), etc. Used alone, list() just gives all the values of a column, including repeats, in a list. I'm not sure why you don't need unique() around that.

You did nothing wrong per se. I'm not a native English speaker and sometimes I don't trust my mental translations, but maybe you want to keep your vocabulary as simple as possible to maximize your chances of getting help.

Well said, and good advice.

Actually, it says that you need a list of values of the same length; they don't need to be unique.

...but of the same length as what?

I know the answer, but it's omissions like that which make beginners go crazy.

The same length as the variable being grouped, i.e. x.
I agree it's a little bit confusing, which is why I prefer the formula method.
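
To make the length rule concrete, here is a minimal sketch on a made-up data frame (toy is hypothetical and only reuses the Reaction column names). The by list needs one grouping value per row of x; wrapping unique() around it makes it too short, which is what triggers the error from earlier in the thread.

```r
# Toy data standing in for Reaction (values are made up for illustration)
toy <- data.frame(
  Temperature = c(157, 157, 162, 162, 162, 167),
  Converted   = c(40, 50, 52, 54, 56, 60)
)

by_rows   <- list(toy$Temperature)          # 6 values, one per row of toy
by_unique <- list(unique(toy$Temperature))  # only 3 values

# Same length as x: works, and returns one mean per distinct temperature
ok <- aggregate(toy$Converted, by = by_rows, FUN = mean)

# Shorter than x: aggregate() stops; capture the message instead of failing
msg <- tryCatch(
  aggregate(toy$Converted, by = by_unique, FUN = mean),
  error = function(e) conditionMessage(e)
)
msg

# The formula method does the row matching for you and gives the same means
viaFormula <- aggregate(Converted ~ Temperature, data = toy, FUN = mean)
all.equal(ok$x, viaFormula$Converted)
```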
