Analysis of variance

I cannot figure out what I am doing wrong when I run this Analysis of Variance. It is only producing six out of the nine factors I need for TypeBeat. Any help is greatly appreciated.

EX1HW5 data:

TypeBeat Course.Time.In.Hours Score.A Score.B Score.C
upper.class 5 34.4 35.5 39.2
middle.class 10 30.2 32.4 34.7
inner.city 15 20.1 39.4 54.3

My R-code:
Course <- aov(Course.Time.In.Hours ~ TypeBeat+Score.A + Score.B+Score.C, data = EX1HW5)
Course

Course.2<-aov(Course.Time.In.Hours ~ TypeBeat*Score.A+Score.B+Score.C, data = EX1HW5)
Course.2

R-Output:
3 out of 6 effects not estimable
Estimated effects may be unbalanced

5 out of 8 effects not estimable
Estimated effects may be unbalanced

str(TypeBeat)
int [1:6] 1 2 3 1 2 3

What does EX1HW5 look like?

I do not understand why you posted the long message about reprex. However, I apologize for any inconvenience or annoyance (as that was not my intention).

Here are the answers to your questions.

  1. EX1HW5 is the same as the data included in the post (maybe I should have called it that rather than my data. I understand the confusion with the data name in the code).

  2. I thought I kept my codes and comments reasonably short, easy to read and copy.

  3. It is just the basic R package used (honestly, I thought that one was a given).

We can't run the code because we don't know what the data structure is. Read the article. It will make it more likely that someone will be able to give you an answer.

I did read the article. The data structure is what is up there. This data is all my professor provided for us. We are supposed to use it to run an Analysis of Variance and create an interaction plot. Unfortunately, he did not give us a dataset. I am so exhausted from trying to figure this out.

Here is a reproducible version of your data anyway:

EX1HW5 <- tibble::tribble(
       ~TypeBeat, ~Course.Time.In.Hours, ~Score.A, ~Score.B, ~Score.C,
   "upper.class",                    5L,     34.4,     35.5,     39.2,
  "middle.class",                   10L,     30.2,     32.4,     34.7,
    "inner.city",                   15L,     20.1,     39.4,     54.3
  )

Basically, we need a good-sized sample of your data, and the article suggests various ways to provide it. Since we probably only need the data, a handy way to supply sample data is the dput() function; see ?dput. If you have a very large data set, then something like dput(head(myfile, 100)) will likely supply enough data for us to work with.
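
As a minimal sketch of that dput() round trip: the data frame below simply re-types the three rows from the original post, and `txt` and `restored` are illustrative names, not anything from the assignment.

```r
# Re-typed from the table in the original post (an assumption about its structure).
EX1HW5 <- data.frame(
  TypeBeat = c("upper.class", "middle.class", "inner.city"),
  Course.Time.In.Hours = c(5, 10, 15),
  Score.A = c(34.4, 30.2, 20.1),
  Score.B = c(35.5, 32.4, 39.4),
  Score.C = c(39.2, 34.7, 54.3)
)

# dput() prints R code that rebuilds the object; head() caps it at 100 rows.
dput(head(EX1HW5, 100))

# Capture that text and evaluate it, as someone pasting it from a post would.
txt <- capture.output(dput(head(EX1HW5, 100)))
restored <- eval(parse(text = txt))

# Check that the rebuilt object matches the original.
all.equal(restored, EX1HW5)
```

Pasting the printed structure(...) text into a post lets anyone recreate the object with a single assignment.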

General Inquiry
Multicollinearity, or would R specifically warn of this? I thought it did.

Thank you for all of your help, William. I do appreciate it. I apologize for not having enough information, but that was all I was given. Thanks again!

Normally you would have a larger dataset and you would be able to analyse the variance between the groups. It is pretty hard to do an anova with only one observation in each group (TypeBeat).

EX1HW5 <- tibble::tribble(
       ~TypeBeat, ~Course.Time.In.Hours, ~Score.A, ~Score.B, ~Score.C,
   "upper.class",                    5L,     34.4,     35.5,     39.2,
  "middle.class",                   10L,     30.2,     32.4,     34.7,
    "inner.city",                   15L,     20.1,     39.4,     54.3
  )
cor(EX1HW5[, 3:5])
         Score.A    Score.B    Score.C
Score.A  1.0000000 -0.7334173 -0.8724027
Score.B -0.7334173  1.0000000  0.9721028
Score.C -0.8724027  0.9721028  1.0000000

I'd go for a multicollinearity problem rather than something @TJ37043 is doing wrong, but I am not a statistician.

This is the issue. With only one observation per level of TypeBeat, it is not possible to fit this model. R is automatically dropping some of the predictors because with them, the matrix is singular.
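
A rough way to see the dropped predictors, assuming the three rows from the original post are the whole dataset: fitting the same formula with lm() reports every coefficient it could not estimate as NA. The name `n_dropped` is only illustrative.

```r
# Re-typed from the table in the original post (an assumption about its structure).
EX1HW5 <- data.frame(
  TypeBeat = c("upper.class", "middle.class", "inner.city"),
  Course.Time.In.Hours = c(5, 10, 15),
  Score.A = c(34.4, 30.2, 20.1),
  Score.B = c(35.5, 32.4, 39.4),
  Score.C = c(39.2, 34.7, 54.3)
)

# Same formula as the first aov() call, fitted with lm() to inspect coefficients.
fit <- lm(Course.Time.In.Hours ~ TypeBeat + Score.A + Score.B + Score.C,
          data = EX1HW5)

coef(fit)                            # coefficients R could not estimate show as NA
n_dropped <- sum(is.na(coef(fit)))   # how many coefficients were dropped
n_dropped
df.residual(fit)                     # residual degrees of freedom left for testing
```

With zero residual degrees of freedom there is nothing left to estimate the error variance from, so no F-tests are possible.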

To address this, you might simulate some fake data to fit the models on.

library(tidyverse, quietly = TRUE)
#> Warning: package 'ggplot2' was built under R version 4.0.5

EX1HW5 <- tibble::tribble(
  ~TypeBeat, ~Course.Time.In.Hours, ~Score.A, ~Score.B, ~Score.C,
  "upper.class",                    5L,     34.4,     35.5,     39.2,
  "middle.class",                   10L,     30.2,     32.4,     34.7,
  "inner.city",                   15L,     20.1,     39.4,     54.3
)

EX1HW5 <- map_dfr(
  list(EX1HW5) %>% rep(10),
  mutate_if,
  is.numeric,
  ~. + runif(length(.), min = -0.1, max = 0.1)  # add random number
) %>% 
  arrange(TypeBeat)

EX1HW5 
#> # A tibble: 30 x 5
#>    TypeBeat   Course.Time.In.Hours Score.A Score.B Score.C
#>    <chr>                     <dbl>   <dbl>   <dbl>   <dbl>
#>  1 inner.city                 14.9    20.1    39.4    54.3
#>  2 inner.city                 14.9    20.2    39.4    54.4
#>  3 inner.city                 15.0    20.0    39.5    54.3
#>  4 inner.city                 15.0    20.1    39.5    54.3
#>  5 inner.city                 15.0    20.1    39.3    54.3
#>  6 inner.city                 15.0    20.1    39.3    54.4
#>  7 inner.city                 15.0    20.0    39.5    54.2
#>  8 inner.city                 14.9    20.0    39.3    54.3
#>  9 inner.city                 15.0    20.0    39.5    54.4
#> 10 inner.city                 15.0    20.1    39.5    54.3
#> # ... with 20 more rows

Course <- aov(Course.Time.In.Hours ~ TypeBeat+Score.A + Score.B+Score.C, data = EX1HW5)
Course
#> Call:
#>    aov(formula = Course.Time.In.Hours ~ TypeBeat + Score.A + Score.B + 
#>     Score.C, data = EX1HW5)
#> 
#> Terms:
#>                 TypeBeat  Score.A  Score.B  Score.C Residuals
#> Sum of Squares  498.9288   0.0026   0.0062   0.0016    0.0781
#> Deg. of Freedom        2        1        1        1        24
#> 
#> Residual standard error: 0.05705951
#> Estimated effects may be unbalanced

Course.2<-aov(Course.Time.In.Hours ~ TypeBeat*Score.A+Score.B+Score.C, data = EX1HW5)
Course.2
#> Call:
#>    aov(formula = Course.Time.In.Hours ~ TypeBeat * Score.A + Score.B + 
#>     Score.C, data = EX1HW5)
#> 
#> Terms:
#>                 TypeBeat  Score.A  Score.B  Score.C TypeBeat:Score.A Residuals
#> Sum of Squares  498.9288   0.0026   0.0062   0.0016           0.0069    0.0713
#> Deg. of Freedom        2        1        1        1                2        22
#> 
#> Residual standard error: 0.05691599
#> Estimated effects may be unbalanced

Created on 2021-10-07 by the reprex package (v1.0.0)

My best guess would be that the Scores are meant to be individual observations and not a related set.

something like

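library(tidyverse)  # provides tribble(), pivot_longer(), starts_with() and %>%

EX1HW5 <- tibble::tribble(
  ~TypeBeat, ~Course.Time.In.Hours, ~Score.A, ~Score.B, ~Score.C,
  "upper.class",                    5L,     34.4,     35.5,     39.2,
  "middle.class",                   10L,     30.2,     32.4,     34.7,
  "inner.city",                   15L,     20.1,     39.4,     54.3
)

# Reshape to long format: one row per TypeBeat/score pair.
# pivot_longer() stores the old column names in `name` and the scores in `value`.
(ex1hw5_long <- EX1HW5 %>% pivot_longer(cols = starts_with("Score")))

# Fit the model with each score treated as a separate observation.
(aov_result <- aov(Course.Time.In.Hours ~ TypeBeat + value, data = ex1hw5_long))

summary(aov_result)
coef(aov_result)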

It's always dangerous to run a data analysis on source data that is not understood, such as what the columns and rows represent. It becomes a bit of a guessing game based on such assumptions.

What do you mean by this part of the post/question?

It is only producing six out of the nine factors I need for TypeBeat.

If what you show is all that you have, then the data cannot be analyzed. The problem is that you have more variables than data: you are trying to estimate five variables using three observations. The second problem is that the model combines one categorical variable (TypeBeat) with three continuous variables (Score.A, Score.B, Score.C), which makes it an analysis of covariance rather than a plain ANOVA. The latter issue is solvable; the former is a critical failure. Working out the degrees of freedom in the ANOVA may help you see part of the issue.

That said, it is odd for an instructor to give students an unsolvable problem. I would reread the question and double-check that all the data were entered into R correctly. I would also check that you have the correct model. If everything checks out, then I would contact the instructor (or TA) and ask for clarification.

An observation: The first model has no interaction terms. The model has four variables stated with the error term implied (five variables). I am not sure how R gets 6 effects.
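
One possible reading of the effect counts, sketched in base R and assuming the three rows from the original post are the full dataset: the "effects" appear to be the columns of the design matrix (intercept, dummy variables for TypeBeat, and one slope per score), so counting those columns and comparing with the matrix rank may explain the messages. `X1` and `X2` are illustrative names.

```r
# Re-typed from the table in the original post (an assumption about its structure).
EX1HW5 <- data.frame(
  TypeBeat = c("upper.class", "middle.class", "inner.city"),
  Course.Time.In.Hours = c(5, 10, 15),
  Score.A = c(34.4, 30.2, 20.1),
  Score.B = c(35.5, 32.4, 39.4),
  Score.C = c(39.2, 34.7, 54.3)
)

# Design matrices for the two formulas used in the original post.
X1 <- model.matrix(~ TypeBeat + Score.A + Score.B + Score.C, data = EX1HW5)
X2 <- model.matrix(~ TypeBeat * Score.A + Score.B + Score.C, data = EX1HW5)

colnames(X1)                        # intercept, two TypeBeat dummies, three slopes
c(ncol(X1), qr(X1)$rank)            # columns requested vs. columns estimable
c(ncol(X2), qr(X2)$rank)            # the interaction adds two more columns

# Degrees of freedom: n - 1 in total, minus the TypeBeat levels - 1.
n <- nrow(EX1HW5)
(n - 1) - (length(unique(EX1HW5$TypeBeat)) - 1)  # what is left for everything else
```

If I am counting correctly, that gives 6 and 8 columns, each with rank 3, which would line up with the "3 out of 6" and "5 out of 8" messages. TypeBeat alone uses both available degrees of freedom, leaving none for the scores or the residual.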

You are correct. My instructor forgot to give us the complete dataset. This was brought to his attention by several students, including me. Thank you so much for your assistance!