# Linear regression with simple categorical data

As part of my 3rd-year university dissertation project I am studying whether sheep grazing has an effect on bird species diversity. This involved comparing the number of species found in 3 different habitat types, which gave the following data (Habitats = habitat type, Totalsp = total number of species found in that habitat):

``````
Habitats                 Totalsp
Grazed grassland               9
Non-grazed grassland          18
Woodland                      15
``````

I am trying to carry out a regression on the data to determine whether there is a significant difference in the number of species found in each habitat type, with no luck so far. This is the code I have used:

``````
fit <- lm(Totalsp ~ Habitats, data = Diss_origdataR)
summary(fit)
``````

And this was the output:

``````
Call:
lm(formula = Totalsp ~ Habitats, data = Diss_origdataR)

Residuals:
ALL 3 residuals are 0: no residual degrees of freedom!

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                         9        NaN     NaN      NaN
HabitatsNon-grazed grassland        9        NaN     NaN      NaN
HabitatsWoodland                    6        NaN     NaN      NaN

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1,  Adjusted R-squared:    NaN
F-statistic:   NaN on 2 and 0 DF,  p-value: NA
``````

Does anyone have any advice on how I might fix this?

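Your problem is that you are fitting three coefficients (an intercept plus two habitat contrasts) with only three data points. The model reproduces the data exactly just mechanically, leaving zero residual degrees of freedom, so there is nothing left over to estimate the error variance from. Without that, no standard errors, t values or p-values can be computed, which is why the output is full of NaN.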

The only solution is to get more data.
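To see the saturation directly, here is a small sketch that rebuilds the three rows from the table in the question. The data frame construction is an assumption (the OP's actual `Diss_origdataR` is not shown), but the values are the ones posted. It simply counts coefficients against observations and shows that the fitted values just hand back the observed totals.

``````r
# Rebuild the OP's three-row data set from the posted table
# (assumption: Diss_origdataR has exactly these two columns)
Diss_origdataR <- data.frame(
  Habitats = c("Grazed grassland", "Non-grazed grassland", "Woodland"),
  Totalsp  = c(9, 18, 15)
)

fit <- lm(Totalsp ~ Habitats, data = Diss_origdataR)

# Three coefficients estimated from three observations: a saturated model
length(coef(fit))     # 3
nrow(Diss_origdataR)  # 3
df.residual(fit)      # 3 - 3 = 0 residual degrees of freedom

# The fitted values reproduce the observed counts, so the residuals are
# zero (up to floating-point rounding) and carry no information about noise
fitted(fit)
resid(fit)
``````

The coefficients are just differences from the baseline level ("Grazed grassland", first alphabetically): 9, then 18 - 9 = 9 and 15 - 9 = 6, matching the output above.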

A chi-squared goodness-of-fit test:

``````
# Observed species counts per habitat
observed <- c(9, 18, 15)

# Expected counts if species were spread equally across the three habitats
expected <- rep(sum(observed) / 3, 3)

# Perform the Chi-Square Goodness of Fit Test
test <- chisq.test(observed, p = expected/sum(expected))

# Print the test result
print(test)
# Chi-squared test for given probabilities
#
# data:  observed
# X-squared = 3, df = 2, p-value = 0.2231
``````

This gives only weak evidence that the observed counts differ from the expected ones (p = 0.2231), assuming that you expect one third of the counts to be in each habitat.
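The printed statistic can be checked by hand. With 42 species in total and equal probabilities, each expected count is $E_i = 42/3 = 14$:

$$
X^2 = \sum_{i=1}^{3} \frac{(O_i - E_i)^2}{E_i}
    = \frac{(9-14)^2 + (18-14)^2 + (15-14)^2}{14}
    = \frac{25 + 16 + 1}{14} = 3
$$

With $k - 1 = 2$ degrees of freedom the chi-squared distribution has survival function $e^{-x/2}$, so

$$
p = P\left(\chi^2_2 \ge 3\right) = e^{-3/2} \approx 0.2231
$$

which matches the `chisq.test` output.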

@nirgrahamuk, I have to disagree with you about this one. The OP has three kinds of habitats and one observation for each habitat. All the regression is doing is feeding back the observed numbers. There is nothing statistical to say.

I didn't suggest a regression; I agree lm is not going to be practicable here.

I'll accept that. Testing if the probabilities are the same does make sense.
