 # Binomial Logistic Regression model

Hi,
I'm doing a project where I have presence data for a plant, and I would like to see whether environmental conditions affect its presence. I have been fitting a binomial logistic regression model, and I have been able to detect how individual conditions affect the plant (pH, turbidity and 12 heavy metal concentrations; one is shown as 'As' here). Is there any way I can analyse two or more independent variables together in RStudio?

This is the script I have been using up to now.
Many thanks,
Julie

names(dframe1)
summary(dframe1)

hist(dframe1$presence, col="dark red")

pairs(dframe1, panel = panel.smooth)

#OUTLIERS
dotchart(dframe1$presence,
         xlab = "Values of the data",
         ylab = "Order of the data")

#PLOT POTENTIAL RELATIONSHIPS
plot(dframe1$presence ~ dframe1$As)
abline(lm(dframe1$presence ~ dframe1$As), col = "red", lwd = 3)

###Build a model ####

model1 <- glm(presence ~ As,
family = binomial (link = logit),
data = dframe1)

AIC(model1)

###Model Validation ####

plot(model1)

devresid <- resid(model1, type = "deviance")
hist(devresid) # deviance residuals >2 may indicate a poorly fitting model

if (!requireNamespace("arm", quietly = TRUE)) install.packages("arm") # install only if missing
library(arm)
predicted.values <- predict(model1)
residuals <- resid(model1, type = "deviance")
binnedplot(predicted.values, residuals)

###Model Selection ####
drop1(model1, test='Chi')

###Interpret the model####
summary(model1) # dispatches to summary.glm; summary.lm is not appropriate for a glm fit

exp(coef(model1))

exp(confint(model1))

# Proportion of deviance explained by the model
(model1$null.deviance - model1$deviance) / model1$null.deviance

###Visualise the model ####

plot(dframe1$presence ~ dframe1$As,
     ylab = "Probability of S. latifolium presence",
     xlab = "As Concentration of rhyne water (mg/L)",
     las = 1, col = "blue")

summary(dframe1)

# Step 1: Making a table of prediction data (pdat)
# The column must be named after the model's predictor (As) and cover its observed range

pdat <- expand.grid(As = seq(min(dframe1$As, na.rm = TRUE),
                             max(dframe1$As, na.rm = TRUE),
                             length.out = 100))
pdat

# Step 2: Making a file containing the predicted data (pred)

pred <- predict(model1, newdata = pdat, type = "response", se.fit = TRUE)
pred

# Step 3: Combining the prediction data and the predictions into a final dataframe (predframe)

predframe <- data.frame(pdat, presence = pred$fit, se = pred$se.fit)
predframe

# Step 4: Add the fitted line (solid) and +/- 1 standard error (dashed)

lines(predframe$presence ~ predframe$As, col = "red", lwd = 2)
lines(predframe$presence + predframe$se ~ predframe$As, col = "red", lty = 2)
lines(predframe$presence - predframe$se ~ predframe$As, col = "red", lty = 2)

The formula syntax used to define your model allows complex definitions. In the simple case of including another variable, let's say lead concentration, you could write

``````
model1 <- glm(presence ~ As + Pb,
              family = binomial(link = logit),
              data = dframe1)
``````

That assumes that dframe1 has a column named Pb. You can model both the individual effects of As and Pb and their interaction like this

``````
model1 <- glm(presence ~ As * Pb,
              family = binomial(link = logit),
              data = dframe1)
``````
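
For reference, `As * Pb` is shorthand for `As + Pb + As:Pb`. Here is a minimal sketch of one way to check whether the interaction term is worth keeping. It uses simulated data in place of dframe1, so the data frame `sim`, its coefficients and the seed are all made up for illustration.

```r
# Simulated stand-in for dframe1; all values are invented for illustration only
set.seed(42)
n   <- 200
sim <- data.frame(As = runif(n, 0, 10), Pb = runif(n, 0, 5))
sim$presence <- rbinom(n, 1, plogis(1 - 0.3 * sim$As + 0.4 * sim$Pb))

# Additive model: separate effects of As and Pb
m_add <- glm(presence ~ As + Pb, family = binomial(link = "logit"), data = sim)

# Interaction model: As * Pb expands to As + Pb + As:Pb
m_int <- glm(presence ~ As * Pb, family = binomial(link = "logit"), data = sim)

names(coef(m_int))   # "(Intercept)" "As" "Pb" "As:Pb"

# Likelihood-ratio test of the extra interaction term, plus AIC for both models
anova(m_add, m_int, test = "Chisq")
AIC(m_add, m_int)
```

If the interaction does not improve the fit noticeably, the simpler additive model is usually easier to interpret.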

Is that the sort of thing you are looking for?

Yes, thank you that helps loads (I'm really new to this!)

I'm still having issues with 'visualise the model' at the end. How do I go about changing it to accommodate both into a graph, if it is even possible?

Many thanks,
Julie

It can be difficult to visualize the effects of multiple independent variables in a model. The fundamental point of interest is probably the actual vs the predicted values of your dependent variable. Since your result is binary, a confusion matrix showing the count, or fraction, of correct and erroneous predictions might be the best overall summary. I find graphs with a binary variable are often hard to read.
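
A minimal sketch of such a confusion matrix is below. It uses simulated data rather than dframe1, so the data frame `sim`, the coefficients and the 0.5 cut-off are assumptions chosen for illustration; swap in your own model and a threshold that suits your data.

```r
# Simulated stand-in for dframe1; all values are invented for illustration only
set.seed(1)
n   <- 200
sim <- data.frame(As = runif(n, 0, 10), Pb = runif(n, 0, 5))
sim$presence <- rbinom(n, 1, plogis(1 - 0.3 * sim$As + 0.4 * sim$Pb))

fit <- glm(presence ~ As + Pb, family = binomial(link = "logit"), data = sim)

# Predicted probability of presence for each row of the data
prob <- predict(fit, type = "response")

# Classify as present when the probability exceeds 0.5 (an arbitrary cut-off)
pred_class <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual     <- factor(sim$presence, levels = c(0, 1))

# Rows are observed values, columns are predictions
conf_mat <- table(actual = actual, predicted = pred_class)
conf_mat

# The same table as fractions of all observations
prop.table(conf_mat)

# Overall fraction of correct predictions (the diagonal of the table)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

Note that this scores the model on the same data it was fitted to, so the accuracy is likely to be optimistic compared with predictions on new sites.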

Thank you for your reply, I think I get it a little better now.
