Binomial Logistic Regression model

Hi,
I'm doing a project where I have data on the presence of a plant, and I would like to see whether environmental conditions affect its presence. I have been fitting a binomial logistic regression model, and I have been able to detect how individual conditions affect the plant one at a time (pH, turbidity and 12 heavy metal concentrations; one is shown as 'As' here). Is there any way I can analyse two or more independent variables in the same model in RStudio?

This is the script I have been using up to now.
Many thanks,
Julie

names(dframe1)
summary(dframe1)

hist(dframe1$presence, col="dark red")

pairs(dframe1, panel = panel.smooth)

#OUTLIERS
dotchart(dframe1$presence,
xlab = "Values of the data",
ylab = "Order of the data")

#PLOT POTENTIAL RELATIONSHIPS
plot (dframe1$presence ~ dframe1$As)
abline (lm(dframe1$presence ~ dframe1$As), col = "red", lwd = 3)

###Build a model ####

model1 <- glm(presence ~ As,
family = binomial (link = logit),
data = dframe1)

AIC(model1)

###Model Validation ####

plot(model1)

devresid <- resid(model1, type = "deviance")
hist(devresid) # deviance residuals >2 may indicate a poorly fitting model

install.packages("arm")
library(arm)
predicted.values <- predict(model1)
residuals <- resid(model1, type = "deviance")
binnedplot(predicted.values, residuals)

###Model Selection ####
drop1(model1, test='Chi')

###Interpret the model####
summary (model1)
summary.lm (model1)

exp(coef(model1))

exp(confint(model1))

(model1$null.deviance - model1$deviance) / model1$null.deviance

###Visualise the model ####

plot (dframe1$presence ~ dframe1$As,
ylab = "Probability of S. latifolium presence",
xlab = "As Concentration of rhyne water (mg/L)" ,
las=1, col = "blue")

summary(dframe1)

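# Step 1: Making a table of prediction data (pdat)
# The column name must match the predictor in model1 (As),
# and the grid spans the observed range of As

pdat <- expand.grid(As = seq(min(dframe1$As, na.rm = TRUE),
                             max(dframe1$As, na.rm = TRUE),
                             length.out = 100))
pdat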

# Step 2: Making a file containing the predicted data (pred)

pred <- predict (model1, newdata = pdat, type= "response", se.fit = TRUE)
pred

# Step 3: Combine the predictions with the predictors
# into a final data frame (predframe)

predframe <- data.frame (pdat, presence = pred$fit, se = pred$se.fit)
predframe

# Step 4: Add the fitted line

lines (predframe$presence ~ predframe$As, col="red", lwd = 2)
lines (predframe$presence+predframe$se ~ predframe$As, col="red", lty = 2)
lines (predframe$presence-predframe$se ~ predframe$As, col="red", lty = 2)

The formula syntax used to define your model allows complex definitions. In the simple case of including another variable, let's say lead concentration, you could write:

model1 <- glm(presence ~ As + Pb,
family = binomial (link = logit),
data = dframe1)

That assumes that dframe1 has a column named Pb. You can model both the individual effects of As and Pb and their interaction like this:

model1 <- glm(presence ~ As * Pb,
family = binomial (link = logit),
data = dframe1)
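
If you want to see what the two formulas actually fit, and whether the interaction term is worth keeping, here is a minimal sketch on simulated data. The data frame `sim`, its values, and the model names `m_add` and `m_int` are made up for illustration; you would use dframe1 and your own column names instead.

```r
# Simulated stand-in for dframe1 (all values are invented for illustration)
set.seed(42)
n   <- 200
sim <- data.frame(As = runif(n, 0, 10), Pb = runif(n, 0, 5))
# Presence is simulated from additive effects only (no true interaction)
sim$presence <- rbinom(n, 1, plogis(-1 + 0.4 * sim$As - 0.6 * sim$Pb))

# Additive model: one coefficient per predictor
m_add <- glm(presence ~ As + Pb,
             family = binomial(link = "logit"),
             data = sim)

# As * Pb is shorthand for As + Pb + As:Pb (main effects plus interaction)
m_int <- glm(presence ~ As * Pb,
             family = binomial(link = "logit"),
             data = sim)

names(coef(m_add))  # "(Intercept)" "As" "Pb"
names(coef(m_int))  # "(Intercept)" "As" "Pb" "As:Pb"

# Likelihood-ratio test of the interaction term, plus AIC for both models
anova(m_add, m_int, test = "Chisq")
AIC(m_add, m_int)
```

The `anova()` call compares the two nested models, so it tests only the extra `As:Pb` term. This is the same kind of comparison that `drop1()` in your script makes for each term that can be dropped.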

Is that the sort of thing you are looking for?


Yes, thank you, that helps loads (I'm really new to this!)

I'm still having issues with the 'Visualise the model' section at the end. How do I change it to show both variables in one graph, if that is even possible?

Many thanks,
Julie

It can be difficult to visualize the effects of multiple independent variables in a model. The fundamental point of interest is probably the actual vs the predicted values of your dependent variable. Since your result is binary, a confusion matrix showing the count, or fraction, of correct and erroneous predictions might be the best overall summary. I find graphs with a binary variable are often hard to read.
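
As a rough sketch of what I mean by a confusion matrix, the following uses simulated data in place of dframe1. The data frame `sim`, the model name `fit`, and the 0.5 cutoff are all assumptions for illustration, not something taken from your data.

```r
# Simulated stand-in for dframe1 (all values are invented for illustration)
set.seed(1)
n   <- 200
sim <- data.frame(As = runif(n, 0, 10), Pb = runif(n, 0, 5))
sim$presence <- rbinom(n, 1, plogis(-1 + 0.4 * sim$As - 0.6 * sim$Pb))

fit <- glm(presence ~ As + Pb,
           family = binomial(link = "logit"),
           data = sim)

# Fitted probabilities of presence for the observations used in the fit
p_hat <- predict(fit, type = "response")

# Classify as present when the fitted probability exceeds a cutoff.
# 0.5 is an arbitrary choice here, not a recommendation.
cutoff    <- 0.5
predicted <- factor(as.integer(p_hat > cutoff), levels = c(0, 1))
actual    <- factor(sim$presence, levels = c(0, 1))

# Rows are observed values, columns are predicted values
conf_mat <- table(actual = actual, predicted = predicted)
conf_mat

# Same table as fractions of all observations
prop.table(conf_mat)

# Overall fraction of correct predictions
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

The diagonal cells are the correct predictions and the off-diagonal cells are the errors. Because the predictions are made on the same data the model was fitted to, the fraction correct will tend to look better than it would on new observations.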


Thank you for your reply, I think I get it a little better now.
