Ignoring categorical variables for standard error with predictions for model

I am trying to follow this post by Ariel Muldoon, but with some slight alterations.

I'd like to incorporate some categorical data into the predicted output, but I'm not sure how I can get R to ignore the categorical data to create the standard error for plotting. I would like to be able to incorporate these factors into the model, as these are important for explaining a lot of the variation, but I'm not good enough at R to know how to do it properly. Any help would be really appreciated! I tried to add a repress but I'm not quite sure if it worked. The y2 data at the bottom is a small example of the data I'm working with as the actual file I can't share.

require(tidyverse)
require(purrr)
require(broom)

fit1 = glm(presence~1+moonlight+diel+tide+depth+temperature+yearday+sst, data=y2, family=binomial(link=logit))

#loop through explanatory variables

( mod_vars = all.vars( formula(fit1) )[-1] )

#create prediction dataset

preddat_fun = function(data, allvars, var, diel, tide) {
sums = summarise_at(data,
vars( one_of(allvars), -one_of(var) ),
~ ifelse(is.factor(.), levels(.)[1] , median(.)))
cbind( select_at(data, var), sums, y2$diel, y2$tide)
}

head( preddat_fun(y2, mod_vars, "moonlight") )

pred_dats = mod_vars %>%
set_names() %>%
map( ~preddat_fun(y2, mod_vars, .x) )

preds = pred_dats %>%
map(~augment(fit1, newdata = .x) ) %>%
map(~mutate(.x,
lower = exp(.fitted - 2*.se.fit),
upper = exp(.fitted + 2*.se.fit),
pred = exp(.fitted) ) )

str(pred_dats)

str(preds$temperature)

plot(preds$moonlight~preds$pred)

ggplot(data = preds$moonlight, aes(x = moonlight, y = pred) ) +
geom_line(size = 1) +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = .25) +
geom_rug(sides = "b") +
labs(x = "Moonlight",
y = "Presence")+
ylim(0, 1)

xlabs = c("moonlight", "depth", "temperature", "yearday", "sst")

pred_plot = function(data, variable, xlab) {
ggplot(data, aes_string(x = variable, y = "pred") ) +
geom_line(size = 1) +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = .25) +
geom_rug(sides = "b") +
labs(y = "Presence",
x = xlab) +
ylim(0, 1)
}

all_plots = pmap( list(preds, mod_vars, xlabs), pred_plot)

plot_grid(plotlist = all_plots,
labels = "AUTO",
align = "hv")

y2 <– structure(list(depth = c(49.5, 49.5, 102, 39.5, 43, 15, 39.5,
43, 44.5, 39.5, 139, 44.5), temperature = c(26.2, 25.3, 24.2,
26.5, 23.2, 26.5, 26.5, 24.2, 26.5, 26.9, 26.2, 29.5), sst = c(27.54999924,
28.54999924, 28.54999924, 25.54999924, 28.54999924, 28.54999924,
28.54999924, 28.54999924, 28.54999924, 29.54999924, 28.54999924,
28.54999924), diel = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("Day", "Night"), class = "factor"),
tide = structure(c(2L, 4L, 2L, 1L, 2L, 2L, 3L, 2L, 2L, 4L,
2L, 2L), .Label = c("Ebb", "Falling", "Flood", "Rising"), class = "factor"),
yearday = c(79L, 79L, 79L, 81L, 79L, 79L, 84L, 79L, 79L,
91L, 79L, 103L), presence = c(1L, 0L, 1L, 0L, 1L, 1L, 0L,
0L, 0L, 1L, 1L, 1L), moonlight = c(0.046850507, 0.946850507,
0.046850507, 0.946850507, 0.046850507, 0.046850507, 0.946850507,
0.946850507, 0.046850507, 0.046850507, 0.946850507, 0.046850507
), code = c(54921L, 54921L, 19505L, 54921L, 54921L, 44223L,
54921L, 54921L, 54921L, 44223L, 54921L, 54921L)), class = "data.frame", row.names = c(NA,
-12L))

You have a good start at a reproducible example here, which is great. To make it easier for folks to tell what you want and where the problem is arising, I'd recommend making it a little more minimal. Given you can share a little part of your data :+1: , try taking that and building a small model with a couple of variables in it. That way you can run a reprex with your actual example dataset and folks can see where you are having trouble.

Looking at y2 I'd say maybe try making a model with diel and depth as explanatory variables. That way you have one continuous and one categorical variable to work with.

I actually just re-ran it and for some reason this time it worked! But I'm not quite sure how/why!

Oh, good! I thought your approach here was looking good.

One potential problem in the plots is that you are exponentiating to odds but plotting on the data (probability) scale. You can use plogis() instead of exp() to inverse-link your predictions back to the probability scale.

Thanks for the tip! And thank you so much for all your help, my supervisor is on holiday during the last few days before I need to hand my masters project in, your blog and advice has been the most useful thing I've come across during my project! I'm really grateful.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.