binomial glm with bootstrap and weights

Hi, everyone,
I'm trying to test some models with my dataset to see if I find any correlation between the variables and the response. I ran 5 different models and the best model was 5.a, but the residual plots were not really satisfying


(please see the two boxplots attached) because I have different numbers of samples for different classes (at least that's what I thought that could be the responsible for this difference). You can find the best model here:

mod_Bd5a <- glm(Bd_pre_bi ~ Tourism + poly(Altitude,degree=2) + Livestock, data = data2, family=binomial())

So I looked for other options to try to make it better and the residual plots more balanced and I was suggested to bootstrap the analysis and add weights to the classes, due to the different amount of samples. I used this question here to test my data: https://stackoverflow.com/questions/46231261/bootstrap-weighted-mean-in-r
and copied the first command:

samplewmean <- function(d, i, j) {
    d <- d[i, ]
    w <- j[i, ]
    return(weighted.mean(d, w))   
  }

followed by:

results_qsec <- boot(data= data2 [, 7, drop = FALSE], 
                     statistic = samplewmean, 
                     R=10000, 
                     j = data2 [, 6 , drop = FALSE])

But I get the error message: Error in weighted.mean.default(d, w) : (list) object cannot be coerced to type 'double'

I already try to solve this problem using the solution suggested here - https://stackoverflow.com/questions/12384071/how-to-coerce-a-list-object-to-type-double - but then I get another error message:

as.numeric(unlist(samplewmean))
Error in as.numeric(unlist(samplewmean)) :
cannot coerce type 'closure' to vector of type 'double'

lapply(samplewmean, as.numeric)
Error in FUN(X[[i]], ...) :
cannot coerce type 'symbol' to vector of type 'double'

I don't know if it is clear to understand, I tried to put all the info I could remember.
Could someone help me with these error messages or with a better way to fit the data?
Thanks a lot,
Adriana

Welcome @Cravo_Adri!

It could be helpful to have a full reprex to better understand the problem -- especially as it relates to the data.

I attempted your code on the mtcars data frame, and it works fine for me. This leads me to guess (although I cannot confirm) that there may be an issue with the column types for your 6th and 7th columns (wt and qsec are each numeric/double columns in mtcars) -- also, are the columns you want to use the 6th and 7th columns in your data?

1 Like

This could be easily done using rsample but TBH I can't tell what you are trying to do. I can't tell how the glm analysis is related to the weighted mean.

Here's some code for the weighted mean:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(rsample)
#> Loading required package: tidyr
library(purrr)
library(broom)
library(ggplot2)
theme_set(theme_bw())

# see https://tidymodels.github.io/rsample/articles/Applications/Intervals.html

iris_wts <- 
  iris %>% 
  mutate(
    # make up some weights
    weights = as.integer(Species),
    weights = weights/sum(weights)
    )

stat_fun <- function(split, ...) {
  bt_samp <- analysis(split)
  wmn <- weighted.mean(bt_samp$Sepal.Length, bt_samp$weights)
  tibble(term = "weighted mean", estimate = wmn)
}

set.seed(6325)
bt <- 
  bootstraps(iris_wts, times = 2000, apparent = TRUE) %>% 
  mutate(means = map(splits, stat_fun))

bt %>% 
  select(means) %>% 
  unnest() %>% 
  ggplot(aes(x = estimate))  +
  geom_histogram(col = "white", bins = 30)


int_pctl(bt, means)
#> # A tibble: 1 x 6
#>   term          .lower .estimate .upper .alpha .method   
#>   <chr>          <dbl>     <dbl>  <dbl>  <dbl> <chr>     
#> 1 weighted mean   5.97      6.10   6.23   0.05 percentile

Created on 2019-09-06 by the reprex package (v0.2.1)