Context: I am trying to run bootstrapped tests (following some online instructions) and I am not sure how to fix my code. Would anyone be able to assist on next steps?
Data: example subset
index <- c(0.0000, 6.2500, 0.000, 12.5000, 0.000, 5.8224)
pol_or <- c("left", "left", "left", "right", "right", "left")
df_topics_LR6 <- data.frame(index, pol_or)
Code:
library(boot)
med.diff <- function(d, i) { # med.diff = calculates diff in medians; input 2 arguments: one for data (d) and one to index data (i)
temp <- df_topics_LR6[i,] # take data and resample it according to randomly selected row numbers (i)
median(tmp$index[tmp$pol_or=="right"]) - # return diff in medians for resampled data
median(tmp$index[tmp$pol_or=="left"])
}
boot.out <- boot(data = df_topics_LR, statistic = med.diff, R = 1000) # use boot function to resample data 1k times, taking diff in medians each time, and save results into object = boot.out
Error in console:
Error in median(tmp$index[tmp$pol_or == "right"]) :
object 'tmp' not found
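For reference, the error seems to come from a name mismatch inside med.diff: the resampled rows are assigned to temp, but the next lines read from tmp, which is never created. The function also indexes df_topics_LR6 directly instead of its d argument. Below is a minimal sketch of a corrected version, run on the example subset only. The strata argument is an addition that is not in the tutorial: with so few "right" rows, a plain resample can contain none of them, and the median of an empty group is NA, so this sketch resamples within each group instead. set.seed(1) is only there to make the run repeatable.

```r
library(boot)

# Example subset from the question
index  <- c(0.0000, 6.2500, 0.000, 12.5000, 0.000, 5.8224)
pol_or <- c("left", "left", "left", "right", "right", "left")
df_topics_LR6 <- data.frame(index, pol_or)

# d = data passed in by boot(), i = row numbers chosen for one resample
med.diff <- function(d, i) {
  tmp <- d[i, ]  # use the argument d and one consistent name (tmp)
  median(tmp$index[tmp$pol_or == "right"]) -
    median(tmp$index[tmp$pol_or == "left"])
}

set.seed(1)  # assumption: fixed seed for repeatability only
boot.out <- boot(data = df_topics_LR6, statistic = med.diff, R = 1000,
                 # assumption: resample within each pol_or group so that
                 # neither group is ever empty (not part of the tutorial)
                 strata = factor(df_topics_LR6$pol_or))

median(boot.out$t)  # point estimate of the difference in medians
```

On the full data the same function should work with data = df_topics_LR, and the strata argument can be dropped if both groups are large enough that an empty group is not a realistic concern.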
Instructions: The Wilcoxon Rank Sum Test | University of Virginia Library Research Data Services + Sciences
If we’re explicitly interested in the difference in medians between the two populations, we could try a bootstrap approach using the boot package. The idea is to resample the data (with replacement) many times, say 1000 times, each time taking a difference in medians. We then take the median of those 1000 differences to estimate the difference in medians. We can then find a confidence interval based on our 1000 differences. An easy way is to use the 2.5th and 97.5th percentiles as the lower and upper bounds of a 95% confidence interval.
Here is one way to carry this out in R.
First we load the boot package, which comes with R, and create a function called med.diff to calculate the difference in medians. In order to work with the boot package’s boot function, our function needs two arguments: one for the data and one to index the data. We have arbitrarily named these arguments d and i. The boot function will take our data, d, and resample it according to randomly selected row numbers, i. It will then return the difference in medians for the resampled data.
library(boot)
med.diff <- function(d, i) {
  tmp <- d[i,]
  median(tmp$weight[tmp$company=="A"]) -
    median(tmp$weight[tmp$company=="B"])
}
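To see what the index argument does, the function can be called by hand before handing it to boot. This is a minimal sketch that reuses the example subset from the question (columns index and pol_or) rather than the tutorial's dat, which is not shown here. The index vector i_example is a hypothetical, hand-picked resample, not something boot produced.

```r
# Example subset from the question
index  <- c(0.0000, 6.2500, 0.000, 12.5000, 0.000, 5.8224)
pol_or <- c("left", "left", "left", "right", "right", "left")
df_topics_LR6 <- data.frame(index, pol_or)

# Same shape as the tutorial's function, adapted to these column names
med.diff <- function(d, i) {
  tmp <- d[i, ]
  median(tmp$index[tmp$pol_or == "right"]) -
    median(tmp$index[tmp$pol_or == "left"])
}

# Passing every row once reproduces the statistic on the original data
original <- med.diff(df_topics_LR6, seq_len(nrow(df_topics_LR6)))

# Hypothetical resample: rows 2 and 4 are drawn twice, rows 3 and 5 not at all
i_example <- c(2, 2, 6, 4, 4, 1)
resampled <- med.diff(df_topics_LR6, i_example)

original
resampled
```

boot does the same thing internally, except that it generates a fresh index vector for each of the R replicates.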
Now we use the boot function to resample our data 1000 times, taking a difference in medians each time, and saving the results into an object called boot.out.
boot.out <- boot(data = dat, statistic = med.diff, R = 1000)
The boot.out object is a list object. The element named “t” contains the 1000 differences in medians. Taking the median of those values gives us a point estimate of the difference in medians. Below we get -5.05, but you will likely get something different.
median(boot.out$t)
[1] -5.05
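The percentile interval described in the instructions can be read off the same boot.out$t values. A minimal sketch, again using the example subset from the question in place of the tutorial's dat. The strata argument and the seed are assumptions added here so that the tiny sample never produces an empty group and the run is repeatable.

```r
library(boot)

# Example subset from the question
index  <- c(0.0000, 6.2500, 0.000, 12.5000, 0.000, 5.8224)
pol_or <- c("left", "left", "left", "right", "right", "left")
df_topics_LR6 <- data.frame(index, pol_or)

med.diff <- function(d, i) {
  tmp <- d[i, ]
  median(tmp$index[tmp$pol_or == "right"]) -
    median(tmp$index[tmp$pol_or == "left"])
}

set.seed(1)  # assumption: fixed seed for repeatability only
boot.out <- boot(data = df_topics_LR6, statistic = med.diff, R = 1000,
                 strata = factor(df_topics_LR6$pol_or))  # assumption, see above

# 2.5th and 97.5th percentiles of the bootstrap differences:
# lower and upper bounds of an approximate 95% interval
ci <- quantile(boot.out$t, probs = c(0.025, 0.975))
ci
```

The boot package's own boot.ci(boot.out, type = "perc") reports a similar percentile interval. With only six rows the interval will be very wide, so this is best read as a check that the code runs rather than as a result.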