How to create/debug a function in R

Context: I am trying to run bootstrapped tests (following some online instructions) and I am not sure how to fix my code. Would anyone be able to assist on next steps?

Data: example subset

index <- c(0.0000, 6.2500, 0.000, 12.5000, 0.000, 5.8224)
pol_or <- c("left", "left", "left", "right", "right", "left")
df_topics_LR6 <- data.frame(index, pol_or)

Code:

library(boot)
med.diff <- function(d, i) {             # med.diff = calculates diff in medians; input 2 arguments: one for data (d) and one to index data (i)
  temp <- df_topics_LR6[i,]                # take data and resample it according to randomly selected row numbers (i)
  median(tmp$index[tmp$pol_or=="right"]) - #  return diff in medians for resampled data
    median(tmp$index[tmp$pol_or=="left"])
}

boot.out <- boot(data = df_topics_LR, statistic = med.diff, R = 1000) # use boot function to resample data 1k times, taking diff in medians each time and save results into object = boot.out

Error in console:

Error in median(tmp$index[tmp$pol_or == "right"]) :
object 'tmp' not found

Instructions: The Wilcoxon Rank Sum Test | University of Virginia Library Research Data Services + Sciences

If we’re explicitly interested in the difference in medians between the two populations, we could try a bootstrap approach using the boot package. The idea is to resample the data (with replacement) many times, say 1000 times, each time taking a difference in medians. We then take the median of those 1000 differences to estimate the difference in medians. We can then find a confidence interval based on our 1000 differences. An easy way is to use the 2.5th and 97.5th percentiles as the lower and upper bounds of a 95% confidence interval.

Here is one way to carry this out in R.

First we load the boot package, which comes with R, and create a function called med.diff to calculate the difference in medians. In order to work with the boot package’s boot function, our function needs two arguments: one for the data and one to index the data. We have arbitrarily named these arguments d and i. The boot function will take our data, d, and resample it according to randomly selected row numbers, i. It will then return the difference in medians for the resampled data.

library(boot)
med.diff <- function(d, i) {
  tmp <- d[i,]
  median(tmp$weight[tmp$company=="A"]) -
    median(tmp$weight[tmp$company=="B"])
}

Now we use the boot function to resample our data 1000 times, taking a difference in medians each time, and saving the results into an object called boot.out.

boot.out <- boot(data = dat, statistic = med.diff, R = 1000)

The boot.out object is a list object. The element named “t” contains the 1000 differences in medians. Taking the median of those values gives us a point estimate of the difference in medians. Below we get -5.05, but you will likely get something different.

median(boot.out$t)
[1] -5.05
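
The quoted instructions mention a percentile confidence interval, but the excerpt stops before showing that step. Here is a minimal sketch of it. The article's `dat` is not shown in the excerpt, so the data frame below is a made-up stand-in with the same column names (`weight`, `company`); the numbers it produces are not the article's.

```r
library(boot)

# Hypothetical stand-in for the article's `dat` (assumption, not the real data)
set.seed(1)
dat <- data.frame(
  weight  = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 55, sd = 5)),
  company = rep(c("A", "B"), each = 20)
)

# Difference in medians (A minus B) on the rows selected by i
med.diff <- function(d, i) {
  tmp <- d[i, ]
  median(tmp$weight[tmp$company == "A"]) -
    median(tmp$weight[tmp$company == "B"])
}

boot.out <- boot(data = dat, statistic = med.diff, R = 1000)

# Percentile interval: 2.5th and 97.5th percentiles of the 1000 differences
ci <- quantile(as.vector(boot.out$t), probs = c(0.025, 0.975))
ci

# boot.ci() also reports a percentile interval; its interpolation differs
# slightly from quantile(), so the two need not match exactly
boot.ci(boot.out, type = "perc")
```

If the interval excludes 0, that suggests the two medians differ, in the same spirit as the test the article describes.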

Notice that you store a result in temp and then you refer to an object named tmp.

med.diff <- function(d, i) {             # med.diff = calculates diff in medians; input 2 arguments: one for data (d) and one to index data (i)
  temp <- df_topics_LR6[i,]                # take data and resample it according to randomly selected row numbers (i)
  median(tmp$index[tmp$pol_or=="right"]) - #  return diff in medians for resampled data
    median(tmp$index[tmp$pol_or=="left"])
}

Do you mean this?


median(temp$index[temp$pol_or=="right"])
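
For reference, here is a sketch of the long-format function with the naming fixed. Two other details in the original post look worth changing as well: the function body reads the global `df_topics_LR6` instead of its own argument `d`, and the `boot()` call passes `df_topics_LR` while the example data is called `df_topics_LR6`. The `strata` argument is my addition, not part of the quoted instructions. With only two "right" rows in the example, an unstratified resample can contain no "right" rows at all, which would give an NA median.

```r
library(boot)

# Example subset from the question
index  <- c(0.0000, 6.2500, 0.000, 12.5000, 0.000, 5.8224)
pol_or <- c("left", "left", "left", "right", "right", "left")
df_topics_LR6 <- data.frame(index, pol_or)

med.diff <- function(d, i) {
  tmp <- d[i, ]   # resample whatever data frame boot() passes in as d
  median(tmp$index[tmp$pol_or == "right"]) -
    median(tmp$index[tmp$pol_or == "left"])
}

# Sanity check: the statistic on the original, unresampled rows
med.diff(df_topics_LR6, seq_len(nrow(df_topics_LR6)))

set.seed(123)
boot.out <- boot(
  data      = df_topics_LR6,
  statistic = med.diff,
  R         = 1000,
  strata    = factor(df_topics_LR6$pol_or)  # resample within each group (assumption)
)
boot.out
```

Calling the function once by hand, as in the sanity check, is a quick way to debug it before handing it to `boot()`.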


Thanks, that was one of the issues. I have corrected the code, but now I get the warning below and I don't see why. I suspect it is down to my lack of experience writing functions:


#### Bootstrapped differences ####

library(boot)
med.diff <- function(d, i) {             # med.diff = calculates diff in medians; input 2 arguments: one for data (d) and one to index data (i)
  tmp <- d[i,]                # take data and resample it according to randomly selected row numbers (i)
  median(tmp$left) - #  return diff in medians for resampled data
  median(tmp$right)
}

boot.out <- boot(data = df_topic6_wide, statistic = med.diff, R = 1000) # use boot function to resample data 1k times, taking diff in medians each time and save results into object = boot.out
boot.out
> boot.out

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = df_topic6_wide, statistic = med.diff, R = 1000)


Bootstrap Statistics :
WARNING: All values of t1* are NA

The dataset df_topic6_wide looks like this (etc):

left <- c(0.000, 6.2, 0.000, 12.500)
right <- c(0.000, 6.5, 1.6, 0.000)
df_topic6_wide <- data.frame(left, right)

Thanks! That was one of the issues. I am now getting a new warning when I run the code above. Would you know if I am creating the function incorrectly? This is the output:

> boot.out

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = df_topic6_wide, statistic = med.diff, R = 1000)


Bootstrap Statistics :
WARNING: All values of t1* are NA
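
One possible cause, which cannot be confirmed from the four rows shown: the function itself looks fine, but `median()` returns NA as soon as its input contains a single NA. If the full `df_topic6_wide` was reshaped from long data with unequal group sizes, the shorter column would be padded with NA, and nearly every resample would then contain at least one. The sketch below uses made-up data (`df_wide_na`) and hypothetical function names (`med.diff.naive`, `med.diff.narm`) to show the effect and one way around it.

```r
library(boot)

# Hypothetical wide data: 20 "left" values but only 14 "right" values,
# so the right column is padded with NA (assumption about the real data)
set.seed(42)
df_wide_na <- data.frame(
  left  = runif(20, min = 0, max = 12),
  right = c(runif(14, min = 0, max = 12), rep(NA, 6))
)

median(c(0, 6.5, NA))   # NA: median() propagates missing values

# Same shape as the function in the question
med.diff.naive <- function(d, i) {
  tmp <- d[i, ]
  median(tmp$left) - median(tmp$right)
}

# Variant that drops missing values before taking each median
med.diff.narm <- function(d, i) {
  tmp <- d[i, ]
  median(tmp$left, na.rm = TRUE) - median(tmp$right, na.rm = TRUE)
}

boot.naive <- boot(data = df_wide_na, statistic = med.diff.naive, R = 1000)
boot.narm  <- boot(data = df_wide_na, statistic = med.diff.narm,  R = 1000)

mean(is.na(boot.naive$t))  # share of NA replicates without na.rm
mean(is.na(boot.narm$t))   # share of NA replicates with na.rm = TRUE
```

Running `colSums(is.na(df_topic6_wide))` on the real data would show whether this applies. Note also that resampling rows of a wide table treats each left/right pair as one unit. If the two groups are independent samples, keeping the data in long format avoids both the NA padding and the artificial pairing.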
