Assume error in data.frame

saint · September 16, 2020, 8:55am

Hi to all coders out there.
I have a data.frame time series with production values for harvested wood products. For a sensitivity analysis I want to assume 5%,10% and 20% errors in these values.
the data frame looks like this:

data.frame': 58 obs. of 3 variables:
Year; Wood-based panels; Sawnwood ; Paper and paperboard
1960 NA 454000 6200
1961 2500 430000 6200
1962 1400 475000 12200
1963 700 510000 21100
1964 700 510000 20200
1965 1700 474000 21900.

How would you guys and gals do that? I would like to draw random values from a normal distribution, assuming a certain sd and loop that a 100 times? How do I do this?
Thanks in advance

pieterjanvc · September 16, 2020, 11:56am

Hi,

Have you looked at the Sensitivity package?
https://CRAN.R-project.org/package=sensitivity

I don't know exactly how to do this type of analysis, but it seems they have everything you need in there.

Hope this helps,
PJ

nirgrahamuk · September 17, 2020, 10:40pm

library(tidyverse)
getErr <- function(true,measured){
  paste0(round(mean(abs(measured-true)/true) * 100,digits = 1),
         "%") %>% cat("\n")
}

indata <- tribble(
  ~Year,~Woodbasedpanels,~Sawnwood ,~Paperandpaperboard,
1960, NA ,454000 ,6200,
1961, 2500, 430000, 6200,
1962, 1400, 475000,12200,
1963, 700 ,510000 ,21100,
1964, 700 ,510000, 20200,
1965, 1700, 474000 ,21900)

vec_of_mean_errs <- c(0.05,0.1,0.2)

standard_deviations <- tribble(
 ~ Woodbasedpanels,~ Sawnwood, ~Paperandpaperboard,
 1,100,10,
 2,200,20,
 5,5000,100
)

permute_vec <- function(vec,meanerr,sd){

  newvec_p <- vec + rnorm(n=seq_along(vec),mean = mean(vec,na.rm=TRUE)*meanerr,sd=sd)
  newvec_n <- vec - rnorm(n=seq_along(vec),mean = mean(vec,na.rm=TRUE)*meanerr,sd=sd)
  # getErr(vec,newvec_p)
  # getErr(vec,newvec_n)
  pindex <- sample(x = c(TRUE,FALSE),
                       size = length(vec),
                       replace = TRUE)
  newvec <- newvec_p
  newvec[!pindex] <- newvec_n[!pindex]
  # getErr(vec,newvec)
  newvec
}

innerloop <- function(err){
  suppressMessages({
  map_dfc(names(standard_deviations),
          ~ permute_vec(indata[[.]],err,standard_deviations[[.]])) %>% setNames(names(standard_deviations)) %>%
    mutate(err=err) %>% bind_cols(Year=indata$Year)
  })
}

outerloop <- function(x){
  cat(".")
  if(x%%50==0){
    cat("\n",x,"\n")
  }
  map_dfr(vec_of_mean_errs,innerloop)
}
 
results <-  map_dfr(1:500
        ,outerloop)


indata_tidy <- pivot_longer(indata,
                            cols = -Year )
results_tidy <- pivot_longer(results,
                            cols = c(-Year,-err ))



ggplot(data=indata_tidy,
       mapping = aes(x=Year,y=value))+geom_point(size=4,color="blue") + facet_wrap(vars(name,err),
                                                                                   nrow=3,
                                                                                   ncol=3,
                                                                                   scales="free")+
  geom_point(data=results_tidy,color="red",alpha=0.01,size=.5)

saint · September 18, 2020, 7:03am

■■■■ing genius! thanks man.
Can I buy you a coffee? DM me and I'll get you sorted, friend.

saint · September 18, 2020, 7:51am

Just one final question.
why did you set the values in the "standard_deviations" tribble the way you did?
Best regards

nirgrahamuk · September 18, 2020, 8:20am

The values are latgely arbitrary, I have little context to draw from

system · October 9, 2020, 8:20am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.