How can I make a histogram with track of distribution?

Hello!
I've tried to make a histogram with figure by using data but failed to make it.Could you help me to find the solution?

 rain <- scan("http://robjhyndman.com/tsdldata/hurst/precip1.dat",skip=1)
 rainseries <- ts(rain,start=c(1813))
 rainseriesforecasts <- HoltWinters(rainseries, beta=FALSE, gamma=FALSE)
 library("forecast")
 rainseriesforecasts2 <- forecast:::forecast.HoltWinters(rainseriesforecasts, h=8)
 rainseriesforecasts2

I made a function of plotForecastError as below and tried it to show a graph but got some errors.

 plotForecastErrors <- function(forecasterrors)
 { mysd <- sd(forecasterrors)
 mymin <- min(forecasterrors) - mysd*5 
mymax <- max(forecasterrors) + mysd*3 
mynorm <- rnorm(10000, mean=0, sd=mysd) 
mymin2 <- min(mynorm) 
 mymax2 <- max(mynorm) 
 if (mymin2 < mymin) { mymin <- mymin2 } 
 if (mymax2 > mymax) { mymax <- mymax2 } 

 mybins <- seq(mymin, mymax, mybinsize)
 hist(forecasterrors, col="red", freq=FALSE, breaks=mybins) 
 myhist <- hist(mynorm, plot=FALSE, breaks=mybins) 

points(myhist$mids, myhist$density, type="l", col="blue", lwd=2)  }
plotForecastErrors(rainseriesforecasts2$residuals)

 Error in quantile.default(as.numeric(x), c(0.25, 0.75), na.rm = na.rm,  : 
  missing values and NaN's not allowed if 'na.rm' is FALSE

Could you help me find any revisions for the function?

Thank you very much!

TG

The error message says "missing values and NaN's not allowed if 'na.rm' is FALSE" so do

sum(is.na(rainseriesforecasts2$residuals))

and you will see that you have one NA in the residuals. You need to remove that NA or to set na.rm = TRUE at the appropriate step. You can find which step in the function causes the problem by setting

forecasterrors <- rainseriesforecasts2$residuals

and executing the steps of your function one by one as if they were steps in a script.

I will also note that the line

mybins <- seq(mymin, mymax, mybinsize)

is likely to cause other problems. The values of mymin and mymax may have been set by the limits of mynorm but mybinsize was set by the limits of forecasterrors. mybinsize may not fit an integer number of times between mymin and mymax. You might try using the length.out parameter of seq() instead.

Also, please format code by typing three back tics ,```, before and after the code. On a US keyboard, the back tic is just above the TAB key. The code you posted is very difficult to read.

Dear FJCC,

Thank you very much for your great help!
I reformatted the function in the above.
Could you check this again?

Best regards,

TG

As FJCC said, you code is still hard to read, please take a look into this FAQ about code formatting.

Could you check again please?

Thank you!

TG

Thank you very much for your comments!
Could you check this again?

Insang

The main problem with your code is that the 1st observation in rainseriesforecasts2$residuals is NA. So, all of mysd, mymin, mymax are NA, as you haven't used na.rm = TRUE. This leads to the fact that all elements of mynorm are NaN, because you're supplying NA to sd argument of rnorm.

If you add na.rm = TRUE to the calculations of mysd, mymin and mymax, then you'll avoid this problem. However, I cannot test it, as I don't know what is mybinsize. You're using it inside the function, but haven't defined it anywhere in your latest code. In my solution below, I used what was here before edits, without knowing whether it's correct or not.

A working code
library(forecast)
#> Registered S3 method overwritten by 'xts':
#>   method     from
#>   as.zoo.xts zoo
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> Registered S3 methods overwritten by 'forecast':
#>   method             from    
#>   fitted.fracdiff    fracdiff
#>   residuals.fracdiff fracdiff
rainDataset <- scan(file = "http://robjhyndman.com/tsdldata/hurst/precip1.dat",
                    skip = 1)
rainTimeSeries <- ts(data = rainDataset,
                     start = 1813)
rainHoltWinters <- HoltWinters(x = rainTimeSeries,
                               beta = FALSE,
                               gamma = FALSE)
rainForecasts <- forecast(object = rainHoltWinters,
                          h = 8)
plotResiduals <- function(forecastErrors)
{
  forecastErrorsSd <- sd(x = forecastErrors,
                         na.rm = TRUE)
  forecastErrorsMin <- min(forecastErrors,
                           na.rm = TRUE) - forecastErrorsSd * 5
  forecastErrorsMax <- max(forecastErrors,
                           na.rm = TRUE) + forecastErrorsSd * 3
  forecastErrorsNorm <- rnorm(n = 10000,
                  mean = 0,
                  sd = forecastErrorsSd)
  binMin <- min(forecastErrorsMin, forecastErrorsNorm)
  binMax <- max(forecastErrorsMax, forecastErrorsNorm)
  binBreaks <- IQR(x = forecastErrors,
                   na.rm = TRUE) / 4 # obtained from previous edits of OP
  bins <- seq(from = binMin,
              to = binMax,
              by = binBreaks)
  hist(x = forecastErrors,
       col = "red",
       freq = FALSE,
       breaks = bins)
  with(data = hist(x = forecastErrorsNorm,
                   plot = FALSE,
                   breaks = bins),
       expr = lines(x = mids,
                    y = density,
                    col = "blue",
                    lwd = 2))
}
plotResiduals(forecastErrors = rainForecasts$residuals)

Created on 2019-09-16 by the reprex package (v0.3.0)

Hope this helps.

Note

It's not really relevant to this thread, but you shouldn't use :::. I know I suggested it to you earlier, but since Nathan already gave you a better solution (and you also chose it), you should use that only.

Hi Yarnabrina,

Thank you very much for your great solution!
I will use this for the plot.
About :::, I did not have time to check the correct method yet.
Using this just for the convenience, I will use the right code for the next time.
Best,

TG

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.