Fitting continuous distributions in R

Good afternoon. I have a vector 'a' containing 16000 values. I get the descriptive statistics with the help of the following:

library(pastecs)
library(timeDate)
stat.desc(a)
skewness(a)
kurtosis(a)

In particular, skewness = -0.5012 and kurtosis = 420.8073. (1)
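For reference, the two numbers in (1) can be reproduced by hand. The sketch below assumes the plain moment-based definitions (third and fourth standardized central moments, with 3 subtracted to get excess kurtosis); timeDate may use a slightly different normalization by default. The vector `x` and the helper `moment_stats` are made up for illustration and are not the poster's data or code.

```r
# Hypothetical heavy-tailed sample standing in for the vector `a`.
set.seed(1)
x <- rt(16000, df = 5) * 1e-4

# Moment-based skewness and excess kurtosis (dividing by n, not n - 1).
moment_stats <- function(x) {
  z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
  c(skewness = mean(z^3), excess_kurtosis = mean(z^4) - 3)
}

moment_stats(x)

# Both quantities are scale-free: multiplying the data by a constant
# leaves them unchanged.
all.equal(moment_stats(x), moment_stats(1000 * x))
```

Because both statistics are scale-free, a kurtosis in the hundreds says something about the tails of the data rather than about the units they are measured in.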

Then I build a histogram of my empirical data:

hist(a, col = "lightblue", breaks = 140, border = "white", main = "", xlab = "Value", xlim = c(-0.001, 0.001))

[Figure: histogram of the empirical data]

After this I try to fit a theoretical distribution to my empirical data. I choose the Variance-Gamma distribution and try to estimate its parameters from my data:

library(VarianceGamma)
a_VG <- vgFit(a)

The parameter estimates are the following: vgC=-11.7485, sigma=0.4446, theta=11.7193, nu=0.1186 (2)

Next, I draw a sample from the Variance-Gamma distribution with the parameters from (2) and build a histogram of the simulated values:

VG <- rvg(length(a), vgC = -11.7485, sigma = 0.4446, theta = 11.7193, nu = 0.1186)
hist(VG, breaks = 140, col = "orange", main = "", xlab = "Value")

[Figure: histogram of the simulated Variance-Gamma sample]

But the second histogram looks completely different from the first (empirical) one, even though it is built from the parameters (2), which I estimated from the empirical data.

What's wrong with my code? How can I fix it?
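One quick sanity check that needs no extra packages is to compare the mean and standard deviation implied by the estimates in (2) with `mean(a)` and `sd(a)`. The sketch below assumes the usual (c, sigma, theta, nu) parameterization, in which a Variance-Gamma variable is a normal variance-mean mixture with a gamma mixing variable of mean 1 and variance nu; the names `vg_mean`, `vg_sd`, `G` and `sim` are illustrative only.

```r
# Parameter estimates (2) as reported in the question.
vgC <- -11.7485; sigma <- 0.4446; theta <- 11.7193; nu <- 0.1186

# Assumed parameterization: X = vgC + theta * G + sigma * sqrt(G) * Z,
# with G ~ Gamma(mean 1, variance nu) and Z ~ N(0, 1).
vg_mean <- vgC + theta
vg_sd   <- sqrt(sigma^2 + theta^2 * nu)
c(mean = vg_mean, sd = vg_sd)

# Base-R simulation under the same assumption, as a cross-check.
set.seed(1)
n   <- 1e5
G   <- rgamma(n, shape = 1 / nu, scale = nu)
sim <- vgC + theta * G + sigma * sqrt(G) * rnorm(n)
c(mean = mean(sim), sd = sd(sim))
```

Under that assumption the implied standard deviation is roughly 4, while the `xlim` of the first histogram suggests the data sit within about ±0.001. A mismatch of that size hints that the optimizer did not converge to a sensible fit, so it might be worth inspecting the fit object and trying the fit again on rescaled data (for example, the values multiplied by a constant).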

The parameters skewness and kurtosis only have meaning in the context of an approximately normal distribution, which your example asserts/assumes it is not. Why did you choose the Variance-Gamma distribution? Perhaps it is simply a poor choice given your empirical distribution?

I chose the Variance-Gamma distribution because it has not only location (mean) and spread (standard deviation) parameters, as the normal distribution does, but also asymmetry and shape parameters. In this distribution the asymmetry and shape parameters represent the actual skewness and kurtosis of the empirical data.

I'm not saying Variance-Gamma doesn't have those parameters, but other distributions do too, and they may fit better.
Can you share the data? Perhaps via GitHub.

That's a very interesting set of numbers.

library(tidyverse)
library(cowplot)
df <- read.csv("mydata2.txt", header = FALSE)



empiricalvec <- c(
  df$V1,
  df$V2,
  df$V3,
  df$V4
) %>% na.omit()


# overlay histograms of the same data at four bin counts
styleplot <- function(p) {
  p +
    geom_histogram(bins = 10, alpha = .1, fill = "green") +
    geom_histogram(bins = 50, alpha = .2, fill = "blue") +
    geom_histogram(bins = 250, alpha = .4, fill = "yellow") +
    geom_histogram(bins = 1250, fill = "red")
}
fullplot <- ggplot(
  data = enframe(empiricalvec),
  aes(x = value)
) %>% styleplot



# find the 10th and 90th percentiles
(q <- quantile(empiricalvec, probs = c(0.1, .90)))
# enframe(empiricalvec) %>% filter(between(value,q[1],q[2]))

plot80x <- styleplot(ggplot(
  data = enframe(empiricalvec),
  aes(x = value)
) +
  xlim(q) )



(q2 <- quantile(empiricalvec, probs = c(0.45, .55)))
plot_center <- styleplot(ggplot(
  data = enframe(empiricalvec),
  aes(x = value)
) +
  xlim(q2) )


cowplot::plot_grid(fullplot,
  plot80x,
  plot_center,
  ncol = 3
)

I'm not very familiar with the 'tidyverse' and 'cowplot' packages.
In all these plots I see histograms with different numbers of bins (intervals).
But how is that related to fitting a Variance-Gamma distribution to this empirical dataset?

Your data doesn't appear continuous, but rather quantized. I plotted the same data with different bin counts to show you that analysing it by bin is relatively arbitrary, and that the histogram is likely to look different from any continuous distribution you may fit.
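A rough way to check for quantization without relying on histogram bins is to count distinct values and look at the spacing between them. This is a minimal sketch on made-up data (continuous draws snapped to an assumed grid of width 1e-5); `x`, `grid`, `n_distinct` and `gaps` are illustrative names, not part of the original data or code.

```r
# Hypothetical data: continuous draws snapped to a grid of width 1e-5.
set.seed(1)
grid <- 1e-5
x <- round(rnorm(16000, sd = 1e-4) / grid) * grid

# A continuous sample of this size would have (almost) all values distinct.
n_distinct <- length(unique(x))
n_distinct / length(x)

# Spacing between neighbouring distinct values; on a grid these are
# (up to floating-point error) whole multiples of the grid width.
gaps <- diff(sort(unique(x)))
summary(gaps / grid)
```

On real data the grid width is unknown, so the same idea would be applied by looking at the smallest gaps and checking whether the others are close to multiples of them.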

What is the context of your data? What is being measured?

I think fitting something with a trigonometric term like a sinusoidal multiplicative might give some better approximation.

It's also quite zero-inflated: about 4% of your observations are exactly zero.

Logarithmic stock returns are being measured.
'with a trigonometric term like sinusoidal multiplicative': I don't quite understand this. What kind of distribution do you mean by that? Something discrete?

I mean that these don't look like simple stock returns (log-transformed or otherwise), as they seem regularly discontinuous/discrete, like this for example:

example_data <- data.frame(
  x = seq(from = -2 * pi, to = 2 * pi, length.out = 1000),
  dn = dnorm(1:1000 / 500, 1, 1)
) %>%
  rowwise() %>%
  mutate(
    # keep a random level only where the sine term is near its peaks
    y1 = ifelse(abs(sin(2 * x + .5 * pi)) > .9, sample((0:10) / 10, size = 1), NA),
    y2 = y1 * dn
  ) %>%
  na.omit()

plot(example_data$x, example_data$y2)

But they are in fact 1-minute logarithmic returns.
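These two observations need not conflict: 1-minute log returns can look quantized and zero-inflated when prices move on a fixed tick grid. The following is only a sketch on simulated prices; the tick size, starting price and step probabilities are all assumptions chosen for illustration and say nothing about the actual dataset.

```r
set.seed(1)
tick  <- 0.01                        # assumed minimum price increment
steps <- sample(-2:2, 16000, replace = TRUE,
                prob = c(0.05, 0.2, 0.5, 0.2, 0.05))
price <- 100 + tick * cumsum(steps)  # simulated prices stay on the tick grid
ret   <- diff(log(price))            # 1-minute log returns

# Share of returns that are exactly zero (price unchanged within the minute).
zero_share <- mean(ret == 0)
zero_share

# Returns expressed in units of one tick at the starting price:
# they cluster around a handful of integer levels.
levels <- round(ret / (tick / 100))
table(levels)
```

In this simulation the returns are not exactly equal within a level, because the same tick move is a slightly different log return at different price levels, which is one way a nearly discrete sample can still contain many distinct values.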
