Modeling bimodal curves with smsn.mix()


#1

Hi Rstudio community,
I am a student of the university of Leuven, and I’m new to this community and Rstudio in general. I am struggling to implement the function smsn.mix from the package mixsmsn, Fitting Finite Mixture of Scale Mixture of Skew-Normal Distributions. In my research I measure particles and the result is a long list (3059 observations) of diameters, data set is called Diameter1. I will give the code and the result in order to clarify my problem.
Code:

hist(Diameter1, breaks = 100, main = "Histogram of caps", xlab = "diameter")

Result:
See figure below, only the histogram, not the red curve. I obtain a distribution curve with two distinct modes (bimodal estimated means 1.5 and 17). The curves are not normal distribution, it shows large tailing. Now I would like to know the mean and the sigma value for each distribution around each mode.
Code:

Skew.map2 <- smsn.mix(Diameter1, nu =23, g =2, mu=c(1.5,17), sigma2=c(2,100), family = "Skew.t", calc.im = FALSE)
mix.hist(Diameter1, Skew.map2, breaks = 100)
Result:
Seemingly, only one mode was found (red curve). Details of the analysis by smsn.mix show indeed two modes, but they are unexpected.
Code:
mix.print(Skew.map2)
Result:
Number of observations: 3059

Hyperparameter(nu): 8.022694

   group 1 group 2

mu 18.578 19.297
sigma2 241.148 62.370
shape 1.671 -1.135

AIC: 22361.43

BIC: 22403.61

EDC: 22424.86

ICL: 23013.31

EM iterations: 11

The first mode around 1.5, is not seen by the function. When I ask to look for 3 modes, g = 3 in the function I do get a nice fit (red curve) and the first mode is observed, however still two modes are seen in the second part of the curve which seems strange.
Code:
Skew.map2 <- smsn.mix(Diameter1, nu =23, g =3, mu=c(1.5,17), sigma2=c(2,100),family = "Skew.t", calc.im = FALSE)

mix.hist(Diameter1, Snorm.map2, breaks = 100)

Result:
image

mix.print(Snorm.map2)

Number of observations: 3059

Hyperparameter(nu): 3.30653

   group 1 group 2 group 3

mu 21.634 1.539 12.499
sigma2 202.873 2.207 34.779
shape 3.454 24.159 2.480

AIC: 20687.02

BIC: 20753.3

EDC: 20786.7

ICL: 21232.43

EM iterations: 101

So my question is, how do I get a fit of the two peaks of my distribution, together with the info of means and sigma with this function smsn.mix. Why does it see 3 modes and I do not?

Thank you all in advance


#2

Hi Stud, welcome to community. I am not too familiar with this mixsmsn or what you are trying to do. But I, or other folks here might be able to help if you provide a reproducible example (often called reprex for short).
A reprex makes it much easier for others to understand your issue and figure out how to help. A lack of a reprex just makes it much less likely others will reply.

Your question is very close to a reprex, but we'd need your library/package loads, and you'd need to setup Diameter1 for us.


Having said all that, (and again, not really knowing much in this domain), are the three modes you're confused about listed mu 21.634 1.539 12.499? Looking at the histomgram, those appear to be the three local maxima.
Note super zoomed in at 21, a little jump up.

image


#3

Thank you for the swift response EconomiCurtis. I am not sure how I give you my library loads, so I have made a printscreen:
image

I actuallly only needed the mixsmsn package.
Package: mixsmsn
Title: Fitting Finite Mixture of Scale Mixture of Skew-Normal
_ Distributions_
Date: 2017-07-17
Version: 1.1-4
Authors@R: c(person("Marcos", "Prates", role=c("aut","cre","trl"),
_ email="marcosop@est.ufmg.br"),_
_ person("Victor", "Lachos", role="aut"),_
_ person("Celso", "Cabral", role="aut"))_
Description: Functions to fit finite mixture of scale mixture of
_ skew-normal (FM-SMSN) distributions._
Depends: R (>= 1.9.0), mvtnorm (>= 0.9-9)
Author: Marcos Prates [aut, cre, trl], Victor Lachos [aut], Celso Cabral [aut]
Maintainer: Marcos Prates marcosop@est.ufmg.br
License: GPL (>= 2.0)
Packaged: 2017-07-17 12:10:43 UTC; marcos
Repository: CRAN
NeedsCompilation: no
Date/Publication: 2017-07-17 22:51:36 UTC
Built: R 3.4.3; ; 2018-01-28 17:17:46 UTC; windows

I can give the raw data I used for Diameter1, which was in a csv file format, but I can not send it as is within this discussion maybe as a pdf? please let me know what you would find helpful.

Thank you for the insight of the three maxima, indeed the 'mu' values are what I call modes, although they are actually mean values.
However, my question remains, why does this function not see the mu 1.5, when I ask to look for only 2 modes (g = 2), and I define an estimate for the mu values (mu=(1.5,17))?

Kind regards


#4

Sorry, I'm not really sure.

Just for reference, this function is implementing "Functions to fit finite mixture of scale mixture of skew-normal (FM-SMSN) distributions."


You may start to dig into the source code for smsn.mix. You can find it here:


#5

I am new to Rstudio, so reading and changing source code is quite a challenge for me. Knowing how to change the source code would require me to better understand what is limiting the function to obtain the most expected/desired outcome. Is there someone wh could explain the mechanics of the function in general terms? The pdf is indeed helpful, as it has lead me to the function to begin with. In the file with the example of the BMI data, the following conclusion is made:

Then, between the normal mixture models considered, the best t occurs when we have three
components { although, as commented before, the data has a clear bimodal nature. That is,
we need more normal than skew-slash components to accommodate the asymmetric and/or
heavy tailed behaviour of this data, showing the flexibility of the latter model.

I have tried different family types in the function, but no improvements are seen.
The only minor improvement is when I apply a log transformation on my results. The fit obtained with the previously mentioned code is good and I get two modes. After transforming the result again (10^x), the mean values are more or less to be expected but the variances are very strange (very high and very small) unlike the results when the logtransformation is not done.
If still usefull, please find the raw data with this link:

Kind regards,

Student