Zero-inflated data and lognormal distribution

I am trying to fit a regression model to zero-inflated data with a lognormal distribution using R. The histogram looks like this:

I did some research online. As far as I can tell, glm cannot fit this combination of conditions. I found that the gamlss function can fit a lognormal distribution with the LOGNO family. However, I get an error: "family = LOGNO, : response variable out of range". Maybe this is because of the zero inflation?

To make my question a little clearer: I am trying to investigate the influence of various amino acid combinations, collected under certain conditions, on a certain ratio. The ratio is my response variable, plotted in the histogram shown. So I end up with a continuous response variable and several categorical independent variables.

Does anyone have an idea how I can deal with this problem? I couldn't find a solution so far. Thank you!

Hi Mats,

If I have this right, you are looking for some help with modeling continuous zero-inflated data? I believe some options for that are beta and gamma regression.

In your shoes I might try examining your outcome distribution with fitdistrplus to get a finer sense of its shape and see what may be a good fit (of course still directed mainly by your domain knowledge of the process). For instance, I'm not sure I would immediately describe the plot you included as lognormal. I am a big fan of the Cullen and Frey plot to give you an idea of the distribution family.
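As a rough sketch of that first look, the snippet below separates the exact zeros from the positive values and then draws a Cullen and Frey plot for the positive part with `fitdistrplus::descdist`. The data are simulated and `ratio` is a hypothetical stand-in for your response, so swap in your own vector. The fitdistrplus calls are guarded so the script still runs if the package is not installed.

```r
# Simulated stand-in for the ratio: roughly 30% exact zeros, the rest lognormal.
# (Hypothetical data; replace `ratio` with your own response vector.)
set.seed(1)
n <- 500
ratio <- ifelse(runif(n) < 0.3, 0, rlnorm(n, meanlog = 0, sdlog = 0.5))

prop_zero <- mean(ratio == 0)   # share of exact zeros
positive  <- ratio[ratio > 0]   # the part a lognormal or gamma could describe

if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  # Cullen and Frey plot (skewness vs. kurtosis) with bootstrapped points
  fitdistrplus::descdist(positive, discrete = FALSE, boot = 200)

  # Fit two candidate families to the positive values and compare by AIC
  fit_ln <- fitdistrplus::fitdist(positive, "lnorm")
  fit_ga <- fitdistrplus::fitdist(positive, "gamma")
  print(c(lnorm = fit_ln$aic, gamma = fit_ga$aic))
}
```

Looking at the positive values on their own matters here because neither the lognormal nor the gamma distribution puts any probability on exactly zero, so the zeros would distort the skewness and kurtosis summary.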

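You can then start simple with, say, a glm with family = Gamma, if that happens to be appropriate (keeping in mind that the Gamma family only accepts strictly positive responses, so the zeros need separate handling), or take a look at some more specialized packages if appropriate/needed for your data. Hope that helps!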
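One simple way to start, sketched here in base R only, is a two-part (hurdle-style) model: a logistic regression for whether the response is positive, and a separate model for the positive values. This is just one possible approach under assumptions, not necessarily the best one for your data. The data are simulated, and `condition` and `ratio` are hypothetical names for a categorical predictor and the response.

```r
# Hypothetical data: the condition shifts both P(zero) and the size of positives
set.seed(42)
n <- 600
dat <- data.frame(condition = factor(sample(c("A", "B", "C"), n, replace = TRUE)))
p_zero <- unname(c(A = 0.2, B = 0.4, C = 0.6)[as.character(dat$condition)])
mu_log <- unname(c(A = 0.0, B = 0.3, C = 0.6)[as.character(dat$condition)])
dat$ratio <- ifelse(runif(n) < p_zero, 0, rlnorm(n, meanlog = mu_log, sdlog = 0.4))

# Part 1: logistic regression for whether the response is positive
dat$is_pos <- as.integer(dat$ratio > 0)
fit_zero <- glm(is_pos ~ condition, family = binomial, data = dat)

# Part 2: model the positive values only
pos <- subset(dat, ratio > 0)
fit_gamma <- glm(ratio ~ condition, family = Gamma(link = "log"), data = pos)
fit_lnorm <- lm(log(ratio) ~ condition, data = pos)  # lognormal alternative

# Overall expected response per condition: P(y > 0) * E[y | y > 0]
newd <- data.frame(condition = factor(c("A", "B", "C"),
                                      levels = levels(dat$condition)))
p_pos    <- predict(fit_zero, newd, type = "response")
mean_pos <- predict(fit_gamma, newd, type = "response")
expected <- p_pos * mean_pos
print(round(expected, 3))
```

Fitting the two parts separately keeps everything within `glm` and `lm`, and lets the predictors affect the chance of a zero and the size of the positive values differently. Whether the Gamma or the lognormal fit suits the positive part better is something to check against your own data, for example with residual plots.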

Cheers,
Ben

Some possible resources that might help: