Is there a function in R that allows fitting a multinomial distribution, but specifying parameter restrictions?

I have collected responses to an item with 3 possible response categories, and I would like to fit a multinomial distribution to them with the restriction that 2 of the 3 probabilities are equal.

How can I do that? I would like to compare the fit under this restriction to the unrestricted fit using a likelihood-ratio test (LRT).

Thank you

Welcome to the community! :slight_smile:

I do not know a general solution for fitting a distribution with specific constraints on the parameters.

But in this particular case, can't you use fitdistr in MASS? You can express the PMF of the trinomial distribution with the equality constraint on two of the probabilities in a nice form, and you can definitely write a function returning the probabilities of the different mass points. What happens if you pass that function as densfun?

Or, after applying the constraint, it can also be viewed as a binomial PMF. Can't you use that directly?

I'm on the go, and hence can't experiment with it myself. Can you please let me know what happens (success or failure) with these approaches?

@Yarnabrina Thank you very much for your advice. I am a very basic user with little math background, so I am struggling to write the function you're suggesting myself. Any chance I can look it up somewhere? Thanks... :pensive:

I'm not much good with math either, especially not the abstract parts, but let me try to explain what I meant. Then you (and anyone else coming across this) can think about whether this is a correct approach or not, and let me know, as I have been out of touch with statistics for some time.

Suppose the distribution under consideration is MN(n, p_1, p_2, p_3), where p_1 + p_2 + p_3 = 1, with the following PMF, where x_1 + x_2 + x_3 = n:

f(x_1, x_2, x_3 ; p_1, p_2, p_3) = \text{constant} \times p_1^{x_1} \times p_2^{x_2} \times p_3^{x_3}

For the unconstrained optimisation, the MLEs will be \hat{p_1}=\frac{x_1}{n}, \hat{p_2}=\frac{x_2}{n}, \hat{p_3}=\frac{x_3}{n}, and plugging these into the above expression gives the supremum in the denominator of the LRT.
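For concreteness, here is a minimal sketch of this unconstrained step in R, using made-up counts (base R's `dmultinom` evaluates the multinomial PMF):

```r
# Hypothetical observed counts x1, x2, x3 (made up for illustration)
x <- c(30, 45, 25)
n <- sum(x)

# Unconstrained MLEs: p_i-hat = x_i / n
p_hat <- x / n

# Log-likelihood evaluated at the unconstrained MLEs
logL_unconstrained <- dmultinom(x, size = n, prob = p_hat, log = TRUE)
```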

For the constrained optimisation, let's assume that your null hypothesis is the equality of p_1 and p_2. Then, in that case you can rewrite the PMF as follows:

g(x_1, x_2, x_3; q_1, q_2) = \text{another constant} \times q_1^{x_1 + x_2} \times q_2^{x_3}

where q_1 = p_1 = p_2 and q_2 = p_3, subject to 2 q_1 + q_2 = 1. Then, in that case, the MLEs will be \hat{q_1}=\frac{x_1 + x_2}{2n}, \hat{q_2}=\frac{x_3}{n}. Using these, you can find the supremum in the numerator.
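Putting the two pieces together, a sketch of the whole LRT in R might look like this (counts are again made up; with one restriction, -2 log of the likelihood ratio is compared to a chi-squared distribution with 1 degree of freedom):

```r
# Hypothetical observed counts (made up for illustration)
x <- c(30, 45, 25)
n <- sum(x)

# Constrained MLEs under p1 = p2 = q1, p3 = q2 (note 2*q1 + q2 = 1)
q1 <- (x[1] + x[2]) / (2 * n)
q2 <- x[3] / n

logL_constrained   <- dmultinom(x, size = n, prob = c(q1, q1, q2), log = TRUE)
logL_unconstrained <- dmultinom(x, size = n, prob = x / n, log = TRUE)

# LRT statistic -2 log(lambda), compared to chi-squared with df = 1
lrt_stat <- -2 * (logL_constrained - logL_unconstrained)
p_value  <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)
```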

Does any of this make sense?

It does! Thank you very much! Trying to make it work)

@Yarnabrina That's really not working. Do you know a freelancer or someone else by any chance who could write the code? Thank you!

I do not. And you are unlikely to get help of that type in this community, as no one will code for you without first checking what you have tried, where you failed, and how you tried to solve the problems.

Do you mean the method I suggested is wrong, or that you were unable to implement it?

If the former, then I am really sorry, but I do not have another suggestion. If you find a better solution, please share it with us.

If it's the latter rather than the former, then I think you might want to brush up on your statistics, because my suggestion should not require any coding at all. Once you have the observed data, you already have the observations x_1, x_2 and x_3. That should be sufficient to find the MLEs, and hence the likelihoods. If you failed to do that in 13 days, then I am afraid that something is not quite right.

Fair points. Thank you.

@Yarnabrina, sorry to bother you again, but since we have already started:

Is this what you mean?
(I use your notation)

prob1 <- c(p1, p2, p3)
prob2 <- c(q1, q1, q2)

fit2 <- xmonte(observed, prob2)
fit2

Does this basically do the thing, or am I getting it wrong?

I have not used this function or the package XNomial before, so I can't say for sure. Please wait for others to chime in.

But it seems to me that you are actually okay with my suggestion. So, why don't you use the table function on your observations to get the frequencies (say f, a vector of the 3 elements x_1, x_2 and x_3) of the different classes, and use those values directly to get the likelihoods, as you actually know the constants?
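A minimal sketch of that frequency step, assuming hypothetical raw responses coded 1, 2, 3:

```r
# Hypothetical raw responses (made up for illustration)
responses <- c(1, 2, 2, 3, 1, 2, 3, 3, 1, 2)

# Frequencies x1, x2, x3 of the three categories
f <- as.vector(table(factor(responses, levels = 1:3)))
f  # c(3, 4, 3) for these made-up data
```

`f` can then be plugged into the likelihood expressions in place of (x_1, x_2, x_3).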

Hi @Yarnabrina,

Thank you for the reply. That's exactly what I did about frequencies, just didn't want to complicate the code/explanations here.

Thank you very much for your help!