trouble with error variable lengths differ

# Hi all, I am relatively new to performing statistics with R and now I have a problem when trying to do a probit analysis for dose-response.

# Underneath is the table I am using.
dose total  dead percentage survive
   <dbl> <dbl> <dbl>      <dbl>   <dbl>
1 0.0825   100  20        0.2      80  
2 0.165    100  26.7      0.267    73.3
3 0.248    100  30        0.3      70  
4 0.33     100  33.3      0.333    66.7
5 0.412    100  40        0.4      60  
6 0.495    100  53.3      0.533    46.7
7 0.578    100  65        0.65     35  
8 0.66     100  73.3      0.733    26.7
9 0.742    100  80        0.8      20

# First I attach the data with the function attach
attach(data)

# Then I create a matrix of dead and survival
y=cbind(dead,survive)

# However, when I try to create binomial glm (probit model), I get a specific error
model.results = glm(data = data, y ~ dose,binomial(link="probit"))
Error in model.frame.default(formula = y ~ dose, data = data,  : 
  variable lengths differ (found for 'dose')

# Is there someone who can help me with this error/problem, any help would be appreciated. Thanks in advance!

is the column survive or survival. ?

sorry my bad, i made a typo in the question, but in my R file both in the column and the ybind function I used survive.

I cant reproduce your issue . as :

data <- data.frame(
        dose = c(0.0825, 0.165, 0.248, 0.33, 0.412, 0.495, 0.578, 0.66, 0.742),
       total = c(100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L),
        dead = c(20, 26.7, 30, 33.3, 40, 53.3, 65, 73.3, 80),
  percentage = c(0.2, 0.267, 0.3, 0.333, 0.4, 0.533, 0.65, 0.733, 0.8),
     survive = c(80, 73.3, 70, 66.7, 60, 46.7, 35, 26.7, 20)
)

attach(data)

# Then I create a matrix of dead and survival
y=cbind(dead,survive)

# However, when I try to create binomial glm (probit model), I get a specific error
model.results = glm(data = data, y ~ dose,binomial(link="probit"))

In a fresh session, does not give a length differ error; but I'm also of the opinion that the code exhibits poor practice, and is probably worth starting over with.

The problems I see begin with the mixing of paradigms of where/how the data is loaded that glm is asked to process.
we have a data.frame passed in to the data param of glm, great but the formula involves a y that is not present there, etc. why do this ? what is the purpose/intent behind why you created 'y', are your trying to simultaneously predict two outputs from a single input ? you want to know dead and survive from a given dose ? Your dead and survive seem to be percentages which add up pairwise to total, so you would probable be best just predicting the dead % ( or surv %) from dose, and if you need the other stat its just the different of 100% - other stat;
its glm probit idea for this analysis, I'm unsure, probably you would have more rigor, if you had data of the actual trials / cases that your data is about, and use the glm weights, and have integer survival ( or death) cases depending on which way round you choose to frame the analysis.

Thanks for your reaction.
When producing the commands in a new session the problem did not occur and I think I got the right values of the LC50 and LC90 values that I wanted. Since I am just starting with R, most of the times I use examples that I find on the internet. For this specific probit analysis I used the example from this link: https://groups.google.com/g/ggplot2/c/Ssmm276zSnI?pli=1
I hope you may get more information about what I am doing by reading this link. Furthermore, to make it a little bit clearer for you I will provide you the follow up commands after making the model.results.

This are the following commands:
#create matrix of dead and survival
y = cbind(dead,survive)

#create binomial glm (probit model)
model.results = glm(data = data, y ~ dose,binomial(link="probit"))
summary(model.results)

#use function from MASS to calculate LC
dose.p(model.results,p=0.5)
dose.p(model.results,p=c(0.1,0.25,0.5,0.90))

I appreciate the link;
The reply on that link is particularly useful, it echoes some of my comments; Also I think it does have a key difference in the glm, which is that they have actual case counts of survival and death; and you dont; therefore you may be biasing your analysis; if there were not the same volume of total people evaluated at each dose measure.

Ah okay, but for my analysis I got my results in percentages, were I transformed this percentages into dead and survive. So the total will always be 100 and when for example 20% of the plants have died, dead will be 20 and survive 80. Do you think I can still use this method of analysis for this kind of results or should I try another?

You are introducing bias

How can I avoid this bias?

Have the counts in the way the 2nd replier on that thread does it

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.