# Hi all, I am relatively new to performing statistics with R and now I have a problem when trying to do a probit analysis for dose-response.
# Underneath is the table I am using.
dose total dead percentage survive
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0.0825 100 20 0.2 80
2 0.165 100 26.7 0.267 73.3
3 0.248 100 30 0.3 70
4 0.33 100 33.3 0.333 66.7
5 0.412 100 40 0.4 60
6 0.495 100 53.3 0.533 46.7
7 0.578 100 65 0.65 35
8 0.66 100 73.3 0.733 26.7
9 0.742 100 80 0.8 20
# First I attach the data with the function attach
attach(data)
# Then I create a matrix of dead and survival
y=cbind(dead,survive)
# However, when I try to create binomial glm (probit model), I get a specific error
model.results = glm(data = data, y ~ dose,binomial(link="probit"))
Error in model.frame.default(formula = y ~ dose, data = data, :
variable lengths differ (found for 'dose')
# Is there someone who can help me with this error/problem, any help would be appreciated. Thanks in advance!
is the column survive or survival. ?
sorry my bad, i made a typo in the question, but in my R file both in the column and the ybind function I used survive.
I cant reproduce your issue . as :
data <- data.frame(
dose = c(0.0825, 0.165, 0.248, 0.33, 0.412, 0.495, 0.578, 0.66, 0.742),
total = c(100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L),
dead = c(20, 26.7, 30, 33.3, 40, 53.3, 65, 73.3, 80),
percentage = c(0.2, 0.267, 0.3, 0.333, 0.4, 0.533, 0.65, 0.733, 0.8),
survive = c(80, 73.3, 70, 66.7, 60, 46.7, 35, 26.7, 20)
)
attach(data)
# Then I create a matrix of dead and survival
y=cbind(dead,survive)
# However, when I try to create binomial glm (probit model), I get a specific error
model.results = glm(data = data, y ~ dose,binomial(link="probit"))
In a fresh session, does not give a length differ error; but I'm also of the opinion that the code exhibits poor practice, and is probably worth starting over with.
The problems I see begin with the mixing of paradigms of where/how the data is loaded that glm is asked to process.
we have a data.frame passed in to the data param of glm, great but the formula involves a y that is not present there, etc. why do this ? what is the purpose/intent behind why you created 'y', are your trying to simultaneously predict two outputs from a single input ? you want to know dead and survive from a given dose ? Your dead and survive seem to be percentages which add up pairwise to total, so you would probable be best just predicting the dead % ( or surv %) from dose, and if you need the other stat its just the different of 100% - other stat;
its glm probit idea for this analysis, I'm unsure, probably you would have more rigor, if you had data of the actual trials / cases that your data is about, and use the glm weights, and have integer survival ( or death) cases depending on which way round you choose to frame the analysis.
Thanks for your reaction.
When producing the commands in a new session the problem did not occur and I think I got the right values of the LC50 and LC90 values that I wanted. Since I am just starting with R, most of the times I use examples that I find on the internet. For this specific probit analysis I used the example from this link: https://groups.google.com/g/ggplot2/c/Ssmm276zSnI?pli=1
I hope you may get more information about what I am doing by reading this link. Furthermore, to make it a little bit clearer for you I will provide you the follow up commands after making the model.results.
This are the following commands:
#create matrix of dead and survival
y = cbind(dead,survive)
#create binomial glm (probit model)
model.results = glm(data = data, y ~ dose,binomial(link="probit"))
summary(model.results)
#use function from MASS to calculate LC
dose.p(model.results,p=0.5)
dose.p(model.results,p=c(0.1,0.25,0.5,0.90))
I appreciate the link;
The reply on that link is particularly useful, it echoes some of my comments; Also I think it does have a key difference in the glm, which is that they have actual case counts of survival and death; and you dont; therefore you may be biasing your analysis; if there were not the same volume of total people evaluated at each dose measure.
Ah okay, but for my analysis I got my results in percentages, were I transformed this percentages into dead and survive. So the total will always be 100 and when for example 20% of the plants have died, dead will be 20 and survive 80. Do you think I can still use this method of analysis for this kind of results or should I try another?
You are introducing bias
How can I avoid this bias?
Have the counts in the way the 2nd replier on that thread does it
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.