My predicted line sits well above the actual values. Ideally it would cut right through the middle of them. Are there any adjustments I could make to my fit to get a better line across this range of the predictor variable DAY?
It's hard to see the issue clearly because GrowthRate spans orders of magnitude. I think it would be easier to work with log(GrowthRate). Then, plotting the prediction, it will be easier to see what's happening.
Following up on @arthur.t's suggestion: if you don't have i in your model, then the plot you just showed us should be roughly linear. Since that's wildly untrue, it suggests that the functional form in your model isn't right.
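To spell out the linearity argument: without i the model is GrowthRate = c * DAY^power, so log(GrowthRate) = log(c) + power * log(DAY), which is a straight line against log(DAY). Here is a minimal sketch of that check. It assumes the plot in question is on log-log axes, and it uses simulated data (the `sim` data frame and the `true_c` / `true_power` values are made up for illustration, not taken from exdata).

```r
# Simulated data, assumed for illustration only (not the poster's exdata).
set.seed(1)
sim <- data.frame(DAY = rep(1:200, each = 20))
true_c <- 0.5
true_power <- -0.6
# Multiplicative noise keeps GrowthRate positive, so the log is defined.
sim$GrowthRate <- true_c * sim$DAY^true_power *
  exp(rnorm(nrow(sim), sd = 0.05))

# With no intercept, the power law is linear on the log-log scale,
# so an ordinary linear regression can estimate it.
fit_loglog <- lm(log(GrowthRate) ~ log(DAY), data = sim)

est_power <- coef(fit_loglog)[["log(DAY)"]]         # slope estimates power
est_c <- exp(coef(fit_loglog)[["(Intercept)"]])     # exp(intercept) estimates c
c(power = est_power, c = est_c)
```

If the real data follow a no-intercept power law, the same regression on exdata should give a log-log scatter that hugs a straight line. Strong curvature there points to a different functional form.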
Out of curiosity, what are you modelling the growth rate of?
I was able to get nls with a coefficient per your initial suggestion to complete with these parameters:
mod.nls.c <- nls(GrowthRate ~ i + c * I(DAY^power),
                 data = exdata,
                 start = list(power = -0.01, i = 1.005258, c = 2),
                 control = nls.control(maxiter = 200))
mod.nls.c |> summary()
Formula: GrowthRate ~ i + c * I(DAY^power)
Parameters:
        Estimate Std. Error t value Pr(>|t|)
power -6.285e-01  9.733e-04  -645.7   <2e-16 ***
i      9.898e-01  6.423e-05 15410.8   <2e-16 ***
c      4.587e-01  7.141e-04   642.3   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03025 on 1064596 degrees of freedom
Number of iterations to convergence: 25
Achieved convergence tolerance: 7.798e-06
But the same issue seems to persist. Here's the plot, followed by a zoomed-in version:
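library(dplyr)    # sample_n()
library(ggplot2)
# Store the fitted values next to the observations
exdata$PredictionsNLSC <- predict(mod.nls.c)
set.seed(123)
# Plot a random subsample of points with the fitted curve on top
exdata |>
  sample_n(200000) |>
  ggplot(aes(x = DAY, y = GrowthRate)) +
  geom_point(alpha = 0.01, color = 'grey') +
  geom_line(aes(x = DAY, y = PredictionsNLSC), color = 'steelblue') +
  theme_minimal()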
I think I'm starting to see. Growth isn't even over time, which is probably not surprising for an app. You might consider a more flexible functional form with respect to DAY. Exactly how to do that depends somewhat on the purpose of the estimate.
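As one possible illustration of a more flexible form, here is a rough sketch using a natural cubic spline in log(DAY) from the splines package. It is only one option among several, and everything in it is an assumption made for the example: the simulated `sim` data, the bump around day 150, and the choice of df = 6.

```r
library(splines)  # part of the standard R distribution

# Simulated stand-in for exdata (assumed; values are made up).
set.seed(42)
sim <- data.frame(DAY = rep(1:300, each = 10))
# Power-law decay towards 1 plus a temporary bump around day 150.
sim$GrowthRate <- 1 + 0.4 * sim$DAY^(-0.6) +
  0.02 * exp(-((sim$DAY - 150) / 30)^2) +
  rnorm(nrow(sim), sd = 0.01)

# Baseline: a straight line in log(DAY).
fit_line <- lm(GrowthRate ~ log(DAY), data = sim)

# Flexible alternative: natural cubic spline in log(DAY).
# df = 6 is an arbitrary choice for this sketch.
fit_spline <- lm(GrowthRate ~ ns(log(DAY), df = 6), data = sim)

# Residual sums of squares for both fits (smaller means closer to the data).
c(line = deviance(fit_line), spline = deviance(fit_spline))

# Predictions on a grid of days, e.g. for drawing the fitted curve.
grid <- data.frame(DAY = 1:300)
grid$Predicted <- predict(fit_spline, newdata = grid)
```

A spline like this follows local bumps that a single power law cannot, but it gives up the interpretable power and c parameters. Whether that trade is worth it depends on what the estimate is for.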
There might be something odd going on here: the growth rate seems to level off at 1 instead of 0, which would explain why taking the log doesn't accomplish what it's supposed to. Perhaps the growth rate needs to be defined differently, so that zero growth gives a value of 0 rather than 1.
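A small sketch of what that redefinition would do, assuming GrowthRate is a ratio (for example this period divided by the previous one), so that 1 means no growth. The values below are made up and noise-free, and `NetGrowth` is a name introduced here for illustration.

```r
# Assumed definition: GrowthRate is a ratio, so 1 means no growth.
# Values are made up for illustration.
DAY <- c(1, 10, 100, 1000)
GrowthRate <- 1 + 0.5 * DAY^(-0.6)

# Redefine growth so that no growth is 0 rather than 1.
NetGrowth <- GrowthRate - 1

# Slopes between consecutive points on the log-log scale.
slope_raw <- diff(log(GrowthRate)) / diff(log(DAY))  # varies: curved
slope_net <- diff(log(NetGrowth)) / diff(log(DAY))   # about -0.6 throughout

rbind(slope_raw, slope_net)
```

With real, noisy data some values of GrowthRate - 1 will be zero or negative, and their log is undefined, so this is a diagnostic idea and not a ready-made transformation.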