Non-Linear Regressions

For a Fisheries Research class project, I am using R to get p-values for linear and non-linear regressions. I have successfully gotten p-values for linear and logarithmic regressions, and the equations matched the regressions I did in excel to check.

Despite some success, I have had some troubles with exponential and power function regressions. I have gotten outputs for both, but the estimated parameters are very different from the ones I got in Excel. I've included the code that I've been using for each regression type below.

As additional context, I am doing regressions with percent brown trout (p_brown) on the y axis and environmental conditions on the x axis. I have named the environmental conditions as follows: season_condition_extreme. For example, fa_d_max would correspond to fall_discharge_maximum.

Linear (works & matches Excel)

linreg_fa_d_max <- lm(p_brown~fa_d_max)

Logarithmic (works & matches Excel)

logreg_su_d_max <- nls(p_brown~a*log(su_d_max)+b,
                       start = list(a = 4, b = -2))

Exponential (works, but doesn't match Excel)
Excel regression: y = 5.6287 * e^(0.0325​ * x)
R regression: y = 7.248 * e^(0.0271 * x)

expreg_sp_d_min <- nls(p_brown~a*exp(b*sp_d_min),
                       start = list(a = 5, b = 0))

Power (works, but doesn't match Excel)
Excel regression: y = 3.4559 * x^(0.4272)
R regression: y = 7.5279 * x^(0.2282)

powreg_fa_d_min <- nls(p_brown~a*(fa_d_min^(b)),
                       start = list(a = 3.5, b = .4))

I appreciate any help or pointers that I can get. My experience in R is minimal, so this is all very new to me. Thanks!

Can you provide a reproducible example?

Have you compared performance metrics/residuals to form an opinion of whether the excel solutions are preffered to your nls solutions?

Here is a test dataset to use for each regression type. I've included a screenshot of the Excel output that I get from it.

    ~env_cond,    ~p_fish,
40.35139752, 34.93150685,
 28.4673913, 21.70564746,
40.75807453, 24.90221643,
36.28462733, 7.305936073,
28.01552795, 13.32464571,
36.32981366, 15.70881226,
30.68152174, 8.983890954,
30.18447205, 18.71401152
#> # A tibble: 8 x 2
#>   env_cond p_fish
#>    <dbl>   <dbl>
#> 1   40.4   34.9 
#> 2   28.5   21.7 
#> 3   40.8   24.9 
#> 4   36.3    7.31
#> 5   28.0   13.3 
#> 6   36.3   15.7 
#> 7   30.7    8.98
#> 8   30.2   18.7

I just compared the Residual Standard Errors of one of the power functions in Excel vs R. The RSE for R was 8.416 and 6.168 in Excel.

How does the data that you have provided relate to the code that you initially posted?

After some more digging, it seems that the difference in results between in Excel and R is due to a difference in regression methodology. The nls() function in R takes a more complex yet direct approach of running a regression on the data, while Excel transforms the data, runs a linear regression, then converts the parameters back into the nonlinear, non-transformed equivalent.

My general rule is if there a difference between R and Excel trust R.

Excel will do some strange things sometimes.

I am Linex so I cannot check in Excel but, if you do not mind a slightly stupid suggestion, what about trying the regression in the spreadsheet program gnumeric? I know this does not answer the R vs Excel question directly but it is another source of comparison and gnumeric algorithms in regression seem a bit more trustworthy than Excel's.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.