I am trying to determine the regression equation for the relationship between the water level (stage, in metres) and the discharge (m3/s) of a stream.
This code produced the following plot, which shows a clear exponential relationship between Stage and Discharge (with the exception of something strange going on between a stage of approximately 5.13 and 5.15 m). I'm not sure what is going on here, and I have too many data points (17665 observations) to try to find these values.
Thank you. Is there a way to 1) plot this model over the original data to check how well it represents the data, and 2) determine the equation of the model? My datasheet has several values for "Stage" that do not have a corresponding discharge, and I want to create a formula that I can use to calculate the missing discharge values from the known stage values.
regressionResult <- lm(Discharge ~ exp(Stage), data = df)
summary(regressionResult)
prediction <- predict(regressionResult)
summary() will show you the coefficients for the equation and predict() will give you the predicted values. To predict from a different set of data, put it in a new dataframe like
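As a minimal sketch of predicting from new data: the data frame below is simulated purely for illustration (the original `df` is not available here), and the names `fit` and `newStages` are hypothetical. The key assumption is that the column in the new data frame has the same name as the variable used in the formula.

```r
# Hypothetical stand-in for the original data: Stage (m) and Discharge (m3/s)
set.seed(1)
df <- data.frame(Stage = seq(5.0, 5.3, length.out = 50))
df$Discharge <- -4.1 + 0.029 * exp(df$Stage) + rnorm(50, sd = 0.01)

# Use bare column names plus data = df; writing df$Stage inside the
# formula prevents predict() from picking up values from newdata
fit <- lm(Discharge ~ exp(Stage), data = df)

# Stage values with no measured discharge, in a column named "Stage"
newStages <- data.frame(Stage = c(5.05, 5.12, 5.25))
newStages$PredictedDischarge <- predict(fit, newdata = newStages)
newStages
```

The same pattern should work for rows of the real data where Discharge is missing: subset those rows and pass them as `newdata`.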
Thank you. When I run the summary on the regressionResult, I receive the following output:
Call:
lm(formula = df$Discharge ~ exp(df$Stage))
Residuals:
Min 1Q Median 3Q Max
-0.07482 -0.05960 -0.01539 0.04806 1.22838
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.117812 0.009718 -423.7 <2e-16 ***
exp(df$Stage) 0.028988 0.000065 446.0 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.07444 on 17662 degrees of freedom
Multiple R-squared: 0.9184, Adjusted R-squared: 0.9184
F-statistic: 1.989e+05 on 1 and 17662 DF, p-value: < 2.2e-16
How do I find the regression equation from this, and is there a way to plot the regression equation over my original data to check the fit of the model?
The equation is Discharge = -4.1178 + 0.0289\times exp(Stage).
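When checking an equation like this by hand, it can help to take the coefficients from coef() instead of retyping rounded values. With stages a little above 5, exp(Stage) is roughly 150 to 200, so rounding the slope in the fourth decimal place can shift the computed discharge by about 0.01 to 0.02. The sketch below uses simulated data (an assumption, since the original `df` is not shown) and the hypothetical names `fit`, `b`, and `byHand`.

```r
# Simulated stand-in for the original data (illustration only)
set.seed(1)
df <- data.frame(Stage = seq(5.0, 5.3, length.out = 100))
df$Discharge <- -4.1 + 0.029 * exp(df$Stage) + rnorm(100, sd = 0.01)

fit <- lm(Discharge ~ exp(Stage), data = df)

# Full-precision intercept (b[1]) and slope (b[2])
b <- coef(fit)

# Discharge = intercept + slope * exp(Stage), computed by hand
byHand <- unname(b[1] + b[2] * exp(df$Stage))

# Should agree with the fitted values returned by predict()
all.equal(byHand, unname(predict(fit)))
```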
You can use lines() to add a line to a plot, using the values of your new data as the x variable and the values from predict() as the y variable.
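A rough sketch of that approach, again on simulated data with hypothetical names (`fit`, `grid`). Predicting over a sorted, evenly spaced grid of stage values keeps lines() from zigzagging between unsorted points.

```r
# Simulated stand-in for the original data (illustration only)
set.seed(1)
df <- data.frame(Stage = runif(200, 5.0, 5.3))
df$Discharge <- -4.1 + 0.029 * exp(df$Stage) + rnorm(200, sd = 0.02)

fit <- lm(Discharge ~ exp(Stage), data = df)

# Sorted grid of stage values spanning the observed range
grid <- data.frame(Stage = seq(min(df$Stage), max(df$Stage), length.out = 200))
grid$Fitted <- predict(fit, newdata = grid)

# Scatter plot of the observations with the fitted curve on top
plot(df$Stage, df$Discharge, pch = 16, cex = 0.5,
     xlab = "Stage (m)", ylab = "Discharge (m3/s)")
lines(grid$Stage, grid$Fitted, col = "red", lwd = 2)
```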
Ok, I am clearly doing something wrong. When I ran the regression using the steps outlined by @startz, I got the equation Discharge = -4.1178 + 0.0289 x exp(Stage). The points on my plot follow a fairly clear curve, so the regression equation should represent the data well. However, I checked the equation against my first few observations, shown below:
As shown in the calculations above, the discharge formula that I created does not accurately calculate the discharge based on the stage of my data. Does anyone have any advice?
Consider the mtcars dataset. Suppose you want to estimate mpg as an exponential function of disp, or mpg = A*e^(B*disp). How can this nonlinear function be estimated using a linear regression model?
Because A = e^log(A), this equation can be rewritten as mpg = e^(log(A) + B*disp). Taking the natural log of both sides yields log(mpg) = log(A) + B*disp. This is linear, with log(mpg) as the Y variable, disp as the X variable, log(A) as the intercept, and B as the slope coefficient.
lm(log(mpg) ~ disp, mtcars) will estimate log(A) and B and thus estimate the coefficients for the nonlinear equation mpg = e^(log(A) + B*disp).
This is all due to the magical properties of the exponential function and its inverse, the natural log function.
lm(formula = log(mpg) ~ disp, data = mtcars)
#>
#> Call:
#> lm(formula = log(mpg) ~ disp, data = mtcars)
#>
#> Coefficients:
#> (Intercept) disp
#> 3.445548 -0.002115
# predicted mpg for a car with disp = 225,
# which has actual mpg = 18.1
exp(3.445 - 0.002115*225)
#> [1] 19.47487
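The same calculation can be done without retyping coefficients. The sketch below recovers A and B from the fitted model and back-transforms the prediction with exp(); the names `fit`, `A`, `B`, `byFormula`, and `byPredict` are introduced here for illustration.

```r
fit <- lm(log(mpg) ~ disp, data = mtcars)

# The intercept estimates log(A), so exponentiate it to get A
A <- exp(coef(fit)[["(Intercept)"]])
B <- coef(fit)[["disp"]]

# Predicted mpg at disp = 225 from mpg = A * e^(B * disp)
byFormula <- A * exp(B * 225)

# Equivalent: predict on the log scale, then undo the log
byPredict <- exp(predict(fit, newdata = data.frame(disp = 225)))

c(byFormula = byFormula, byPredict = unname(byPredict))
```

Both values use the unrounded coefficients, so they can differ slightly from a result computed with a hand-rounded intercept.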
Everything @EconProf says is true (well duh!). But it's worth noting that a log equation and a nonlinear exponential equation are not the same because of the residuals. One is
y=A\times exp(b\times x) + \epsilon
The other is
log(y) = log(A) + b\times x + \epsilon
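To see the difference in practice, one option is to fit both forms to the same data: lm() on the logged response for the second equation, and nls() for the first, where the error is additive on the original scale. This is only a sketch on mtcars; using the log-linear estimates as starting values for nls() is a convenience assumed here, not a requirement, and the names `logFit` and `nlsFit` are hypothetical.

```r
# Log-linear fit: error is additive on the log scale
logFit <- lm(log(mpg) ~ disp, data = mtcars)

# Nonlinear least squares: error is additive on the original scale.
# Starting values come from the log-linear fit.
nlsFit <- nls(mpg ~ A * exp(b * disp), data = mtcars,
              start = list(A = exp(coef(logFit)[["(Intercept)"]]),
                           b = coef(logFit)[["disp"]]))

# Compare the two sets of estimates for A and b
rbind(logLinear = c(A = exp(coef(logFit)[["(Intercept)"]]),
                    b = coef(logFit)[["disp"]]),
      nls       = coef(nlsFit))
```

The estimates are usually close but not identical, because the two fits weight the residuals differently.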
Note for the OP: You might actually want to have three coefficients: