# Difference-in-differences: which variables do I need? (Wooldridge dataset "hprice3")

Hello everyone,
I am currently practising with RStudio. My professor taught us to run a difference-in-differences ("DiD") regression for the following model:

LogHousePrice = Beta0 + Beta1 Near + u
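For comparison, here is a sketch of how the one-regressor model above is usually extended to a DiD, assuming a year dummy `y81` (1 for 1981 sales, 0 for 1978) and a 0/1 dummy `near`. With both dummies and their interaction in the model, the OLS estimate of the interaction coefficient equals the change in the near/far gap in average log prices between the two years:

$$
\log(\text{price}) = \beta_0 + \delta_0\, y81 + \beta_1\, \text{near} + \delta_1\,(y81 \cdot \text{near}) + u
$$

$$
\hat{\delta}_1 = \left(\overline{\log p}_{81,\text{near}} - \overline{\log p}_{81,\text{far}}\right) - \left(\overline{\log p}_{78,\text{near}} - \overline{\log p}_{78,\text{far}}\right)
$$

Here $100\,\hat{\delta}_1$ is approximately the percentage price impact of being near the incinerator.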

So the question is: which variables do I need to analyze the "percentage price impact of an incinerator near a house"? The dataset is from Wooldridge.

I think I need "price" (selling price) and "dist" (distance to the incinerator). I ran the corresponding regressions, but my professor gets different results and I can't find my mistake.
Does anyone have an idea? Has anyone already worked with this dataset?
Thank you, Christine

There are 137 data sets in the `{wooldridge}` library. Which one are you dealing with?

Thank you!

The dataset is called "HPrice3".

Best regards,

Christine

My guess is that you need a dummy variable for being close to the incinerator or not.
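A minimal sketch of building such a dummy from `dist`, assuming "close" means within 3 miles (15,840 feet), which is the cutoff used in Wooldridge's incinerator example; the variable name `near` is my own choice. No extra package is needed: `as.integer()` on a logical comparison already gives a 0/1 dummy.

```r
library(wooldridge)
data("hprice3")

# Assumption: "near" = within 3 miles (15,840 feet) of the incinerator,
# the cutoff used in Wooldridge's incinerator example; dist is measured in feet
hprice3$near <- as.integer(hprice3$dist <= 15840)

# How many houses fall into each group (0 = far, 1 = near)
table(hprice3$near)

# Cross-sectional log-price regression on the dummy (not yet a DiD):
# 100 * coefficient on near approximates the % price gap, near vs. far
summary(lm(log(price) ~ near, data = hprice3))
```

This only compares near and far houses pooled over both years; it does not use the before/after structure that a DiD needs.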

The data don't support a DiD. See this explainer.

``` r
library(wooldridge)
# R is case sensitive; there is no HPrice3
data("hprice3")
# ordinary least squares regression
summary(lm(price ~ dist, data = hprice3))
#>
#> Call:
#> lm(formula = price ~ dist, data = hprice3)
#>
#> Residuals:
#>    Min     1Q Median     3Q    Max
#> -68772 -31196 -12955  23511 209165
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 7.519e+04  6.241e+03  12.046  < 2e-16 ***
#> dist        1.010e+00  2.788e-01   3.622  0.00034 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 42430 on 319 degrees of freedom
#> Multiple R-squared:  0.03949,    Adjusted R-squared:  0.03648
#> F-statistic: 13.12 on 1 and 319 DF,  p-value: 0.0003404
# ordinary least squares regression with a log dependent variable
summary(lm(log(price) ~ dist, data = hprice3))
#>
#> Call:
#> lm(formula = log(price) ~ dist, data = hprice3)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -1.19300 -0.31126 -0.05756  0.28315  1.30982
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.107e+01  6.189e-02 178.931  < 2e-16 ***
#> dist        1.465e-05  2.764e-06   5.299 2.18e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.4207 on 319 degrees of freedom
#> Multiple R-squared:  0.0809, Adjusted R-squared:  0.07802
#> F-statistic: 28.08 on 1 and 319 DF,  p-value: 2.178e-07
``````

Created on 2023-01-06 with reprex v2.0.2
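To read the log-level slope as a percentage, here is a small worked calculation that takes the `dist` coefficient from the output above (1.465e-05 per foot) and scales it to one mile. It shows both the usual approximation (100 times the coefficient times the change) and the exact transformation for a log dependent variable.

```r
# Slope on dist from the log(price) ~ dist output above (per foot of distance)
b_dist <- 1.465e-05
feet_per_mile <- 5280

# Approximate % change in price for one more mile: 100 * b * (change in dist)
approx_pct <- 100 * b_dist * feet_per_mile
# Exact % change implied by a log model: 100 * (exp(b * change) - 1)
exact_pct <- 100 * (exp(b_dist * feet_per_mile) - 1)

round(c(approx = approx_pct, exact = exact_pct), 2)
```

So each additional mile from the incinerator is associated with roughly 7.7% (approximation) to 8.0% (exact) higher prices in this pooled cross-section. This is a simple association, not a DiD estimate.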


Great! For the level-level and log-level models you get the same results as I do, using the same variables. So my results should be correct. Thank you!

Regarding the DiD part, I will have to talk to my professor. But my thinking is that a DiD regression would give the same results as mine, because it uses the same variables.

Thank you and best regards, Christine

Hello startz,

I understand what you mean. I think this is a second way to approach this dataset. I have the "dummy" package installed. What would you write in R?

Thank you and best regards, Christine

I don't know what's specified in your homework.

I suspect you are supposed to include a dummy for the year and for being near the incinerator. That's what is done in the Wooldridge textbook.
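A sketch of that specification in R, under two assumptions: `year` in `hprice3` takes the values 1978 and 1981, and "near" means within 3 miles (15,840 feet). In an R formula, `y81 * near` expands to `y81 + near + y81:near`, and the `y81:near` coefficient is the DiD estimate. The "dummy" package is not needed for this. Note that, as far as I recall, the textbook version of this example uses inflation-adjusted prices from the KIELMC data, so the numbers may not match it exactly.

```r
library(wooldridge)
data("hprice3")

# Year dummy: 1 for 1981 sales (the "after" period), 0 for 1978 sales
hprice3$y81 <- as.integer(hprice3$year == 1981)
# Assumption: near = within 3 miles (15,840 feet) of the incinerator
hprice3$near <- as.integer(hprice3$dist <= 15840)

# DiD regression: y81 * near = y81 + near + y81:near
did <- lm(log(price) ~ y81 * near, data = hprice3)
summary(did)

# The interaction coefficient is the DiD estimate;
# 100 * estimate is approximately the % price impact
coef(did)["y81:near"]
```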

I would also suggest that, before asking for homework help with your R code, you show us the regression you want to run. Then people can help you implement it in R.