Differences-in-Differences Method graph

An2C · June 9, 2020, 10:45am

Hello,

I'm doing a difference-in-differences analysis in R and I'm trying to create a graph that shows the trend of both groups as well as the intervention. I'm new to R, so it's a bit tricky. I have two groups over two periods of time. My data contains monthly passenger arrivals at two airports from 2017 to 2019. It looks like this:

City        Datum       Passengers
HongKong    31.01.17    6190000
HongKong    28.02.17    5489000
HongKong    31.03.17    5888000
HongKong    30.04.17    6268000
HongKong    31.05.17    5982000
HongKong    30.06.17    5906000
HongKong    31.07.17    6521000
HongKong    31.08.17    6503000
HongKong    30.09.17    5601000
HongKong    31.10.17    6159000
HongKong    30.11.17    5938000
HongKong    31.12.17    6421000
HongKong    31.01.18    6128000
HongKong    28.02.18    5820000
HongKong    31.03.18    6128000
HongKong    30.04.18    5820000
HongKong    31.05.18    6398000
HongKong    30.06.18    6301000
HongKong    31.07.18    6038000
HongKong    31.08.18    6213000
HongKong    30.09.18    6661000
HongKong    31.10.18    6847000
HongKong    30.11.18    5566000
HongKong    31.12.18    6176000
HongKong    31.01.19    5995000
HongKong    28.02.19    6528000
HongKong    31.03.19    6420000
HongKong    30.04.19    6491000
HongKong    31.05.19    6236000
HongKong    30.06.19    6347000
HongKong    31.07.19    6729000
HongKong    31.08.19    5994000
HongKong    30.09.19    4857000
HongKong    31.10.19    5374000
HongKong    30.11.19    5026000
HongKong    31.12.19    5715000
Singapur    31.01.17    5256301
Singapur    28.02.17    4669729
Singapur    31.03.17    5112576
Singapur    30.04.17    5168548
Singapur    31.05.17    5003578
Singapur    30.06.17    5208779
Singapur    31.07.17    5415734
Singapur    31.08.17    5265703
Singapur    30.09.17    4927561
Singapur    31.10.17    5155327
Singapur    30.11.17    5173747
Singapur    31.12.17    5861990
Singapur    31.01.18    5303639
Singapur    28.02.18    4932345
Singapur    31.03.18    5303639
Singapur    30.04.18    4932345
Singapur    31.05.18    5555117
Singapur    30.06.18    5430745
Singapur    31.07.18    5294980
Singapur    31.08.18    5565775
Singapur    30.09.18    5723094
Singapur    31.10.18    5682688
Singapur    30.11.18    5225903
Singapur    31.12.18    5376234
Singapur    31.01.19    5408993
Singapur    28.02.19    6127843
Singapur    31.03.19    5630780
Singapur    30.04.19    5580503
Singapur    31.05.19    5407308
Singapur    30.06.19    5816089
Singapur    31.07.19    5910782
Singapur    31.08.19    5900629
Singapur    30.09.19    5469342
Singapur    31.10.19    5646643
Singapur    30.11.19    5718386
Singapur    31.12.19    6414495

This is the code I used to show the trends of both groups. I created a dummy for time (before treatment = 0, after treatment = 1) and a dummy for the group (Hong Kong = treatment group = 0). They appear in the dataset once I run the code.

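library(zoo)  # provides as.yearmon()

y <- HONGKONG$Passengers
x <- HONGKONG$Datum
# Datum is in dd.mm.yy form, so the format is given explicitly
HONGKONG$Datum <- as.yearmon(x, format = "%d.%m.%y")
limit <- c("2019-07-31")  # last pre-treatment month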

HONGKONG$P = ifelse(HONGKONG$Datum >limit, 1, 0)
HONGKONG$S = ifelse(HONGKONG$City == "HongKong", 0, 1)
HONGKONG$did = HONGKONG$P * HONGKONG$S

city1<-HONGKONG$City==as.character("HongKong")
city2<-HONGKONG$City==as.character("Singapur")

plot(HONGKONG$Datum[city1],HONGKONG$Passengers[city1],
     type="l",col="blue",xlab="Zeitraum",
     main="Trend zwischen Hong Kong und Singapur",
     ylim=c(3200000,6800000), ylab="Anzahl Fluggäste",
     xlim=as.yearmon(c("2017-01-31", "2019-12-31")))

lines(HONGKONG$Datum[city2],HONGKONG$Passengers[city2],col="darkgreen")

legend("bottomright",
       legend=c("Hong Kong", "Singapur"),
       text.col=c("blue","darkgreen"),cex=0.99)
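
To mark the intervention on a trend plot like the one above, one option is a dashed vertical line at the cutoff. The sketch below is a base-R illustration, not the poster's code. It assumes `Datum` is stored as text in `dd.mm.yy` form, parses it with `as.Date()` instead of `as.yearmon()`, uses only the 2019 rows from the table to stay short, and takes the end of July 2019 as the cutoff (the `limit` value above). The data frame name `air` is made up for the example.

```r
# 2019 rows from the table above; `air` is a hypothetical name
datum <- c("31.01.19", "28.02.19", "31.03.19", "30.04.19", "31.05.19", "30.06.19",
           "31.07.19", "31.08.19", "30.09.19", "31.10.19", "30.11.19", "31.12.19")
air <- data.frame(
  City = rep(c("HongKong", "Singapur"), each = 12),
  Datum = as.Date(rep(datum, 2), format = "%d.%m.%y"),
  Passengers = c(5995000, 6528000, 6420000, 6491000, 6236000, 6347000,
                 6729000, 5994000, 4857000, 5374000, 5026000, 5715000,
                 5408993, 6127843, 5630780, 5580503, 5407308, 5816089,
                 5910782, 5900629, 5469342, 5646643, 5718386, 6414495)
)

limit <- as.Date("2019-07-31")  # assumed intervention cutoff
hk <- air$City == "HongKong"

# One line per city, with the y-axis scaled to cover both series
plot(air$Datum[hk], air$Passengers[hk], type = "l", col = "blue",
     xlab = "Month", ylab = "Passengers", ylim = range(air$Passengers))
lines(air$Datum[!hk], air$Passengers[!hk], col = "darkgreen")
abline(v = limit, lty = 2)  # dashed line marks the intervention
legend("bottomleft", legend = c("Hong Kong", "Singapur"),
       col = c("blue", "darkgreen"), lty = 1)
```

The same `abline(v = ...)` call can be added after the `lines()` call in the original plot, with the cutoff expressed on whatever scale the x-axis uses.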

From here on I'm a bit stuck. I ran the regression with the following code:

didreg = lm(y ~ S + P + did, data=HONGKONG)
summary(didreg)

tab_model(didreg)  # tab_model() comes from the sjPlot package

This is how I tried to create a graph, based on the information found under this link:

I already have a graph showing the common trend, but now I need to include the difference-in-differences estimator in my graph.

library(foreign)

a = sapply(subset(HONGKONG, P == 1 & S == 0, select=y), mean)
b = sapply(subset(HONGKONG, P == 1 & S == 1, select=y), mean)
c = sapply(subset(HONGKONG, P == 0 & S == 0, select=y), mean)
d = sapply(subset(HONGKONG, P == 1 & S == 0, select=y), mean)

(d-c)-(b-a)

However, it doesn't work and I'm not sure how to fix it.

Thank you for the help!

FJCC · June 9, 2020, 1:54pm

In your last bit of code, where you are defining a, b, c, and d, the definitions of a and d are identical. The calculation of d should be

d = sapply(subset(HONGKONG, P == 0 & S == 1, select=y), mean)

Also, if you are defining S == 0 as the treatment, then the next calculation would be

(a-b) - (c-d)

That is, the difference from treatment to control group after the treatment minus the difference between the groups before the treatment. However, that might not match the sign of the coefficient in the linear model. If Hong Kong is the treatment, I would code that as S == 1 as in the blog post that you linked. Also, a and b in the blog refer to the time before the treatment, P == 0, but you have made those P == 1. All of those differences make the comparison of your code to his rather confusing.
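
Putting these corrections together, a minimal sketch of the four group means and the resulting estimate might look like the following. It follows the suggestion to code Hong Kong as `S == 1`, assumes the cutoff is the end of July 2019 (so August to December 2019 is the post period), and selects the `Passengers` column directly instead of going through `subset(..., select=y)`. The names `air`, `cell_mean`, `pre_treat` and so on are made up for the example; the passenger counts are copied from the table in the first post.

```r
# Monthly passengers, Jan 2017 to Dec 2019, copied from the table in the question
hk <- c(6190000, 5489000, 5888000, 6268000, 5982000, 5906000,
        6521000, 6503000, 5601000, 6159000, 5938000, 6421000,
        6128000, 5820000, 6128000, 5820000, 6398000, 6301000,
        6038000, 6213000, 6661000, 6847000, 5566000, 6176000,
        5995000, 6528000, 6420000, 6491000, 6236000, 6347000,
        6729000, 5994000, 4857000, 5374000, 5026000, 5715000)
sg <- c(5256301, 4669729, 5112576, 5168548, 5003578, 5208779,
        5415734, 5265703, 4927561, 5155327, 5173747, 5861990,
        5303639, 4932345, 5303639, 4932345, 5555117, 5430745,
        5294980, 5565775, 5723094, 5682688, 5225903, 5376234,
        5408993, 6127843, 5630780, 5580503, 5407308, 5816089,
        5910782, 5900629, 5469342, 5646643, 5718386, 6414495)

air <- data.frame(
  City = rep(c("HongKong", "Singapur"), each = 36),
  Passengers = c(hk, sg)
)
month_index <- rep(1:36, times = 2)           # 31 = July 2019
air$P <- as.integer(month_index > 31)         # 1 = after the assumed cutoff
air$S <- as.integer(air$City == "HongKong")   # 1 = treatment group

# Mean passengers for one group/period combination
cell_mean <- function(s, p) mean(air$Passengers[air$S == s & air$P == p])
pre_treat  <- cell_mean(1, 0)
post_treat <- cell_mean(1, 1)
pre_ctrl   <- cell_mean(0, 0)
post_ctrl  <- cell_mean(0, 1)

# Group gap after the treatment minus group gap before it
did_manual <- (post_treat - post_ctrl) - (pre_treat - pre_ctrl)

# The interaction coefficient of the 2x2 regression is the same quantity
fit <- lm(Passengers ~ S * P, data = air)
did_lm <- coef(fit)[["S:P"]]
all.equal(did_manual, did_lm)
```

With both dummies coded this way, the hand-computed difference and the `S:P` coefficient have the same sign, which avoids the mismatch mentioned above.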

An2C · June 9, 2020, 7:51pm

Thank you for the help!

kuriwaki · June 9, 2020, 7:58pm

You may want to cluster your standard errors at the country-level for proper inference, especially as you get more data.

These videos walk through both visualizing the trend (using ggplot/tidyverse) and estimating the coefficient (using lfe::felm), if of interest:
Difference-in-Differences Estimation in R: https://vimeo.com/channels/1569326
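
As a base-R alternative for the graph asked about in the first post, the sketch below draws the before and after means of each city, adds a dashed counterfactual line for Hong Kong (its pre-period mean shifted by Singapore's change), and marks the gap between the counterfactual and the observed post-period mean as the estimate. It is not taken from the videos. It assumes Hong Kong is the treated group and that August to December 2019 is the post period; the names `post`, `m`, `hk_cf` and `did` are made up for the example.

```r
# Monthly passengers, Jan 2017 to Dec 2019, copied from the table in the question
hk <- c(6190000, 5489000, 5888000, 6268000, 5982000, 5906000,
        6521000, 6503000, 5601000, 6159000, 5938000, 6421000,
        6128000, 5820000, 6128000, 5820000, 6398000, 6301000,
        6038000, 6213000, 6661000, 6847000, 5566000, 6176000,
        5995000, 6528000, 6420000, 6491000, 6236000, 6347000,
        6729000, 5994000, 4857000, 5374000, 5026000, 5715000)
sg <- c(5256301, 4669729, 5112576, 5168548, 5003578, 5208779,
        5415734, 5265703, 4927561, 5155327, 5173747, 5861990,
        5303639, 4932345, 5303639, 4932345, 5555117, 5430745,
        5294980, 5565775, 5723094, 5682688, 5225903, 5376234,
        5408993, 6127843, 5630780, 5580503, 5407308, 5816089,
        5910782, 5900629, 5469342, 5646643, 5718386, 6414495)

post <- 1:36 > 31  # TRUE for Aug-Dec 2019 (assumed post period)
m <- c(hk_pre = mean(hk[!post]), hk_post = mean(hk[post]),
       sg_pre = mean(sg[!post]), sg_post = mean(sg[post]))

# Counterfactual: Hong Kong's pre mean plus Singapore's before/after change
hk_cf <- m[["hk_pre"]] + (m[["sg_post"]] - m[["sg_pre"]])
did <- m[["hk_post"]] - hk_cf

plot(c(0, 1), m[c("hk_pre", "hk_post")], type = "b", col = "blue",
     xaxt = "n", xlim = c(-0.2, 1.2), ylim = range(m, hk_cf),
     xlab = "", ylab = "Mean monthly passengers")
axis(1, at = c(0, 1), labels = c("Before", "After"))
lines(c(0, 1), m[c("sg_pre", "sg_post")], type = "b", col = "darkgreen")
lines(c(0, 1), c(m[["hk_pre"]], hk_cf), lty = 2, col = "blue")  # counterfactual
arrows(1, hk_cf, 1, m[["hk_post"]], code = 3, length = 0.08)    # DiD gap
text(1, mean(c(hk_cf, m[["hk_post"]])), "DiD", pos = 4)
legend("bottomleft", legend = c("Hong Kong", "Singapur", "Hong Kong counterfactual"),
       col = c("blue", "darkgreen", "blue"), lty = c(1, 1, 2))
```

The value of `did` here is the same quantity as the interaction coefficient in a regression of passengers on the two dummies and their product.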

An2C · June 10, 2020, 3:00pm

Thank you, the videos were very helpful.
