Hello,
I'm doing a difference-in-differences analysis in R and I'm trying to create a graph that shows the trend of both groups as well as the intervention. I'm new to R, so it's a bit tricky. I have two groups over two periods of time. My data contains passenger arrivals at two airports from 2017 to 2019. It looks like this:
City Datum Passengers
HongKong 31.01.17 6190000
HongKong 28.02.17 5489000
HongKong 31.03.17 5888000
HongKong 30.04.17 6268000
HongKong 31.05.17 5982000
HongKong 30.06.17 5906000
HongKong 31.07.17 6521000
HongKong 31.08.17 6503000
HongKong 30.09.17 5601000
HongKong 31.10.17 6159000
HongKong 30.11.17 5938000
HongKong 31.12.17 6421000
HongKong 31.01.18 6128000
HongKong 28.02.18 5820000
HongKong 31.03.18 6128000
HongKong 30.04.18 5820000
HongKong 31.05.18 6398000
HongKong 30.06.18 6301000
HongKong 31.07.18 6038000
HongKong 31.08.18 6213000
HongKong 30.09.18 6661000
HongKong 31.10.18 6847000
HongKong 30.11.18 5566000
HongKong 31.12.18 6176000
HongKong 31.01.19 5995000
HongKong 28.02.19 6528000
HongKong 31.03.19 6420000
HongKong 30.04.19 6491000
HongKong 31.05.19 6236000
HongKong 30.06.19 6347000
HongKong 31.07.19 6729000
HongKong 31.08.19 5994000
HongKong 30.09.19 4857000
HongKong 31.10.19 5374000
HongKong 30.11.19 5026000
HongKong 31.12.19 5715000
Singapur 31.01.17 5256301
Singapur 28.02.17 4669729
Singapur 31.03.17 5112576
Singapur 30.04.17 5168548
Singapur 31.05.17 5003578
Singapur 30.06.17 5208779
Singapur 31.07.17 5415734
Singapur 31.08.17 5265703
Singapur 30.09.17 4927561
Singapur 31.10.17 5155327
Singapur 30.11.17 5173747
Singapur 31.12.17 5861990
Singapur 31.01.18 5303639
Singapur 28.02.18 4932345
Singapur 31.03.18 5303639
Singapur 30.04.18 4932345
Singapur 31.05.18 5555117
Singapur 30.06.18 5430745
Singapur 31.07.18 5294980
Singapur 31.08.18 5565775
Singapur 30.09.18 5723094
Singapur 31.10.18 5682688
Singapur 30.11.18 5225903
Singapur 31.12.18 5376234
Singapur 31.01.19 5408993
Singapur 28.02.19 6127843
Singapur 31.03.19 5630780
Singapur 30.04.19 5580503
Singapur 31.05.19 5407308
Singapur 30.06.19 5816089
Singapur 31.07.19 5910782
Singapur 31.08.19 5900629
Singapur 30.09.19 5469342
Singapur 31.10.19 5646643
Singapur 30.11.19 5718386
Singapur 31.12.19 6414495
This is the code I used to show the trends of both groups. I created a dummy for time (P: before treatment = 0, after treatment = 1) and a dummy for the group (S: Hong Kong = treatment group = 0, Singapur = 1). They appear in the dataset once I run the code.
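library(zoo)  # provides as.yearmon()
y <- HONGKONG$Passengers
x <- HONGKONG$Datum
# Datum is text such as "31.01.17", so it is parsed with an explicit format first
HONGKONG$Datum <- as.yearmon(as.Date(x, format = "%d.%m.%y"))
# the cutoff has to be a yearmon as well, otherwise it is compared as plain text
limit <- as.yearmon(as.Date("2019-07-31"))
HONGKONG$P <- ifelse(HONGKONG$Datum > limit, 1, 0)       # 1 = after the intervention
HONGKONG$S <- ifelse(HONGKONG$City == "HongKong", 0, 1)  # 0 = Hong Kong (treatment group)
HONGKONG$did <- HONGKONG$P * HONGKONG$S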
city1 <- HONGKONG$City == "HongKong"
city2 <- HONGKONG$City == "Singapur"
plot(HONGKONG$Datum[city1], HONGKONG$Passengers[city1],
     type = "l", col = "blue", xlab = "Zeitraum",
     main = "Trend zwischen Hong Kong und Singapur",
     ylim = c(3200000, 6800000), ylab = "Anzahl Fluggäste",
     xlim = as.yearmon(c("2017-01-31", "2019-12-31")))
lines(HONGKONG$Datum[city2], HONGKONG$Passengers[city2], col = "darkgreen")
legend("bottomright",
       legend = c("Hong Kong", "Singapur"),
       text.col = c("blue", "darkgreen"), cex = 0.99)
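To also show the intervention in this kind of plot, one option is a vertical line at the cutoff via `abline(v = ...)`. The following is only a minimal sketch on a four-month excerpt of the data above. The data frame `df` and the variable `cutoff` are names made up for the illustration, the axis labels are placeholders, and the dates are parsed with `as.Date()` instead of `as.yearmon()` so that it runs in base R.

```r
# Four-month excerpt of the data above (June to September 2019)
df <- data.frame(
  City = rep(c("HongKong", "Singapur"), each = 4),
  Datum = rep(as.Date(c("30.06.19", "31.07.19", "31.08.19", "30.09.19"),
                      format = "%d.%m.%y"), times = 2),
  Passengers = c(6347000, 6729000, 5994000, 4857000,
                 5816089, 5910782, 5900629, 5469342)
)
cutoff <- as.Date("2019-07-31")  # assumed intervention date, same value as `limit`

hk <- df$City == "HongKong"
plot(df$Datum[hk], df$Passengers[hk], type = "l", col = "blue",
     xlab = "Date", ylab = "Passengers", ylim = range(df$Passengers))
lines(df$Datum[!hk], df$Passengers[!hk], col = "darkgreen")
abline(v = cutoff, lty = 2, col = "red")  # dashed vertical line at the intervention
legend("bottomleft", legend = c("Hong Kong", "Singapur", "Intervention"),
       col = c("blue", "darkgreen", "red"), lty = c(1, 1, 2))
```

On the full dataset the same `abline()` call would go right after the `lines()` call, with `v` set to the cutoff in the same date class as the x-axis.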
From here on I'm a bit stuck. I ran the regression with the following code:
library(sjPlot)  # provides tab_model()
didreg = lm(y ~ S + P + did, data=HONGKONG)
summary(didreg)
tab_model(didreg)
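Because the model contains only the two dummies and their product, the coefficient on `did` should equal the difference between the two groups' before/after changes in mean passengers. Below is a small sketch on a four-month excerpt of the data above, assuming the same coding as in the code above (`S = 0` for Hong Kong, `P = 1` after July 2019). The names `df`, `fit`, `m` and `by_hand` are illustrative. Note that with Hong Kong coded as 0, the `did` coefficient describes Singapur relative to Hong Kong, so the effect on Hong Kong has the opposite sign.

```r
# Four-month excerpt of the data above (June to September 2019)
df <- data.frame(
  City = rep(c("HongKong", "Singapur"), each = 4),
  Datum = rep(as.Date(c("30.06.19", "31.07.19", "31.08.19", "30.09.19"),
                      format = "%d.%m.%y"), times = 2),
  Passengers = c(6347000, 6729000, 5994000, 4857000,
                 5816089, 5910782, 5900629, 5469342)
)
df$P <- ifelse(df$Datum > as.Date("2019-07-31"), 1, 0)  # 1 = after the intervention
df$S <- ifelse(df$City == "HongKong", 0, 1)             # 0 = Hong Kong
df$did <- df$P * df$S

fit <- lm(Passengers ~ S + P + did, data = df)

# 2x2 table of mean passengers: rows are S (0/1), columns are P (0/1)
m <- tapply(df$Passengers, list(S = df$S, P = df$P), mean)
# change for S = 1 minus change for S = 0
by_hand <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

print(coef(fit)["did"])
print(by_hand)
```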
This is how I tried to create a graph, based on the information found under this link:
I already have a graph showing the common trend, but now I need to include the difference-in-differences estimator in my graph.
library(foreign)
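# mean passengers per group (S) and period (P); the column is called Passengers, not y
a = sapply(subset(HONGKONG, P == 0 & S == 0, select=Passengers), mean)  # Hong Kong, before
b = sapply(subset(HONGKONG, P == 0 & S == 1, select=Passengers), mean)  # Singapur, before
c = sapply(subset(HONGKONG, P == 1 & S == 0, select=Passengers), mean)  # Hong Kong, after
d = sapply(subset(HONGKONG, P == 1 & S == 1, select=Passengers), mean)  # Singapur, after
(d-c)-(b-a)  # difference-in-differences of the four group means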
However, it doesn't work and I'm not sure how to improve it.
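One possible way to get the estimator into a graph is to plot the four group means and add a dashed counterfactual line for Hong Kong, i.e. Hong Kong's mean before the intervention shifted by Singapur's change. The vertical gap between that line and Hong Kong's actual mean after the intervention is then the difference-in-differences estimate for Hong Kong. This is only a sketch on a four-month excerpt of the data above, with the same dummy coding (`S = 0` for Hong Kong). The names `df`, `m`, `cf` and `did_hk` are illustrative and the axis labels are placeholders.

```r
# Four-month excerpt of the data above (June to September 2019)
df <- data.frame(
  City = rep(c("HongKong", "Singapur"), each = 4),
  Datum = rep(as.Date(c("30.06.19", "31.07.19", "31.08.19", "30.09.19"),
                      format = "%d.%m.%y"), times = 2),
  Passengers = c(6347000, 6729000, 5994000, 4857000,
                 5816089, 5910782, 5900629, 5469342)
)
df$P <- ifelse(df$Datum > as.Date("2019-07-31"), 1, 0)  # 1 = after the intervention
df$S <- ifelse(df$City == "HongKong", 0, 1)             # 0 = Hong Kong

# 2x2 table of mean passengers: rows are S (0/1), columns are P (0/1)
m <- tapply(df$Passengers, list(S = df$S, P = df$P), mean)

# Counterfactual for Hong Kong: its mean before plus Singapur's change
cf <- m["0", "0"] + (m["1", "1"] - m["1", "0"])
did_hk <- m["0", "1"] - cf  # estimate for Hong Kong (observed minus counterfactual)

plot(c(0, 1), m["0", ], type = "b", col = "blue", xaxt = "n",
     xlab = "Period", ylab = "Mean passengers", ylim = range(m, cf))
axis(1, at = c(0, 1), labels = c("before", "after"))
lines(c(0, 1), m["1", ], type = "b", col = "darkgreen")        # Singapur
lines(c(0, 1), c(m["0", "0"], cf), lty = 2, col = "blue")      # counterfactual
segments(1, cf, 1, m["0", "1"], col = "red", lwd = 2)          # the DiD gap
legend("bottomleft",
       legend = c("Hong Kong", "Singapur", "Hong Kong counterfactual", "DiD estimate"),
       col = c("blue", "darkgreen", "blue", "red"), lty = c(1, 1, 2, 1))
```

Here `did_hk` has the opposite sign of `(d-c)-(b-a)` computed with `S = 1` for Singapur, because it is measured from Hong Kong's point of view.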
Thank you for the help!