I'm trying to create a synthetic control using the R package "Synth", but my code isn't working. I get the error "undefined columns selected", even though my unit.variable is in fact numeric (from 1 to 27).
I'm working with a panel dataset. The data are annual, from 2012 to 2021, and I have a dependent variable, "hompm", for 27 states ("uf"), each with an "id" (a number from 1 to 27).
My code is as follows:
# Loading synthetic control
install.packages("Synth")
library("Synth")
> # dataprep for Synth
> dataprep.out <- dataprep(foo = dfss,
+ predictors = c("pibpc",
+ "gini",
+ "pop",
+ "ocuppc",
+ "pp1624",
+ "salmed"),
+ predictors.op = "mean",
+ time.predictors.prior = 2012:2018,
+ special.predictors =
+ list(list("hompm", 2012:2018, "mean"),
+ list("osppc", 2012:2018, "mean"),
+ list("gini", 2012:2018, "mean"),
+ list("popppc", 2012:2018, "mean"),
+ list("pp1624", 2012:2018, "mean")),
+ dependent = "hompm",
+ unit.variable = "id",
+ unit.names.variable = 23,
+ time.variable = "ano",
+ treatment.identifier = "trat",
+ controls.identifier = c(1:22,24:27),
+ time.optimize.ssr = 2012:2018,
+ time.plot = 2012:2021)
Error in `[.data.frame`(foo, , unit.names.variable) :
undefined columns selected
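For reference, the same message can be reproduced outside of Synth. This is only a minimal sketch on a made-up data frame (`toy` and `res` are hypothetical names, not part of my real data or of the package): asking a data frame for a column position it does not have raises this error from `[.data.frame`.

```r
# Toy data frame with only 3 columns (hypothetical, NOT the real dfss)
toy <- data.frame(id = 1:2, uf = c("AC", "AL"), ano = c(2012, 2012))

# Ask for column position 23, which does not exist in a 3-column data frame.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(toy[, 23], error = function(e) conditionMessage(e))
print(res)
```

So the message seems to be about which columns are being selected from the data frame, not about whether the unit variable is numeric.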
Here is a sample of my dataset dfss:
> dput(head(dfss))
structure(list(id = c(1, 1, 1, 1, 1, 1),
uf = c("AC", "AC", "AC", "AC", "AC", "AC"), ano = c(2012, 2013, 2014, 2015, 2016, 2017),
pop = c(758786, 776463, 790101, 803513, 816687, 829619),
trat = c(0, 0, 0, 0, 0, 0),
osppc = c(416.344970703125, 494.075622558594, 521.079528808594, 596.987121582031, 608.531005859375, 511.153869628906),
gini = c(0.569999992847443, 0.550000011920929, 0.529999971389771, 0.550000011920929, 0.560000002384186, 0.550000011920929),
salmed = c(1200, 1221, 1305, 1455, 1474, 1446),
ocuppc = c(0.000390096800401807, 0.000383791630156338, 0.000388557906262577, 0.00038953943294473, 0.00035999104147777, 0.000356790289515629),
pibpc = c(13.1789464950562, 14.166805267334, 16.453592300415, 17.4234886169434, 17.1424312591553, 16.8752155303955),
hompm = c(27.4122085571289, 30.1366577148438, 29.3633346557617,
27.0064086914062, 44.4478721618652, 62.1972236633301),
pp1624 = c(15.3000001907349, 14.3000001907349, 14.6000003814697, 15.1000003814697, 13.3000001907349, 12.8000001907349)),
row.names = c(NA, 6L),
class = "data.frame")
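As a sanity check on the sample above, here is a minimal sketch that compares the column names used in the dataprep() call with the columns shown in the dput() output. Both vectors are typed in by hand from this post, and the names `cols_in_sample`, `cols_referenced`, `missing_cols` and `n_cols` are hypothetical helpers, not part of Synth.

```r
# Column names exactly as they appear in the dput() sample of dfss
cols_in_sample <- c("id", "uf", "ano", "pop", "trat", "osppc", "gini",
                    "salmed", "ocuppc", "pibpc", "hompm", "pp1624")

# Column names referenced by name in the dataprep() call
cols_referenced <- c("pibpc", "gini", "pop", "ocuppc", "pp1624", "salmed",
                     "hompm", "osppc", "popppc", "id", "ano")

# Names used in the call that are absent from the sample
missing_cols <- setdiff(cols_referenced, cols_in_sample)
print(missing_cols)

# Number of columns in the sample, for comparison with any numeric
# column position passed to dataprep()
n_cols <- length(cols_in_sample)
print(n_cols)
```

On the real data the same comparison could be run against names(dfss) and ncol(dfss) instead of the hand-typed vector.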
Could anyone help me, please? What does this error mean?
I'm stuck on this for my master's thesis. Thanks in advance!