I'm trying to use the srvyr package in R. The data comes from this link:
Since this is a household survey, I use merge() to join these two datasets:
library(readxl)
library(dplyr)
Ingresos <- read_excel("Sociodemograficas_e_Ingresos.xlsx", sheet = "Base")
gastos <- read_excel("Gasto_consumo_final_mensual.xlsx")
The expansion factor (FACTOR_EXPANSION) is supposed to give the population estimate, which for the Dominican Republic is more than 10,000,000.
If I sum the FACTOR_EXPANSION variable, I get exactly the expected amount. However, when I create my survey object, I don't get the population estimate.
Ingresos_Filtrado <- Ingresos %>%
  select(A204, A303, A302, A402, A404, A405, A410, GRUPO_RAMA, GRUPO_OCUPACION,
         GRUPO_CATEGORIA, GRUPO_EDAD, GRUPO_EDUCACION, GRUPO_SECTOR, GRUPO_EMPLEO, ESCOLARIDAD,
         SALARIO_PRINCIPAL, A201, A202A, A202B, A202C, A202D, A207, A208, A212, A221, GRUPO_REGION,
         DES_PROVINCIA, DES_MUNICIPIO, A206, A213, A218, A219, A224, A309, CALLES_ASFALTADAS, ESTRATO,
         ALUMBRADO_PUBLICO, FACTOR_EXPANSION, VIVIENDA, HOGAR, MIEMBRO, UPM, PET, PEA, QUINTIL,
         TRIMESTRE, REPLICA, ORDEN_REGION, A102, A401, A401A)
# Keep every person (all.x = TRUE) and attach the household expenditure columns
Union_Ingresos_Gastos <- merge(x = Ingresos_Filtrado, y = gastos,
  by = c("TRIMESTRE", "REPLICA", "UPM", "FACTOR_EXPANSION", "VIVIENDA", "HOGAR", "ORDEN_REGION", "QUINTIL"),
  all.x = TRUE)
library(srvyr)
survey_2 <- Union_Ingresos_Gastos %>%
  as_survey_design(ids = UPM, strata = ESTRATO, weights = FACTOR_EXPANSION, nest = TRUE)
survey_2 %>%
  group_by(QUINTIL) %>%
  summarise(total = survey_total(A401A, level = 0.95, na.rm = TRUE)) %>%
  mutate(Total = sum(total))
The result is:
# A tibble: 5 × 4
QUINTIL total total_se Total
<dbl> <dbl> <dbl> <dbl>
1 1 2213294 82873. 8513968
2 2 2127726 80353. 8513968
3 3 1765902 65479. 8513968
4 4 1483914 71456. 8513968
5 5 923132 54371. 8513968
With this code, the population estimate is 8,513,968.
I need help, because when I compute the total directly, without the survey object, I get the expected figure:
sum(Ingresos$FACTOR_EXPANSION)
[1] 10,299,551
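It may help to separate two different quantities. The estimated population size is the sum of the expansion factors, while survey_total(A401A, na.rm = TRUE) estimates the weighted sum of the values of A401A over the rows where A401A is not missing. The two coincide only if A401A equals 1 for every person and is never missing. The base-R sketch below computes both point estimates by hand, overall and by quintile; the `toy` data frame and its values are hypothetical, not taken from the survey.

```r
# Hypothetical sample of five people with expansion factors
toy <- data.frame(
  QUINTIL = c(1, 1, 2, 2, 2),
  FACTOR_EXPANSION = c(1000, 1500, 2000, 500, 1000),
  A401A = c(1, NA, 1, 0, 1)  # hypothetical 0/1 item with one missing value
)

# Estimated population size: every row contributes its full weight
pop_size <- sum(toy$FACTOR_EXPANSION)

# Weighted total of A401A, skipping missing values
# (the point estimate that survey_total(A401A, na.rm = TRUE) targets)
total_a401a <- sum(toy$FACTOR_EXPANSION * toy$A401A, na.rm = TRUE)

# The same two quantities by quintile
pop_by_quintil <- tapply(toy$FACTOR_EXPANSION, toy$QUINTIL, sum)
a401a_by_quintil <- tapply(toy$FACTOR_EXPANSION * toy$A401A, toy$QUINTIL, sum, na.rm = TRUE)

print(c(pop_size = pop_size, total_a401a = total_a401a))
print(pop_by_quintil)
print(a401a_by_quintil)
```

Here the population size is 6000 but the weighted total of A401A is 4000, because one person is missing and another has A401A = 0. Assuming the installed srvyr version supports it, calling survey_total() with no variable inside summarise() should estimate the population in each QUINTIL group instead of a total of A401A.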
Is the problem the merge of the two datasets?
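Because merge() with all.x = TRUE keeps every row of x, it cannot drop people. However, if a key combination appears more than once in y, each matching row of x is repeated and its expansion factor is counted again. A quick check is to compare the row count and the weight total before and after the merge. The sketch below uses small made-up tables (`personas` and `gastos_toy` are hypothetical stand-ins for Ingresos_Filtrado and gastos) and, for brevity, joins only on VIVIENDA and HOGAR.

```r
# Hypothetical person-level table: one row per household member
personas <- data.frame(
  VIVIENDA = c(1, 1, 2),
  HOGAR = c(1, 1, 1),
  MIEMBRO = c(1, 2, 1),
  FACTOR_EXPANSION = c(100, 100, 250)
)

# Hypothetical household-level table in which household (2, 1) appears twice
gastos_toy <- data.frame(
  VIVIENDA = c(1, 2, 2),
  HOGAR = c(1, 1, 1),
  GASTO = c(500, 300, 40)
)

union_toy <- merge(personas, gastos_toy, by = c("VIVIENDA", "HOGAR"), all.x = TRUE)

# Compare row counts and weight totals before and after the merge
check <- c(
  rows_before = nrow(personas),
  rows_after = nrow(union_toy),
  weight_before = sum(personas$FACTOR_EXPANSION),
  weight_after = sum(union_toy$FACTOR_EXPANSION)
)
print(check)

# Keys that are not unique in the household table cause the duplication
keys <- c("VIVIENDA", "HOGAR")
dup_keys <- gastos_toy[duplicated(gastos_toy[, keys]), keys]
print(dup_keys)
```

In this toy case the merge turns 3 rows into 4 and inflates the weight total from 450 to 700. Running the same comparison on Ingresos_Filtrado and Union_Ingresos_Gastos, with the full set of merge keys, would show whether the real merge changes the totals.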
Or do I need another argument in as_survey_design()?
Help!
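Argument names in as_survey_design() have to be spelled exactly. R matches a named argument exactly or by unambiguous prefix, so a misspelling such as `weigths` does not match `weights`; when the function also accepts `...`, the misspelled argument can be absorbed there instead of raising an error, and the weights are then never used. In that situation survey::svydesign() usually warns "No weights or probabilities supplied, assuming equal probability". The function `make_weights()` below is a hypothetical stand-in (not srvyr's code) that reproduces this behaviour in base R.

```r
# Hypothetical design helper that, like many R functions, accepts `...`
make_weights <- function(data, weights = NULL, ...) {
  if (is.null(weights)) {
    # No weights supplied: fall back to equal weights of 1
    rep(1, nrow(data))
  } else {
    weights
  }
}

d <- data.frame(FACTOR_EXPANSION = c(120, 80, 300))

# The misspelled name lands in `...`, so no error is raised
w_typo <- make_weights(d, weigths = d$FACTOR_EXPANSION)
w_ok <- make_weights(d, weights = d$FACTOR_EXPANSION)

print(sum(w_typo))  # the number of rows, not a population size
print(sum(w_ok))    # the weighted population size
```

On the real object, sum(weights(survey_2)) should equal sum(Union_Ingresos_Gastos$FACTOR_EXPANSION) when the expansion factors were actually picked up by the design.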