srvyr package: creating complex survey objects in R

almr20 · July 14, 2023, 12:21pm

I'm trying to use the srvyr package in R. The data comes from these links:

Expenses: https://cdn.bancentral.gov.do/documents/estadisticas/encuesta-de-gastos-e-ingresos/documents/Cuadros_Gastos.xlsx?v=1689283267553

Income: https://cdn.bancentral.gov.do/documents/estadisticas/encuesta-de-gastos-e-ingresos/documents/Cuadros_Ingresos.xlsx?v=1689283267553

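Since this is a household survey, I use merge to combine these two datasets.

library(readxl)  # read_excel()
library(dplyr)   # select(), group_by(), mutate(), %>%
library(srvyr)   # as_survey_design(), survey_total()

Ingresos <- read_excel("Sociodemograficas_e_Ingresos.xlsx", sheet = "Base")

gastos <- read_excel("Gasto_consumo_final_mensual.xlsx")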

The expansion factor is supposed to give the population estimate, which for the Dominican Republic is over 10,000,000.

If I sum the FACTOR_EXPANSION variable, I get exactly that amount. However, when I create my survey object, I don't get the population estimate.

Ingresos_Filtrado <- Ingresos %>%
  select(A204, A303, A302, A402, A404, A405, A410, GRUPO_RAMA, GRUPO_OCUPACION,
         GRUPO_CATEGORIA, GRUPO_EDAD, GRUPO_EDUCACION, GRUPO_SECTOR, GRUPO_EMPLEO, ESCOLARIDAD,
         SALARIO_PRINCIPAL, A201, A202A, A202B, A202C, A202D, A207, A208, A212, A221, GRUPO_REGION,
         DES_PROVINCIA, DES_MUNICIPIO, A206, A213, A218, A219, A224, A309, CALLES_ASFALTADAS, ESTRATO,
         ALUMBRADO_PUBLICO, FACTOR_EXPANSION, VIVIENDA, HOGAR, MIEMBRO, UPM, PET, PEA, QUINTIL,
         TRIMESTRE, REPLICA, ORDEN_REGION, A102, A401, A401A)


Union_Ingresos_Gastos <- merge(
  x = Ingresos_Filtrado,
  y = gastos,
  by = c("TRIMESTRE", "REPLICA", "UPM", "FACTOR_EXPANSION", "VIVIENDA", "HOGAR", "ORDEN_REGION", "QUINTIL"),
  all.x = TRUE
)

survey_2 <- Union_Ingresos_Gastos %>%
  as_survey_design(ids = UPM, strata = ESTRATO, weights = FACTOR_EXPANSION, nest = TRUE)

survey_2 %>%
  group_by(QUINTIL) %>%
  summarise(total = survey_total(A401A, level = 0.95, na.rm = TRUE)) %>%
  mutate(Total = sum(total))

Result is:

# A tibble: 5 × 4
  QUINTIL   total total_se   Total
    <dbl>   <dbl>    <dbl>   <dbl>
1       1 2213294   82873. 8513968
2       2 2127726   80353. 8513968
3       3 1765902   65479. 8513968
4       4 1483914   71456. 8513968
5       5  923132   54371. 8513968

With this formula, the population estimate is 8,513,968.
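
One thing that may be worth ruling out here: `survey_total(A401A, na.rm = TRUE)` estimates the weighted total of `A401A`, and `na.rm = TRUE` drops the rows where `A401A` is missing, so the result is not necessarily a head count. The sketch below uses made-up toy data (only the column names are taken from the code above; the values are hypothetical) to show how such a total can fall short of the sum of the expansion factors, while `survey_total()` called without a variable returns the weighted count of rows.

```r
library(dplyr)
library(srvyr)

# Hypothetical toy data: 2 strata, 2 PSUs each, 2 persons per PSU
toy <- tibble(
  UPM              = c(1, 1, 2, 2, 3, 3, 4, 4),
  ESTRATO          = c(1, 1, 1, 1, 2, 2, 2, 2),
  FACTOR_EXPANSION = c(100, 100, 150, 150, 200, 200, 250, 250),
  A401A            = c(1, NA, 1, 1, NA, 1, 1, 1)  # two missing values
)

design <- toy %>%
  as_survey_design(ids = UPM, strata = ESTRATO,
                   weights = FACTOR_EXPANSION, nest = TRUE)

# Weighted count of all rows (the population estimate)
pop <- design %>% summarise(personas = survey_total())

# Weighted total of A401A, with rows where A401A is NA removed
tot <- design %>% summarise(total = survey_total(A401A, na.rm = TRUE))

pop$personas  # equals sum(toy$FACTOR_EXPANSION)
tot$total     # smaller, because the NA rows contribute nothing
```

If the real `A401A` has missing values, or is not coded as 1 for every person, the same gap would appear in the grouped table above.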

I need help, because when I compute the total without the survey object, I get the expected population figure:

sum(Ingresos$FACTOR_EXPANSION)
[1] 10,299,551

Is the problem merging the two datasets?
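
A quick way to test the merge itself is to compare row counts and weight sums before and after it. This is a minimal base R sketch on hypothetical toy tables (`personas` and `gastos_toy` stand in for the real person-level and expense tables; their values are invented). With `all.x = TRUE` no row of `x` is dropped, so a merge like this can duplicate rows and inflate the weight sum when `y` has several rows per key, but it cannot reduce it.

```r
# Hypothetical stand-in for the person-level income table
personas <- data.frame(
  HOGAR            = c(1, 1, 2, 3),
  MIEMBRO          = c(1, 2, 1, 1),
  FACTOR_EXPANSION = c(100, 100, 200, 300)
)

# Hypothetical stand-in for the expense table:
# household 1 appears twice, household 3 is absent
gastos_toy <- data.frame(
  HOGAR            = c(1, 1, 2),
  FACTOR_EXPANSION = c(100, 100, 200),
  GASTO            = c(50, 70, 90)
)

unido <- merge(x = personas, y = gastos_toy,
               by = c("HOGAR", "FACTOR_EXPANSION"), all.x = TRUE)

n_before <- nrow(personas)                  # rows before the merge
n_after  <- nrow(unido)                     # rows after the merge
w_before <- sum(personas$FACTOR_EXPANSION)  # weight sum before
w_after  <- sum(unido$FACTOR_EXPANSION)     # weight sum after

# Number of repeated key combinations in the right-hand table
dup_keys <- sum(duplicated(gastos_toy[, c("HOGAR", "FACTOR_EXPANSION")]))
```

If `n_after` equals `n_before` on the real data, the merge is one-to-one and is unlikely to be the cause of the difference.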

Perhaps I need another argument in as_survey_design?
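
Before adding arguments, it may help to confirm that the design object actually carries the expansion factor. srvyr documents the argument name as `weights`, and `weights()` from the survey package returns the sampling weights stored in a design. The sketch below, on hypothetical toy data, compares the stored weights with the original column; on the real data the same comparison would use `survey_2` and `Union_Ingresos_Gastos`.

```r
library(dplyr)
library(srvyr)

# Hypothetical toy data: 2 strata, 2 PSUs each
toy <- tibble(
  UPM              = c(1, 1, 2, 2, 3, 3, 4, 4),
  ESTRATO          = c(1, 1, 1, 1, 2, 2, 2, 2),
  FACTOR_EXPANSION = c(100, 100, 150, 150, 200, 200, 250, 250)
)

design <- toy %>%
  as_survey_design(ids = UPM, strata = ESTRATO,
                   weights = FACTOR_EXPANSION, nest = TRUE)

w_design <- sum(weights(design))       # weights stored in the design
w_data   <- sum(toy$FACTOR_EXPANSION)  # weights in the data frame

isTRUE(all.equal(w_design, w_data))    # TRUE when the weights were applied
```

A mismatch between the two sums would suggest that the weights were not picked up, for example because of a misspelled argument name.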

Help!

almr20 · July 14, 2023, 2:08pm

After doing some research, I believe the issue is with fpc (the finite population correction).

However, I don't know how to calculate it, considering most household surveys would have this variable.
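
For what it is worth, when weights are supplied, the finite population correction enters the variance calculation and not the point estimate, so it would not be expected to move a total from 10.3 million to 8.5 million. The sketch below illustrates this on hypothetical toy data; `N_UPM` is an invented column holding the number of PSUs in each stratum's population, and is not a variable from the survey files.

```r
library(dplyr)
library(srvyr)

# Hypothetical toy data: 2 strata, 2 sampled PSUs each
toy <- tibble(
  UPM              = c(1, 1, 2, 2, 3, 3, 4, 4),
  ESTRATO          = c(1, 1, 1, 1, 2, 2, 2, 2),
  FACTOR_EXPANSION = c(100, 100, 150, 150, 200, 200, 250, 250),
  N_UPM            = c(10, 10, 10, 10, 20, 20, 20, 20)  # assumed PSUs per stratum
)

sin_fpc <- toy %>%
  as_survey_design(ids = UPM, strata = ESTRATO,
                   weights = FACTOR_EXPANSION, nest = TRUE) %>%
  summarise(total = survey_total())

con_fpc <- toy %>%
  as_survey_design(ids = UPM, strata = ESTRATO,
                   weights = FACTOR_EXPANSION, fpc = N_UPM, nest = TRUE) %>%
  summarise(total = survey_total())

sin_fpc$total == con_fpc$total        # same point estimate
con_fpc$total_se < sin_fpc$total_se   # the fpc only shrinks the standard error
```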
