Need help getting a confidence interval through the srvyr package

The LIHTC_svy[[??]] lines don't go anywhere. I was just showing the various list items in LIHTC_svy. Its structure is much more complicated than a normal data.frame or tibble.

xx <- LIHTC_svy[[7]] 

is just extracting a tibble from LIHTC_svy. I have not checked to see if xx$OverLIHTC is the same as LIHTC_svy$OverLIHTC. If it is not, then the xx tibble may be what you need. I'll try to get back to the IPUMS site tomorrow and see if it suggests anything. Heck, I'll even have another try at the {survey} & {srvyr} documentation.
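If you want to see where the data actually lives, something like this should help. A minimal sketch, assuming LIHTC_svy is a {srvyr}/{survey} replicate-weight design object; those are list-like, and the underlying data frame normally sits in the $variables element (which may be exactly what [[7]] was pulling out by position):

str(LIHTC_svy, max.level = 1)   # list the design's components by name
xx <- LIHTC_svy$variables       # the underlying data, as a data frame
head(xx$OverLIHTC)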

Ah, I finally see the first problem with the second expression. I don't know how I missed it!

You have

LIHTC_svy   <-   filter(OverLIHTC == 1) %>%

You need

LIHTC_svy  %>%  filter(OverLIHTC == 1) %>%
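The difference matters: in the first version nothing is piped into filter(), so it has no data to work on (and LIHTC_svy gets overwritten with the result). A minimal illustration of the same mistake with the built-in mtcars data:

library(dplyr)

# Wrong: filter() receives no data frame, so this errors
# bad <- filter(cyl == 4) %>% summarise(n = n())

# Right: pipe the data in as filter()'s first argument
good <- mtcars %>% filter(cyl == 4) %>% summarise(n = n())
good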

I am still running into a problem, but it may be because, in the truncated sample we have here, hhRaw$OverLIHTC is all zeros.

 hhRaw$OverLIHTC
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


In the entire dataset there are 106 1s and 277 0s; as you saw the first 60 rows were all 0s. Should I do a sort to mix up 0s and 1s in the first rows?

I changed the line as you advised and ran into a different error... the program suggested running rlang::last_trace(). I did, and got the output below.

Run rlang::last_trace() to see where the error occurred.

rlang::last_trace()
<error/dplyr:::mutate_error>
Error in mutate():
ℹ In argument: LIHTC_Percent = LIHTC/survey_total() * 100.
Caused by error in cur_svy():
! Survey context not set


Backtrace:

  1. ├─... %>% select(LIHTC_Percent, LIHTC)
  2. ├─dplyr::select(., LIHTC_Percent, LIHTC)
  3. ├─dplyr::mutate(., LIHTC_Percent = LIHTC/survey_total() * 100)
  4. ├─dplyr:::mutate.data.frame(., LIHTC_Percent = LIHTC/survey_total() * 100)
  5. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
  6. │ ├─base::withCallingHandlers(...)
  7. │ └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
  8. │ └─mask$eval_all_mutate(quo)
  9. │ └─dplyr (local) eval()
  10. └─srvyr::survey_total()
  11. └─srvyr::cur_svy()

Ah, I think I am getting the exact same error. So it may not be a problem with a truncated dataset.

No, if necessary you could post the original .tar.gz file somewhere like Mediafire or Dropbox and give us a link to it, or, for me personally, just email it to jrkrideau ata gmail {full stop} com.

BTW, why do you have the data in an Excel file? If I understand correctly, IPUMS supplies a straightforward comma-delimited ASCII file?

Hi Chip,

Did you adapt the code below from an example? If so, can you point us to it?

Thanks.

LIHTC_svy %>% filter(OverLIHTC == 1) %>%
  summarise(LIHTC = survey_total(vartype = "ci")) %>%
  mutate(LIHTC_Percent = LIHTC / survey_total() * 100) %>%
  select(LIHTC_Percent, LIHTC)

I think this is what I got from Bing (seeking an R translation of Stata code sent by IPUMS):

Certainly! Here is an example of calculating margins of error for PUMS estimates using R. You can use the R packages srvyr or survey to use replicate weights to show uncertainty (as used in the PUMS files, these are all the variables with PWGTP[#] in the person file). Here is an example using srvyr:

library(srvyr)
library(tidyverse)

householdsRaw <- read.csv("ACS_PUMS_Household_File.csv")

households_svy <- householdsRaw %>%
  as_survey_rep(
    weights    = HINCP,
    repweights = matches("HINCP[0-9]+"),
    scale      = 4 / 80,
    rscales    = rep(1, 80),
    mse        = TRUE,
    type       = "JK1",
    variables  = c(BLD, BDSP, RNTP)
  )

households_svy %>%
  filter(BLD == 1 & BDSP >= 2 & RNTP > 0) %>%
  summarize(Renters = survey_total(vartype = "ci")) %>%
  mutate(Renters_Percent = Renters / survey_total() * 100) %>%
  select(Renters_Percent, Renters)

This code reads in the ACS PUMS household file and creates a survey object with replicate weights. It then filters the data to only include households with one building, two or more bedrooms, and rent paid. Finally, it calculates the total number of renters and reports the margin of error as a confidence interval.
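As an aside, those weights look odd for ACS PUMS: HINCP is household income, not a weight. The household weight is WGTP and the replicate weights are WGTP1-WGTP80, so I would have expected something more like this sketch (same design parameters as the quoted code; column names assumed from the standard PUMS household file):

library(srvyr)
library(dplyr)

# Sketch, assuming standard ACS PUMS household columns:
# WGTP = household weight, WGTP1-WGTP80 = replicate weights
households_svy <- householdsRaw %>%
  as_survey_rep(
    weights    = WGTP,
    repweights = matches("WGTP[0-9]+"),
    scale      = 4 / 80,
    rscales    = rep(1, 80),
    mse        = TRUE,
    type       = "JK1"
  )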

As I plod along, if we just do

S1 <- LIHTC_svy %>% filter(OverLIHTC == 1) %>%
  summarise(LIHTC = survey_total(vartype = "ci"))
S1

we get what looks suspiciously like upper and lower bounds on a confidence interval. I have no idea what survey_total() is or what it is supposed to do; it does not look like a legitimate command.

I went back to Bing/Co-Pilot and this is what came up (I had to delete a lot of included links.)

The survey_total() function in R is used to calculate a total and its variation using survey methods. It is a wrapper around svytotal() and should always be called from within summarise(). It takes a variable or expression as input and calculates the total from complex survey data.

It can report variability as one or more of: standard error ("se", the default), confidence interval ("ci"), variance ("var"), or coefficient of variation ("cv").

The na.rm argument indicates whether missing values should be dropped.

The level argument specifies the confidence level, which can be a single number or a vector of numbers.

The deff argument indicates whether the design effect should be returned.

The df argument specifies the degrees of freedom for the t-distribution when vartype is set to "ci".

Here is an example of how to use survey_total():

library(survey)
library(srvyr)

# Set survey design
design <- svydesign(ids = ~1, weights = ~hhwt, data = data.frame())

# Calculate total of enroll variable
data %>% summarise(enroll_tot = survey_total(enroll))

This code calculates the total of the enroll variable using the survey_total() function. The summarise() function is used to summarize the data by calculating the total of the enroll variable.

Now that looks like an R statement. I'm tied up for most of the day but I may have something by evening or early tomorrow (EST).
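One caveat on that quoted example: as written it can't work. The design is built on an empty data.frame(), and then summarise() is piped the raw data rather than the design, which is exactly the "Survey context not set" situation from earlier. A sketch of a version that should actually run (dat, hhwt, and enroll are placeholder names carried over from the quote):

library(survey)
library(srvyr)

# Build the design on the real data frame (placeholder: dat), then
# convert it to a srvyr tbl_svy and summarise the design itself
design <- svydesign(ids = ~1, weights = ~hhwt, data = dat)
design %>%
  as_survey() %>%
  summarise(enroll_tot = survey_total(enroll))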

Oh wait, Bing is an AI?

I think Bing was the original AI in Microsoft Edge... now it looks like it is Copilot. These AI tools just sort of appear through regular updates of Windows 11.

Let me repeat: Excel is evil. Excel with AI is demonic.

I'll try to be a bit more rational about the problem after breakfast.

In words, can you describe what you're trying to do here?

LIHTC_svy %>% filter(OverLIHTC == 1) %>%
  summarise(LIHTC = survey_total(vartype = "ci")) %>%
  mutate(LIHTC_Percent = LIHTC / survey_total() * 100) %>%
  select(LIHTC_Percent, LIHTC)

That second usage of survey_total() within mutate() is not going to work: by that point you have a plain data frame, which is what comes out of the prior summarise() line, so there is no survey design left for survey_total() to use.
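If the goal is a percentage with a confidence interval, one pattern that stays inside the survey context is to compute both pieces in a single summarise() call. A sketch, assuming LIHTC_svy is the tbl_svy and OverLIHTC is coded 0/1:

library(srvyr)
library(dplyr)

# survey_total() of the 0/1 indicator gives the weighted count over the
# limit (equivalent to filtering to OverLIHTC == 1 and totalling the
# weights); survey_mean() gives the proportion. Each gets its own CI,
# so no second survey_total() inside mutate() is needed.
out <- LIHTC_svy %>%
  summarise(
    LIHTC = survey_total(OverLIHTC, vartype = "ci"),
    prop  = survey_mean(OverLIHTC, vartype = "ci")
  )

# Scale the proportion columns (prop, prop_low, prop_upp) to percent
out %>% mutate(across(starts_with("prop"), ~ 100 * .x))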

Thank you so much for your question. What I'm trying to get from this code is variance or margin of error for the data in the column OverLIHTC. (I'm more used to Excel concepts so I'll be using them in my description). OverLIHTC consists of 0s and 1s. The 1s denote rental households paying rent in excess of program (LIHTC) limits and the percentage of rows with 1s in OverLIHTC is being calculated. The relevant columns in the entire data table are a unique Census survey number, the OverLIHTC column, and 80 columns of replicate weights from Census/IPUMS that I believe are used by a function in the package ipumsr for determining the variance.

I've used Bing Chat to help with this code; below is the resulting code... it provides an ME of 2.33. Are there errors in this coding approach?

library(srvyr)
library(readxl)

data <- read_excel("D:/mhp3/qualnvhhs2.xlsx")

wgt <- "HHWT"
var <- "OverLIHTC"

# Create a survey object
survey_data <- as_survey(data, weights = wgt)

# Calculate the estimate
est <- srvyr::summarize(survey_data,
                        total = srvyr::survey_total(wgt))

# Calculate the standard error for the filtered HHWT
SE_filtered <- srvyr::summarize(
  survey_data %>% dplyr::filter(!!dplyr::sym(var) == 1),
  SE = srvyr::survey_se(wgt)
)

# Calculate the margin of error for the fraction of the filtered HHWT
# divided by the unfiltered HHWT
ME <- 1.645 * SE_filtered$SE * (nrow(data[data$OverLIHTC == 1, ]) / nrow(data))^0.5
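For comparison, here is a sketch of a more direct route to the same kind of margin of error, using the replicate weights described above rather than HHWT alone. The column names REPWT1-REPWT80 are assumed (mirroring the design parameters in the earlier PUMS example), and level = 0.90 matches the 1.645 multiplier:

library(srvyr)
library(dplyr)
library(readxl)

data <- read_excel("D:/mhp3/qualnvhhs2.xlsx")

# Replicate-weight design; HHWT and REPWT1-REPWT80 are assumed names
survey_data <- data %>%
  as_survey_rep(
    weights    = HHWT,
    repweights = matches("REPWT[0-9]+"),
    scale      = 4 / 80,
    rscales    = rep(1, 80),
    mse        = TRUE,
    type       = "JK1"
  )

# Proportion of households over the LIHTC limit, with SE and a 90% CI
survey_data %>%
  summarise(p = survey_mean(OverLIHTC, vartype = c("se", "ci"), level = 0.90))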
