The LIHTC_svy[[??]] codes don't go anywhere. I was just showing the various list items in LIHTC_svy. Its structure is much more complicated than that of a normal data.frame or tibble.

xx <- LIHTC_svy[[7]]

is just extracting a tibble from LIHTC_svy. I have not checked whether xx$OverLIHTC is the same as LIHTC_svy$OverLIHTC; if it is not, then the xx tibble may be what you need. I'll try to get back to the IPUMS site tomorrow and see if it suggests anything. Heck, I'll even have another try at the {survey} and {srvyr} documentation.
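If it helps, a toy example shows where the data columns actually live in one of these survey objects. This is a minimal sketch: the column names below are made up, not the ones in your file.

```r
library(srvyr)

# Toy data; column names are illustrative, not the poster's actual file
df  <- data.frame(OverLIHTC = c(0, 1, 1, 0), wt = c(10, 20, 15, 5))
svy <- as_survey_design(df, weights = wt)

# A tbl_svy keeps the data columns inside its $variables element
# rather than as top-level columns, which is why the object looks so
# much more complicated than a plain data.frame or tibble
identical(svy$variables$OverLIHTC, df$OverLIHTC)   # TRUE
```

So a comparison like identical(xx$OverLIHTC, LIHTC_svy$variables$OverLIHTC) may be the quickest way to settle whether the two columns match.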

I changed the line you advised and ran into a different error. The program suggested running rlang::last_trace(); I did, and got the output below.

Run rlang::last_trace() to see where the error occurred.

rlang::last_trace()
<error/dplyr:::mutate_error>
Error in `mutate()`:
ℹ In argument: `LIHTC_Percent = LIHTC / survey_total() * 100`.
Caused by error in `cur_svy()`:
! Survey context not set
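For what it's worth, "Survey context not set" is what srvyr raises when survey_total() or survey_mean() is called on a plain data frame rather than on a survey object. A minimal sketch with made-up data (OverLIHTC and wt are illustrative names), assuming the percentage wanted is the weighted share of 1s:

```r
library(dplyr)
library(srvyr)

# Made-up data; OverLIHTC and wt are illustrative names, not the
# poster's actual columns
df  <- data.frame(OverLIHTC = c(0, 1, 1, 0), wt = c(10, 20, 15, 5))
svy <- as_survey_design(df, weights = wt)

# survey_mean()/survey_total() only work inside summarise() (or
# mutate()) called on the tbl_svy itself; calling them on the plain
# data frame is what triggers "Survey context not set"
out <- svy %>% summarise(prop_over = survey_mean(OverLIHTC))
100 * out$prop_over   # weighted percentage over the limit (70 here)
```

The key point is that the object piped into summarise() must be the survey design, not the underlying tibble.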

Ah, I think I am getting the exact same error. So it may not be a problem with a truncated dataset.

No; if necessary, you could post the original .tar.gz file somewhere like MediaFire or Dropbox and give us a link to it, or, for me personally, just email it to jrkrideau ata gmail {full stop} com.

BTW, why do you have the data in an Excel file? If I understand correctly, IPUMS downloads a straightforward comma-delimited ASCII file?

I think this is what I got from Bing (seeking an R translation of Stata code sent by IPUMS):

Certainly! Here is an example of calculating margins of error for PUMS estimates using R. You can use the R packages srvyr or survey to apply the replicate weights to show uncertainty (in the PUMS files, these are all the variables with PWGTP[#] in the person file). Here is an example using srvyr:

This code reads in the ACS PUMS household file and creates a survey object with replicate weights. It then filters the data to only include households with one building, two or more bedrooms, and rent paid. Finally, it calculates the total number of renters and reports the margin of error as a confidence interval.
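Since the actual code Bing produced isn't shown above, here is a hedged reconstruction of the pattern it describes, on made-up data: four replicate-weight columns stand in for the eighty in a real PUMS file, the building/bedroom/rent filters are reduced to a single rent filter, and type = "JK1" with scale = 4/80 is the common approximation for PUMS replicate weights rather than anything the Bing answer confirms.

```r
library(dplyr)
library(srvyr)

# Made-up household data; WGTP/WGTP1..WGTP4 mimic PUMS naming, with
# four replicate columns standing in for the real file's eighty
set.seed(1)
dat <- data.frame(RNTP = c(800, 0, 1200, 950),  # monthly rent, 0 = no rent
                  WGTP = c(10, 12, 8, 9))       # household weight
for (i in 1:4) dat[[paste0("WGTP", i)]] <- dat$WGTP * runif(4, 0.8, 1.2)

# Survey object defined by its replicate weights
svy <- dat %>%
  as_survey_rep(weights = WGTP,
                repweights = matches("WGTP[0-9]+"),
                type = "JK1", scale = 4 / 80, mse = TRUE)

# Weighted count of rent-paying households, margin of error as a CI
res <- svy %>%
  filter(RNTP > 0) %>%
  summarise(renters = survey_total(vartype = "ci"))
res
```

The result has the estimate plus _low and _upp columns, which would explain the "upper and lower bounds" mentioned below.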

We get what looks suspiciously like upper and lower bounds on a confidence interval. I have no idea what survey_total() is or what it is supposed to do; it does not look like a legitimate command.

I went back to Bing/Co-Pilot and this is what came up (I had to delete a lot of included links.)

The survey_total() function in R is used to calculate a total and its variability using survey methods. It takes a variable or expression as input and calculates the total from complex survey data.

- It is a wrapper around svytotal() and should always be called from within summarise().
- It can report variability as one or more of: standard error ("se", the default), confidence interval ("ci"), variance ("var"), or coefficient of variation ("cv").
- The na.rm argument indicates whether missing values should be dropped.
- The level argument specifies the confidence level, which can be a single number or a vector of numbers.
- The deff argument indicates whether the design effect should be returned.
- The df argument specifies the degrees of freedom for the t-distribution when vartype is set to "ci".

Here is an example of how to use survey_total():

library(survey)
library(srvyr)
# Build a survey design object from the data (note: the design object,
# not the raw data frame, is what gets piped into summarise())
design <- data %>% as_survey_design(ids = 1, weights = hhwt)
# Calculate the weighted total of the enroll variable
design %>% summarise(enroll_tot = survey_total(enroll))
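Tying this to the argument list above, here is a hedged variant on toy data (enroll and hhwt are illustrative names) that asks survey_total() for both a standard error and a confidence interval:

```r
library(dplyr)
library(srvyr)

# Toy data; enroll and hhwt mirror the names in the snippet above
df  <- data.frame(enroll = c(100, 250, 80), hhwt = c(2, 3, 1))
svy <- as_survey_design(df, weights = hhwt)

# vartype picks which uncertainty measures accompany the total
res <- svy %>%
  summarise(enroll_tot = survey_total(enroll, vartype = c("se", "ci")))
res   # columns: enroll_tot, enroll_tot_se, enroll_tot_low, enroll_tot_upp
```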

I think Bing was the original AI in Microsoft Edge... now it looks like it is Copilot. These AI tools just sort of appear through regular updates of Windows 11.

Thank you so much for your question. What I'm trying to get from this code is the variance, or margin of error, for the data in the column OverLIHTC. (I'm more used to Excel concepts, so I'll be using them in my description.) OverLIHTC consists of 0s and 1s; the 1s denote rental households paying rent in excess of program (LIHTC) limits, and the percentage of rows with 1s in OverLIHTC is being calculated. The relevant columns in the full data table are a unique Census survey number, the OverLIHTC column, and 80 columns of replicate weights from Census/IPUMS that I believe are used by a function in the ipumsr package for determining the variance.
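On that description (a 0/1 OverLIHTC column plus replicate-weight columns), a hedged sketch of the whole calculation: the SERIAL/HHWT/REPWT names follow IPUMS conventions but are guesses, four replicate columns of toy data stand in for the eighty real ones, and type = "JK1" with scale = 4/80 is the usual PUMS approximation rather than something confirmed in this thread.

```r
library(dplyr)
library(srvyr)

# Toy stand-in for the real IPUMS extract; SERIAL/HHWT/REPWT names
# follow IPUMS conventions but are guesses, and four replicate
# columns stand in for the eighty real ones
dat <- data.frame(SERIAL    = 1:4,            # unique survey number
                  OverLIHTC = c(1, 0, 1, 0),  # 1 = rent over LIHTC limit
                  HHWT      = c(5, 10, 5, 20))
for (i in 1:4) dat[[paste0("REPWT", i)]] <- dat$HHWT + i %% 2

svy <- dat %>%
  as_survey_rep(weights = HHWT,
                repweights = matches("REPWT[0-9]+"),
                type = "JK1", scale = 4 / 80, mse = TRUE)

# Weighted share of 1s in OverLIHTC, with a confidence interval
res <- svy %>%
  summarise(prop_over = survey_mean(OverLIHTC, vartype = "ci"))
100 * res$prop_over   # percentage of households over the limit
```

The _low and _upp columns of the result give the confidence interval, from which a margin of error can be read off directly.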