How do I bootstrap with logistic regression

acarol84 · June 27, 2023, 9:13pm

I am trying to write the function, but I am new to R and I am not sure what the "i" argument means. In other words, what does "index vector of the observations in your dataset to use" mean?

Here is the code that I have for the function:

func <- function(d, i){
d2 <- d[i,]

Also, should "d" be the database name that I am using?
I assume the "i" argument is the variable that I want to bootstrap?

I want to estimate the odds of frequently "eating out" according to geographic location.
"Eating out" has three categories: Never (n=641), Weekly (n=308), Daily (n=24)
"Geographic location" is a dummy variable: 1: Small center (n=51); 2: Medium center (n=257)
OR 1: Small center (n=51); 2: Large center (n= 638)

Can someone help me with the code URGENTLY? Please explain what each argument means, i.e., how to fill in the blanks!

Thanks!!

technocrat · June 28, 2023, 9:06am

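This line assigns to an object named d2 a subset of object d (a data frame, matrix, or other object with rows and columns, i.e. two dimensions) consisting of the rows of d selected by i and all columns. So d is your data, and i is not a variable name: in a bootstrap (for example with boot() from the boot package), i is a vector of row numbers drawn with replacement, one vector per resample.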
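To make the subsetting concrete, here is a minimal sketch using a made-up data frame. The names toy, idx and resample are invented for illustration and are not from the original post.

```r
# toy data frame invented for illustration; not the poster's data
toy <- data.frame(
  id   = 1:5,
  dine = c("never", "weekly", "never", "daily", "weekly")
)

# an index vector like the one a bootstrap passes as `i`:
# same length as nrow(toy), drawn with replacement, so rows can repeat
idx <- c(2, 2, 5, 1, 4)

# rows 2, 2, 5, 1, 4 of toy, and all columns
resample <- toy[idx, ]
resample$id
#> [1] 2 2 5 1 4
```

Row 2 appears twice and row 3 not at all, which is exactly what sampling with replacement looks like.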

But that's a "how" detail, and it's only useful once you understand the "what".

Let's relate d to the textual description. Because d is being subset by row, we know that it is a data frame or a matrix (the main difference is that a data frame can hold both numeric and character variables, but a matrix must be all one or the other). The text describes two objects (think of everything in R as an object) termed Eating out and Geographic location. The latter is identified as a variable, meaning a column; the former isn't, but it won't fit unless it is one too. The n= tells us this is count data, which puts us in the realm of discrete data. Discrete data differ in important ways from continuous data, which can, in theory, take on an infinite number of different values between integers. But you can't have half a Never.
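A quick sketch of the data frame versus matrix difference, using two invented rows (df and m are illustrative names only):

```r
# a data frame can mix column types
df <- data.frame(
  dine = c("never", "weekly"),
  geo  = c(1, 0),
  stringsAsFactors = FALSE
)
sapply(df, class)    # one character column, one numeric column

# a matrix holds a single type, so the numbers are coerced to text
m <- as.matrix(df)
typeof(m)
#> [1] "character"
```

This is why data with both categories and numbers is normally kept in a data frame.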

Usually, this is as far as I'd take it without a reprex (see the FAQ), but it's a slow night.

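# simulate data matching the counts in the question
# (no seed is set, so the exact proportions vary from run to run)
d <- data.frame(
  # 641 "never", 308 "weekly", 24 "daily", shuffled into random order
  dine = sample(as.factor(c(rep("never",641),rep("weekly",308),rep("daily",24))),973),
  # 1 = small center (n = 51), 0 = all other rows (n = 922), labelled "lg" below
  geo = c(rep(1,51),rep(0,922))
)

# cross-tabulate dine by geo; table() orders the geo levels as "0", "1",
# so the labels must be given in that order: 0 -> "lg", 1 -> "sm"
tab <- table(d)
attributes(tab)$dimnames$geo <- c("lg","sm")
prop.table(tab)
#>         geo
#> dine              lg          sm
#>   daily  0.023638232 0.001027749
#>   never  0.626927030 0.031860226
#>   weekly 0.297019527 0.019527235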

Created on 2023-06-28 with reprex v2.0.2
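As a rough starting point for the bootstrap itself, here is a minimal sketch under several assumptions that are not in the original post. It assumes the boot package (a recommended package that ships with standard R installations), simulated data in place of the real data set, and a collapse of the three "eating out" categories into a binary outcome (never versus weekly or daily), because plain logistic regression needs two outcome levels. The names eats_out and or_geo are invented for illustration.

```r
library(boot)

set.seed(42)  # arbitrary seed so the sketch is repeatable

# simulated stand-in for the real data, which were not posted
d <- data.frame(
  dine = sample(c(rep("never", 641), rep("weekly", 308), rep("daily", 24))),
  geo  = c(rep(1, 51), rep(0, 922))
)

# ASSUMPTION: collapse three categories to a binary outcome
# 0 = never, 1 = weekly or daily
d$eats_out <- as.integer(d$dine != "never")

# statistic function: boot() calls it once per resample
#   d = the full data frame
#   i = vector of row numbers for this resample (drawn with replacement)
or_geo <- function(d, i) {
  d2  <- d[i, ]
  fit <- glm(eats_out ~ geo, data = d2, family = binomial)
  exp(coef(fit)[["geo"]])  # odds ratio for geo = 1 versus geo = 0
}

# R = number of bootstrap resamples
b <- boot(data = d, statistic = or_geo, R = 500)

# percentile confidence interval for the odds ratio
boot.ci(b, type = "perc")
```

You never supply i yourself; boot() generates it. With real data you would replace the simulated d with your own data frame and the formula with your own outcome and predictor. If you need to keep all three categories, that calls for a different model (for example multinomial or ordinal regression), which this sketch does not cover.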

Come back with a new question including a reprex (see the FAQ) for help when you get to the regression part. Also, see the homework FAQ, if applicable.
