Cleaning Data for Logistic Regression

Hi joels,

That helps immensely, thank you!

I'm unfamiliar with most of the functions and notation that you've used. Thank you for your helpful comments in your code!

To answer your earlier question about my actual data, yes: all of the original columns are read in as factors.

I do have a few questions concerning what you've done:

  1. How did you learn how to use all of those functions?!? That's amazing! I'm completely baffled! You're fantastic!

  2. Let's say that indep_var3 and indep_4 aren't so conveniently named. Let's pretend they're named "Factor A" and "Effect B". How would you incorporate that into your response?

  3. This question is a longer one, please forgive me:
    When I run the first step (as indicated by the first # sign, after the first %>%, in your response), my R Console warns me, "NAs introduced by coercion". I can't tell whether each step introduces NAs, but in any case there are NAs in the cleaned data frame. Having NAs may not be a problem in itself, but I suspect it matters: when I run the glm function now, its output reports over a hundred "observations deleted due to 'missingness'". What does R mean by "deletion" (which I'm interpreting as "exclusion from the logistic regression"), "observations" (which I'm interpreting as "rows"), and "missingness" (which I'm interpreting as NAs)? Am I interpreting those correctly?

  4. Another question that comes to mind is, "Is each row that has an NA excluded from the analysis, or are the individual entries that have NA themselves not included in the logistic regression?"

  5. My data set only has ~3000 rows, so ignoring ~100 rows may affect the accuracy of the logistic regression, and of the p-values that I care about. Supposing that imputation is out of the question, is there anything I can do to resolve this problem?

  6. Let's say that I discover from those who performed the data entry that each NA should be considered as "N". How would you handle that? (I don't know if this is the case; it's just hypothetical.)

  7. Is it alright that I'm asking so many questions? I don't want to overstep my bounds. I'm just impressed by how effective your suggestions have been so far, and I really want to learn how to solve this problem.
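
In case a small reproducible example helps with questions 2, 3, 4, and 6, here is my rough attempt at recreating what I'm seeing. Everything in it is made up: the data frame `toy`, its values, and the column names "Factor A" and "Effect B" are hypothetical stand-ins, not my real data, and I'm only assuming that backticks are the right way to refer to names with spaces. It uses base R only. Please correct me if this sketch misrepresents what glm is doing.

```r
# Made-up toy data: every column starts out as a factor, as in my real data.
toy <- data.frame(
  outcome    = factor(c("Y", "N", "Y", "N", "Y", "N", "Y", "N", "Y", "N", "Y", "N")),
  `Factor A` = factor(c("1.5", "2.0", "n/a", "3.5", "4.0", "2.5",
                        "1.0", "3.0", "2.2", "1.8", "3.3", "2.7")),
  `Effect B` = factor(c("Y", "N", "N", "Y", NA, "Y", "N", "N", "Y", "Y", "N", "N")),
  check.names = FALSE  # keep the spaces in the column names
)

# Question 3: factor -> character -> numeric. The text "n/a" cannot be parsed,
# so it becomes NA and R warns "NAs introduced by coercion".
toy$`Factor A` <- as.numeric(as.character(toy$`Factor A`))

# Count NAs per column, and rows that contain at least one NA.
colSums(is.na(toy))
n_incomplete <- sum(!complete.cases(toy))  # 2 rows in this toy data

# Questions 2 and 4: backticks let the formula use names with spaces.
# With the default na.action (na.omit), glm drops the whole row if any
# variable used in the formula is NA for that row.
fit <- glm(outcome ~ `Factor A` + `Effect B`, data = toy, family = binomial)
nobs(fit)  # 10 of the 12 rows are used

# Question 6 (hypothetical): treat every NA in "Effect B" as "N".
toy_n <- toy
b <- as.character(toy_n$`Effect B`)
b[is.na(b)] <- "N"
toy_n$`Effect B` <- factor(b)

fit_n <- glm(outcome ~ `Factor A` + `Effect B`, data = toy_n, family = binomial)
nobs(fit_n)  # 11 rows: only the row with the unparseable "n/a" is still dropped
```

If I'm reading this correctly, the row counts suggest that "observations deleted" means entire rows are left out of the fit, not just the individual NA cells.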

Thank you so much for your help, patience, and proficiency! I really appreciate you taking the time to teach me how to tackle this problem!

You're awesome!