Hi there. To start off, I am new to R and am trying to learn it by jumping in with a project. I will simplify the idea of my project here (this is not my actual project, just a simplified version). I have almost 400,000 individual people, each of whom was asked whether or not they drink tea. In addition to whether they drink tea, I recorded several demographics, such as country, gender, ethnicity, and several other characteristics. All 400,000 responses are binned and placed into a summary table. I would like to figure out which demographics of these individuals are associated with tea drinking. An example of some data is below. How would I go about fitting a regression or goodness-of-fit model based on the summary table? Thanks in advance for any help.
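For the goodness-of-fit side of the question, one common starting point is a Pearson chi-squared test of independence on a Location by Yes/No table. The sketch below assumes the summary table has one row per location with count columns named Yes and No; those column names and all of the counts are hypothetical, since the example data itself is not shown.

```r
# Hypothetical summary table: one row per location, with counts of people who
# do (Yes) and do not (No) drink tea. All numbers are made up for illustration.
tea <- data.frame(
  Location = c("Australia", "Canada", "India", "UK"),
  Yes      = c(30000, 25000, 60000, 55000),
  No       = c(70000, 75000, 40000, 45000)
)

# Build a Location x {Yes, No} contingency matrix from the count columns.
tab <- as.matrix(tea[, c("Yes", "No")])
rownames(tab) <- tea$Location

# Pearson chi-squared test of independence: does the Yes/No split differ by location?
res <- chisq.test(tab)
print(res)

# Observed minus expected counts show which locations drive the difference.
print(round(res$observed - res$expected))
```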
Is it possible to use glm(Location ~ ., family = binomial(link = "logit"), data = Location), for example, on this binned data? If I run the function as written here, I get the error "Error in eval(family$initialize) : y values must be 0 <= y <= 1".
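That error usually means the response passed to a binomial glm() contains values outside 0 to 1, such as raw counts. With a binomial family, glm() accepts the response as 0/1 values, as proportions (together with weights giving the group sizes), or as a two-column matrix of successes and failures. Note also that the outcome on the left of ~ would be tea drinking rather than Location. A minimal sketch of two equivalent specifications, again assuming hypothetical Yes/No count columns and made-up numbers:

```r
# Hypothetical summary table of counts (made-up numbers).
tea <- data.frame(
  Location = c("Australia", "Canada", "India", "UK"),
  Yes      = c(30000, 25000, 60000, 55000),
  No       = c(70000, 75000, 40000, 45000)
)

# Option 1: two-column response of successes (Yes) and failures (No).
fit_counts <- glm(cbind(Yes, No) ~ Location,
                  family = binomial(link = "logit"), data = tea)

# Option 2: proportion of Yes as the response, weighted by group size.
tea$total <- tea$Yes + tea$No
tea$prop  <- tea$Yes / tea$total
fit_prop <- glm(prop ~ Location, family = binomial(link = "logit"),
                weights = total, data = tea)

summary(fit_counts)

# Both specifications give the same coefficient estimates.
all.equal(coef(fit_counts), coef(fit_prop))
```

With Location as the only predictor, this model has one parameter per row (it is saturated), so it simply reproduces the observed proportions on the log-odds scale; it becomes more informative once several characteristics are modelled together.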
Thanks for the reply and for helping out. Now, say I wanted to compare multiple characteristics (Location, Gender, Race, etc.) to find out which of them are most strongly associated with tea use. Would you still deal only with the Yes/No proportions?
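To compare several characteristics at once, the same cbind(Yes, No) response can take several predictors, but only if the summary table is cross-classified by all of them jointly (one row per combination of Location, Gender, and so on). Separate one-way tables for each characteristic cannot disentangle their effects. A sketch with two hypothetical characteristics and made-up counts, where drop1() reports a likelihood-ratio test for removing each characteristic from the model:

```r
# Hypothetical cross-classified summary table: one row per Location x Gender
# combination, with Yes/No tea counts. All numbers are made up.
tea <- expand.grid(Location = c("Australia", "India", "UK"),
                   Gender   = c("Female", "Male"),
                   stringsAsFactors = TRUE)
tea$Yes <- c(16000, 32000, 29000, 14000, 28000, 26000)
tea$No  <- c(34000, 18000, 21000, 36000, 22000, 24000)

# Main-effects logistic regression on the aggregated counts.
fit <- glm(cbind(Yes, No) ~ Location + Gender,
           family = binomial(link = "logit"), data = tea)
summary(fit)

# Likelihood-ratio test for dropping each characteristic in turn:
# a larger increase in deviance means that characteristic explains more.
drop1(fit, test = "Chisq")

# Odds ratios relative to the baseline levels (Australia, Female).
exp(coef(fit))
```

With counts this large, nearly every term will be statistically significant, so the size of the deviance change and the odds ratios are usually more informative than the p-values.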
Thank you. I appreciate your comments.
The more I think about it, the more I think that simply comparing proportions is probably more realistic, since my data set covers the entire population.
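The proportions themselves are straightforward to compute from the summary table. A sketch with hypothetical Yes/No columns and made-up counts, ranking locations by their share of tea drinkers and showing each location's difference from the overall share:

```r
# Hypothetical summary table (made-up numbers): Yes/No tea counts by location.
tea <- data.frame(
  Location = c("Australia", "Canada", "India", "UK"),
  Yes      = c(30000, 25000, 60000, 55000),
  No       = c(70000, 75000, 40000, 45000)
)

# Proportion of tea drinkers within each location.
tea$total    <- tea$Yes + tea$No
tea$prop_yes <- tea$Yes / tea$total

# Difference from the overall (pooled) proportion, in percentage points.
overall <- sum(tea$Yes) / sum(tea$total)
tea$diff_pp <- round(100 * (tea$prop_yes - overall), 1)

# Sort locations from highest to lowest share of tea drinkers.
tea[order(-tea$prop_yes), ]
```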
I was initially thinking of it in terms of zeros and ones, though: if a person in Australia drinks tea, that would be a 1, and if they did not, a 0. All of those were then summed into the total numbers of 1s and 0s, which is how the table was made.
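For a model whose predictors are the same characteristics the table is grouped by, the 0/1 view and the summary table carry the same information: expanding the counts back into one row per person and fitting a logistic regression gives the same coefficients as fitting the aggregated counts directly. A small sketch with made-up counts (kept small so the expansion is cheap):

```r
# Small hypothetical summary table (made-up counts).
tea <- data.frame(
  Location = c("Australia", "India", "UK"),
  Yes      = c(30, 60, 55),
  No       = c(70, 40, 45),
  stringsAsFactors = FALSE
)

# Expand to one row per person: drinks_tea = 1 for each "Yes", 0 for each "No".
ind <- data.frame(
  Location   = c(rep(tea$Location, tea$Yes), rep(tea$Location, tea$No)),
  drinks_tea = c(rep(1, sum(tea$Yes)), rep(0, sum(tea$No))),
  stringsAsFactors = FALSE
)

# Cross-tabulating the 0/1 rows recovers the original counts.
table(ind$Location, ind$drinks_tea)

# Logistic regression on the individual 0/1 data ...
fit_ind <- glm(drinks_tea ~ Location, family = binomial, data = ind)
# ... and on the aggregated counts gives the same coefficients
# (up to the iterative fitting tolerance).
fit_agg <- glm(cbind(Yes, No) ~ Location, family = binomial, data = tea)
all.equal(coef(fit_ind), coef(fit_agg), tolerance = 1e-6)
```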
It would have been possible to use the individual data, but the data needed a lot of work before it was usable, so it was aggregated as it was obtained. Each individual had a unique identifier, which had to be used to look up their location, gender, race, etc. A separate identifier tied to the individual then had to be used to find out whether they used tea.
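If the individual-level tables are still available, the joining and counting steps described above can be done in R with merge() and aggregate(). The sketch assumes three hypothetical tables (people, demog, tea_rec) and column names that are not from the original post:

```r
# Hypothetical individual-level tables (all names and values are made up).
# people links each person's ID to the separate ID used in the tea records.
people  <- data.frame(person_id = 1:6, tea_id = c(101, 102, 103, 104, 105, 106))
demog   <- data.frame(person_id = 1:6,
                      Location  = c("UK", "UK", "India", "India", "India", "UK"),
                      Gender    = c("F", "M", "F", "F", "M", "M"),
                      stringsAsFactors = FALSE)
tea_rec <- data.frame(tea_id = c(101, 102, 103, 104, 105, 106),
                      drinks_tea = c(1, 0, 1, 1, 0, 1))

# Join the three tables on their identifiers: person_id first, then tea_id.
ind <- merge(merge(people, demog, by = "person_id"), tea_rec, by = "tea_id")

# Turn the 0/1 outcome into Yes/No indicator columns.
ind$Yes <- ind$drinks_tea
ind$No  <- 1 - ind$drinks_tea

# Aggregate to one row per Location x Gender with Yes/No counts,
# the layout that glm(cbind(Yes, No) ~ ...) expects.
counts <- aggregate(cbind(Yes, No) ~ Location + Gender, data = ind, FUN = sum)
counts
```

Keeping the merged ind table also allows fitting glm(drinks_tea ~ Location + Gender, family = binomial) directly on the individual rows.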