Apparently, I didn't pay enough attention in categorical analysis class. I'm headed for calculating odds, odds ratios, and eventually (but not here) the Cochran-Mantel-Haenszel Test. My problem is that the odds()
function does not work as expected.
Some definitions:
Odds is a way of expressing probabilities, with \text{odds} = \frac{\pi}{1-\pi}. That means p=\frac{1}{2} gives odds of 1, i.e. odds are "even", "50-50".
An odds ratio \theta is exactly what it sounds like: the ratio of two odds. It measures the odds that an outcome will occur under a condition compared to the odds of the outcome occurring without the condition. For example, some joint injuries as young adults increase the chances of developing arthritis as an older adult. The odds ratio for this situation would be written
So I'm using the Whickham data set, which looked at long-term survival based on smoking. Researchers recorded a person's age and yes/no smoke levels. Twenty years later, they found the same people and (after doing proper epidemiological adjustments, etc.) recorded whether they were alive or dead. It's in the mosaicData
package. I'm going to create a factor variable that groups the ages 18–34, 35–64, and 64+, and store it in WhickTable
.
library(mosaicData) ## Use mosaicData::Whickham to avoid loading packages
Whickham$AgeRange <- cut(Whickham$age, c(18, 35, 65, Inf), include.lowest = TRUE, labels = c("18 – 34", "35 – 64", "65+"))
WhickTable <- xtabs( ~ smoker + outcome + AgeRange, data=Whickham)
The 18–34 group in WhickTable
is
, , AgeRange = 18 – 34
outcome
smoker Alive Dead
No 236 7
Yes 182 5
To compute the odds for being Alive in the 18–34 group, calculate \frac{\text{Number present}}{\text{Number absent}} in the group:
The odds of being Alive + No Smoke is \frac{36}{7} = 33.7143.
The odds of being Alive + Smoke is \frac{182}{5} = 36.4. (I know this seems counterintuitive, but this is also a great Simpson's Paradox example).
Therefore the odds ratio is
and for the epidemiologists our there, the log odds ratio:
...and we could do that for each of the three groups. Note that the odds are in the 30s and the odds ratio is 0.926217.
Tons of good categorical stuff is in Michael Friendly's vcd
package, including a function to compute the stuff above
library(vcd)
oddsratio(WhickTable, log=FALSE)
# odds ratios for smoker and outcome by AgeRange
#
# 18 – 34 35 – 64 65+
#0.9262166 1.2647636 1.1559140
Huzzah! We calculated the odds ratio correctly for the 18–34 group!
According to the vcd::lodds()
documentation, I should be able to calculate the odds (33.7143 and 36.4) using the odds()
command.
> odds(WhickTable, log=FALSE)
# odds for smoker by outcome, AgeRange
#
# AgeRange
#outcome 18 – 34 35 – 64 65+
# Alive 1.296703 0.9450980 4.166667
# Dead 1.400000 0.7472527 3.604651
Nope. maybe I should try the lodds()
command, intended for calculating log odds, but set log = FALSE
lodds(WhickTable, log=FALSE)
# odds for smoker by outcome, AgeRange
#
# AgeRange
#outcome 18 – 34 35 – 64 65+
# Alive 1.296703 0.9450980 4.166667
# Dead 1.400000 0.7472527 3.604651
Looks like odds()
and lodds()
do the same thing.
Any idea why odds()
isn't giving me the non-logged odds? I'll bet something's staring me in the face, right there in the docs, but I'm not seeing it.