Can someone please tell me when I need to use Yates' continuity correction (using "correct = TRUE" instead of "correct = FALSE") when conducting a chi-squared analysis (please see an example of my code below)?
The correction can be appropriate if you have a small number of counts in one or more of the cells of the table. I have seen 5 mentioned as a threshold. How many counts do you have?
The chi-square is a continuous distribution while the underlying binomial(s) is(are) discrete. With large enough cell sizes, this tends not to matter very much. But with small cell sizes, it can matter quite a bit because the chi-square's approximation improves as cell sizes increase. It is for this reason that prop.test and chisq.test include the correction as an option.
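To make the effect of the option concrete, here is a minimal sketch comparing the two settings of `correct` in `chisq.test` on a 2 × 2 table. The counts in `tab` are made up for illustration and are not from anyone's data in this thread.

```r
# Hypothetical 2 x 2 table with a few small cells (illustrative counts only)
tab <- matrix(c(3, 9, 8, 4), nrow = 2,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))

# suppressWarnings() hides the "approximation may be incorrect" warning
# that chisq.test() raises when expected counts are small
uncorrected <- suppressWarnings(chisq.test(tab, correct = FALSE))
corrected   <- suppressWarnings(chisq.test(tab, correct = TRUE))

# Yates' correction shrinks |observed - expected| by up to 0.5 in each cell,
# so the statistic gets smaller and the p-value larger
stat <- c(uncorrected = unname(uncorrected$statistic),
          corrected   = unname(corrected$statistic))
pval <- c(uncorrected = uncorrected$p.value,
          corrected   = corrected$p.value)
print(stat)
print(pval)
```

With counts this small the two p-values can sit on opposite sides of a conventional cutoff, which is exactly the situation where the choice matters. Note that `chisq.test` only applies the correction to 2 × 2 tables.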
@rwalker
Thank you so much! So, if I have a small number of counts in one or more of the cells of the table (e.g. <5), why not just use Fisher's exact test? What's the difference between Fisher's exact vs Chi-squared with correction?
Fisher's exact test obviates the need for even considering the correction. Same for tests associated with Barnard and Boschloo. In short, one of the three aforementioned exact tests is likely better than the chi-square test.
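For reference, a minimal sketch of Fisher's exact test in base R, again on made-up counts (`tab` is hypothetical). Barnard's and Boschloo's tests are not in base R, so they are left out here.

```r
# Hypothetical 2 x 2 table with small cells (illustrative counts only)
tab <- matrix(c(3, 9, 8, 4), nrow = 2)

# fisher.test() computes an exact p-value conditional on the table margins,
# so there is no continuity correction to choose
ft <- fisher.test(tab)

print(ft$p.value)   # exact two-sided p-value
print(ft$estimate)  # conditional maximum likelihood estimate of the odds ratio
```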
So in summary, if I have <5 in one of the cells of the table, I should use Fisher's exact and if not, I should use the chi-squared test (with correction), is this right?
In other words, we should avoid using the chi-squared test without correction altogether?
Fisher's exact test was designed for small-cell problems; once the table is sufficiently dense, a chi-square test is usually performed because the approximation is then far better. The threshold of 5 is rather arbitrary, but the intuition is correct. The chi-square test without correction is a perfectly valid approximation as sample sizes grow large, because large samples obviate the need for the correction. So no, I do not think it is correct that we should avoid the chi-squared test without correction altogether.
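The rule of thumb discussed above could be sketched roughly as follows. This is only an illustration: `choose_test` and its `min_expected` argument are hypothetical names, the cutoff of 5 is the arbitrary threshold mentioned earlier, and I am assuming the threshold is applied to the expected counts rather than the observed ones.

```r
# Hypothetical helper: pick a test from the smallest expected cell count.
# The default cutoff of 5 is a convention, not a hard rule.
choose_test <- function(tab, min_expected = 5) {
  # Expected counts under independence, taken from chisq.test()
  expected <- suppressWarnings(chisq.test(tab, correct = FALSE))$expected
  if (any(expected < min_expected)) {
    list(method = "fisher", result = fisher.test(tab))
  } else {
    list(method = "chisq", result = chisq.test(tab, correct = FALSE))
  }
}

# Illustrative tables (made-up counts)
sparse <- matrix(c(1, 6, 7, 2), nrow = 2)      # expected counts 3.5 and 4.5
dense  <- matrix(c(30, 45, 50, 25), nrow = 2)  # expected counts 40 and 35

print(choose_test(sparse)$method)  # "fisher"
print(choose_test(dense)$method)   # "chisq"
```

In practice I would treat this as a starting point rather than a mechanical rule, since the cutoff itself is a judgment call.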
when some cell counts are low, typically understood to mean “below 10” or “below 5”. The Yates’
Correction, therefore, is used when conducting a Pearson’s Chi-squared test on
2 × 2 contingency tables and prevents overestimation of statistical significance;
So in my opinion, and reading the article, you will have to write: