Hello everyone,
I'm getting rather frantic about the following problem:

Using the popular package "irr" by Matthias Gamer, I currently compute intraclass correlations (ICCs) to assess the retest reliability of EEG-based parameters. However, I receive unrealistic p-values. For example, a moderate ICC of .675 in a large sample of N = 583 comes with a non-significant p-value of .102, which seems highly unlikely given the moderate effect and the large sample (I calculate thousands of ICCs at the moment and usually obtain p-values < .001 for ICCs > .200). To be sure, I reproduced the exact same ICC in a different software package and obtained the expected p-value < .001 there. However, I would like to use the irr package for an automated, code-based analysis. Could there be a bug in the irr package (or am I missing some crucial information here)?
This is the code I'm using:
library(irr)
icc(Example_dataframe, model = "twoway", type = "agreement", unit = "average")
This is the full output I receive:
Average Score Intraclass Correlation
Model: twoway
Type : agreement
Subjects = 583
Raters = 2
ICC(A,2) = 0.675
F-Test, H0: r0 = 0 ; H1: r0 > 0
F(582,2) = 9.27 , p = 0.102
95%-Confidence Interval for ICC Population Values:
-0.237 < ICC < 0.893
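For what it's worth, the printed p-value can be reproduced from the F statistic alone, which suggests the denominator degrees of freedom (2) are what drive the non-significance: the upper-tail probability of F = 9.27 with df = (582, 2) is about .102, whereas with any reasonably large denominator df the p-value is far below .001. A quick cross-check (just a sketch using Python/SciPy to evaluate the F tail probabilities, not part of my actual analysis):

```python
from scipy.stats import f

# Values taken verbatim from the irr output above
F_value = 9.27
df1 = 582  # numerator df (subjects - 1)

# Tail probability with denominator df = 2, as printed by irr
p_small_df2 = f.sf(F_value, df1, 2)
print(round(p_small_df2, 3))  # 0.102 -- matches the irr output exactly

# Tail probability with a large denominator df (e.g. 582, for illustration)
p_large_df2 = f.sf(F_value, df1, 582)
print(p_large_df2 < 0.001)  # True -- highly significant, as other software reports
```

So the reported p = .102 follows directly from that denominator df of 2; whether that df is correct for ICC(A,2) here is exactly my question.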
Any help is highly appreciated!!!
Kind regards,
Tobias