Newbie trying ICC in RStudio

RichD · February 2, 2024, 9:03am

Dear All

Apologies if this is too trivial, but I am new here and this is my first post and my first attempt at getting guidance and help. Unfortunately, I cannot upload the Excel sheet (not allowed here) or its PDF version (beginners can't do that), so I will try to explain the intended task.

The data consist of `rater`, `text`, `question`, `score` and `overall-score`, where 12 raters score 19 texts on 6 questions per text, plus one final overall score for each text. The scores are between 0 and 10 to 2 decimal places, so perhaps a z-transform would be useful before any further steps?

I am trying to evaluate the ICC for the `overall-score` and compare it with the ICCs of each of the 6 questions.

I am trying to learn RStudio and would be grateful for any suggestions, comments and, hopefully, examples of code that would get me started on this problem.

Kind regards, Richard.

jrkrideau · February 2, 2024, 1:32pm

Hi and welcome to your first question. It does not sound at all trivial.

We probably do need to see your code and some sample data. Generally we need to see the nitty-gritty of both code and data. R can be very picky about things.

Assuming that you have successfully read in the data from the Excel file, you can supply some sample data with the dput() function. In the case of a large dataset, something like dput(head(mydata, 100)) should supply the data we need.

Just do dput(mydata) where mydata is your data. Copy the output and paste it here between
```

```

Code should also be pasted between
```

```
to maintain formatting.
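
As a minimal sketch of what `dput()` gives you, assuming a small made-up data frame called `toy` (not the real data): the output is R code that rebuilds the object, so anyone can paste it back into their own session.

```r
# Toy data standing in for the real scores (values are invented)
toy <- data.frame(
  rater = c(1L, 1L, 2L, 2L),
  text  = c(1L, 2L, 1L, 2L),
  score = c(8.4, 2.3, 7.9, 3.1)
)

# dput() writes out R code that recreates the object
dput(head(toy, 3))

# Round trip: evaluate the dput() text and compare with the original
dput_text <- paste(capture.output(dput(toy)), collapse = "\n")
rebuilt   <- eval(parse(text = dput_text))
all.equal(rebuilt, toy)
```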

Here is a link to some general guidelines :
FAQ Asking Questions

RichD · February 2, 2024, 3:19pm

Dear @jrkrideau

Thank you very much for your response and for taking the time to guide me. I think I have loaded the data, although some letters L that are not part of the data appear in the dput(scores) output (see below, please). The columns in my Excel file are, in order: rater, text, question, scores, z_score, overall_scorescale. Following your guide I get:

> dput(head(scores, 100))
structure(list(rater = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), levels = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"), class = "factor"), 
    text = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
    2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 
    5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 
    7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 
    10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 
    12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 
    14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 
    16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L), levels = c("1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", 
    "13", "14", "15", "16", "17", "18", "19"), class = "factor"), 
    question = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 
    4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 
    1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 
    4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 
    1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 
    4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 
    1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 
    4L), levels = c("1", "2", "3", "4", "5", "6"), class = "factor"), 
    scores = c(8.4, 2.3, 8.4, 8.4, 2.6, 8.4, 8.3, 8.3, 8.3, 8.3, 
    8.3, 8.3, 5.5, 1.8, 5.4, 5.4, 1.2, 5.5, 5.1, 1.1, 3.4, 5.1, 
    0.9, 3.5, 4.6, 1.3, 5.6, 4.5, 0.7, 5.6, 7, 6.2, 6.2, 7.2, 
    6.2, 6.2, 8, 8, 8, 8.1, 8.1, 8.2, 7.8, 7.8, 7.9, 7.7, 7.7, 
    7.7, 7.6, 7.6, 5.2, 7.8, 7.8, 5.2, 7, 5.6, 6.7, 6.7, 5.5, 
    6.4, 7.2, 2.6, 7.3, 7.4, 2.2, 7.5, 7.9, 2.7, 5.9, 8, 2.6, 
    6, 4.9, 2.9, 4.9, 4.9, 2.8, 5, 5.8, 2.6, 5.9, 5.9, 2.5, 5.9, 
    8.8, 8.8, 8.8, 8.8, 8.8, 8.9, 5.9, 3.2, 6.4, 6, 3.1, 6, 8.2, 
    7.4, 8.3, 8.6), z_score = c(0.963444776230018, -1.81309686235413, 
    0.963444776230018, 0.963444776230018, -1.67654563422704, 
    0.963444776230018, 0.917927700187655, 0.917927700187655, 
    0.917927700187655, 0.917927700187655, 0.917927700187655, 
    0.917927700187655, -0.356550428998511, -2.04068224256594, 
    -0.402067505040874, -0.402067505040874, -2.31378469882012, 
    -0.356550428998511, -0.538618733167963, -2.35930177486249, 
    -1.31240902588813, -0.538618733167963, -2.45033592694721, 
    -1.26689194984577, -0.766204113379778, -2.26826762277776, 
    -0.311033352956148, -0.811721189422141, -2.54137007903194, 
    -0.311033352956148, 0.326205711636935, -0.0379308967019693, 
    -0.0379308967019693, 0.417239863721661, -0.0379308967019693, 
    -0.0379308967019693, 0.781376472060566, 0.781376472060566, 
    0.781376472060566, 0.826893548102929, 0.826893548102929, 
    0.872410624145292, 0.69034231997584, 0.69034231997584, 0.735859396018203, 
    0.644825243933477, 0.644825243933477, 0.644825243933477, 
    0.599308167891113, 0.599308167891113, -0.4931016571256, 0.69034231997584, 
    0.69034231997584, -0.4931016571256, 0.326205711636935, -0.311033352956148, 
    0.189654483509846, 0.189654483509846, -0.356550428998511, 
    0.0531032553827568, 0.417239863721661, -1.67654563422704, 
    0.462756939764024, 0.508274015806387, -1.85861393839649, 
    0.55379109184875, 0.735859396018203, -1.63102855818468, -0.174482124829058, 
    0.781376472060566, -1.67654563422704, -0.128965048786695, 
    -0.629652885252689, -1.53999440609995, -0.629652885252689, 
    -0.629652885252689, -1.58551148214231, -0.584135809210326, 
    -0.219999200871422, -1.67654563422704, -0.174482124829058, 
    -0.174482124829058, -1.7220627102694, -0.174482124829058, 
    1.14551308039947, 1.14551308039947, 1.14551308039947, 1.14551308039947, 
    1.14551308039947, 1.19103015644183, -0.174482124829058, -1.40344317797286, 
    0.0531032553827568, -0.128965048786695, -1.44896025401522, 
    -0.128965048786695, 0.872410624145292, 0.508274015806387, 
    0.917927700187655, 1.05447892831474), overall_scorescale = c(7.7, 
    7.7, 7.7, 7.7, 7.7, 7.7, 8.4, 8.4, 8.4, 8.4, 8.4, 8.4, 7.3, 
    7.3, 7.3, 7.3, 7.3, 7.3, 6.8, 6.8, 6.8, 6.8, 6.8, 6.8, 6.4, 
    6.4, 6.4, 6.4, 6.4, 6.4, 7.5, 7.5, 7.5, 7.5, 7.5, 7.5, 8.6, 
    8.6, 8.6, 8.6, 8.6, 8.6, 8.4, 8.4, 8.4, 8.4, 8.4, 8.4, 6.6, 
    6.6, 6.6, 6.6, 6.6, 6.6, 8.8, 8.8, 8.8, 8.8, 8.8, 8.8, 8.5, 
    8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 5.6, 
    5.6, 5.6, 5.6, 5.6, 5.6, 5, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9, 
    9, 7.1, 7.1, 7.1, 7.1, 7.1, 7.1, 8.1, 8.1, 8.1, 8.1)), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))

# clear environment
rm(list=ls())
par(las=1,pch=16) 

require(readxl)
require(dplyr)
require(rstatix)
require(emmeans)

scores <- read_excel("C:/Users/rda2/OneDrive - ..........etc...../Exp4_d.xlsx")
# scores <- read_excel("Exp4_d.xlsx") # RD filepath for testing

scores <- within(scores, {
  text <- factor(text)
  question <- factor(question)
  rater <- factor(rater)
})

######## Analysis ICC

dput(scores)

startz · February 2, 2024, 4:39pm

The "L's" just mean that the number is an integer.
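
A short illustration, using made-up values: the `L` suffix marks an integer literal, and a factor is stored as integer codes plus labels, which is why `dput()` shows `1L` for `rater`.

```r
x <- c(1L, 2L, 3L)   # the L suffix makes these integers
y <- c(1, 2, 3)      # without it, R stores doubles

class(x)   # "integer"
class(y)   # "numeric"

# A factor stores integer codes plus a set of labels
f <- factor(c("1", "2", "12"), levels = c("1", "2", "12"))
dput(f)
as.integer(f)
```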

jrkrideau · February 2, 2024, 9:46pm

It looks like my semi-confident statement that 100 rows of data should be sufficient was wrong. All we have are results for rater 1. If you can do so without exceeding upload limits, could you give us the entire data set? Otherwise you might consider posting it at a file-hosting site such as Dropbox or MediaFire.
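
If uploading the full file is awkward, one alternative is to post a few rows from every rater rather than the first 100 rows. A sketch in base R, using an invented data frame `toy` that only borrows the column names from the post:

```r
set.seed(1)
# Invented stand-in for the real data: 3 raters x 4 texts x 2 questions
toy <- expand.grid(question = 1:2, text = 1:4, rater = 1:3)
toy$scores <- round(runif(nrow(toy), 0, 10), 1)

# Keep the first 3 rows for each rater, then stack the pieces
pieces  <- lapply(split(toy, toy$rater), head, n = 3)
sampled <- do.call(rbind, pieces)

table(sampled$rater)  # every rater now appears
dput(sampled)
```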

In the meantime you might want to have a look at Intraclass Correlation Coefficient in R

At the moment, I don't see what this is doing, or rather I don't understand the need for factors.

scores <- within(scores, {
  text <- factor(text)
  question <- factor(question)
  rater <- factor(rater)
})

RichD · February 3, 2024, 7:14am

Dear @startz

Thank you for the comment.

RichD · February 3, 2024, 7:34am

Dear @jrkrideau

The data set is over 50k, so I tried the file-hosting site: "Exp4_d.xlsx file". The factors are my doing: I simply copied and pasted the initial part of someone else's code from when they loaded the data.

Here is a screenshot of the two-way complete-data ICC, which I would like to follow and try:

As well as the paper where this is discussed:
paper

I truly hope that ICCs are the right approach given the nature of the data I have. I would of course welcome any comments or feedback on that as well.

Thank you very much for your time and support. Kind regards,
Richard

jrkrideau · February 3, 2024, 10:00pm

I got the data but won't have a chance to do anything with it until tomorrow, EST.

jrkrideau · February 4, 2024, 2:48pm

I think that this will do what you want.

library(data.table)
library(rio)    # easy way to read in files of varying formats
library(psych)  # provides ICC()

dat1 <- import("Exp4_d.xlsx")
DT   <- as.data.table(dat1)  # convert to data.table

# Reshape to wide: one row per question-text pair, one column per rater
DT2 <- dcast(DT, question + text ~ rater, value.var = "scores")
DT2 <- DT2[, !c("question", "text")]  # keep only the rater columns

inter <- ICC(DT2)
inter  # print the table of ICC estimates
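
Note that the call above pools all six questions into one table (one row per question-text pair) and uses the `scores` column. For the comparison asked about in the first post, a possible sketch is one ICC table from `overall_scorescale` and one per question. It runs on simulated data that only borrows the column names from the posted `dput()` output; the helper `icc_wide()` is a made-up name, and `lmer = FALSE` is chosen only because the simulated design is complete, so treat both as assumptions rather than recommendations.

```r
library(data.table)
library(psych)

set.seed(1)
# Simulated stand-in: 12 raters x 19 texts x 6 questions (values invented)
DT <- CJ(rater = 1:12, text = 1:19, question = 1:6)
text_effect <- rnorm(19, mean = 6, sd = 1.5)
DT[, scores := text_effect[text] + rnorm(.N, sd = 1)]
# One overall score per rater-text pair, repeated on each question row
DT[, overall_scorescale := text_effect[text] + rnorm(1, sd = 1),
   by = .(rater, text)]

# Hypothetical helper: reshape to one row per text and one column per rater,
# then pass the rater columns to psych::ICC()
icc_wide <- function(d, value) {
  wide <- dcast(d, text ~ rater, value.var = value)
  ICC(as.matrix(wide[, !"text"]), lmer = FALSE)$results
}

# Overall score: reduce to one row per rater-text pair first
overall     <- unique(DT[, .(rater, text, overall_scorescale)])
icc_overall <- icc_wide(overall, "overall_scorescale")

# One ICC table per question
icc_by_q <- lapply(split(DT, by = "question"), icc_wide, value = "scores")

icc_overall[, c("type", "ICC")]
icc_by_q[["1"]][, c("type", "ICC")]
```

Which row of the table to report (single or average raters, absolute agreement or consistency) depends on the study design, so that choice is left open here.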

RichD · February 8, 2024, 10:28am

Dear @jrkrideau

Thank you again for your help and guidance; it is greatly appreciated. There is more I would like to explore, if you have a little time to spare.

With reference to the same data set, I would like to compare the ICC we just computed for overall_scorescale with the ICC of an average taken over the six questions. Essentially, could you help me create a new overall_average_score for each rater-text pair by averaging the six question scores within that pair? Then we could run it through the same ICC code as before.

I am not sure how to proceed with the averaging and creating overall_average_score, nor am I sure whether ICC(DT2) is still applicable to this newly created data.

Kind regards, Richard

jrkrideau · February 8, 2024, 4:08pm

Hi Richard,
Mathematically it is doable, I believe, but I think the results are a bit dubious, or rather that you are throwing away a lot of information. I guess it depends on the substantive question you are asking.

If I have understood you correctly, I think this does what you want.

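library(data.table)
library(rio)    # easy way to read in files of varying formats
library(psych)  # provides ICC()

dat1 <- import("Exp4_d.xlsx")
DT   <- as.data.table(dat1)  # convert to data.table

# Average the six question scores within each rater-text pair
DT4 <- DT[, .(mean_q = mean(scores)), by = c("rater", "text")]

# Reshape to wide: one row per text, one column per rater
DT5 <- dcast(DT4, text ~ rater, value.var = "mean_q")

# Prefix each rater column with "rater", whatever order dcast() produced
rater_cols <- setdiff(names(DT5), "text")
setnames(DT5, rater_cols, paste0("rater", rater_cols))
DT5

DT6 <- DT5[, !c("text")]  # keep only the rater columns
ICC(DT6)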
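
On the z-transform raised in the first post: the posted `z_score` column looks like a single rescaling of all scores with one mean and one standard deviation, and an ICC is a ratio of variance components, so a global linear transform of that kind should leave it unchanged. The base-R sketch below checks this on simulated ratings with a hand-written two-way ICC(2,1) in the Shrout and Fleiss form; `icc21()` is an illustrative helper, not part of any package.

```r
set.seed(42)
n <- 19; k <- 12                      # texts (rows) and raters (columns)
truth   <- rnorm(n, mean = 6, sd = 1.5)
ratings <- matrix(truth + rnorm(n * k, sd = 1), nrow = n, ncol = k)

# ICC(2,1): two-way random effects, absolute agreement, single rater
icc21 <- function(m) {
  n <- nrow(m); k <- ncol(m)
  grand <- mean(m)
  msr <- k * sum((rowMeans(m) - grand)^2) / (n - 1)          # between texts
  msc <- n * sum((colMeans(m) - grand)^2) / (k - 1)          # between raters
  sse <- sum((m - grand)^2) - (n - 1) * msr - (k - 1) * msc  # residual SS
  mse <- sse / ((n - 1) * (k - 1))
  (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
}

# One global z-transform: same mean and sd for every rating
z <- (ratings - mean(ratings)) / sd(ratings)

icc21(ratings)
icc21(z)
all.equal(icc21(ratings), icc21(z))
```

Standardising within each rater separately is a different operation: it removes rater mean differences, so an absolute-agreement ICC would generally change.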
