reshaping data for inter-rater reliability analysis

mar6567 · August 7, 2023, 3:22pm

I have two datasets with the same set of IDs and column variables (categorical 0/1); one was completed by a child and the other by a parent. I want to merge these datasets and reshape the data so that there are 4 columns: ID, item, rater1, rater2.

How do I specify that columns 2:28 are rater1 and columns 29:55 are rater2? I just want to take those two horizontal vectors and rotate them to be vertical and next to each other, preserving the ordering (e.g., 2 goes with 29, 3 goes with 30, etc.).

The variables have arbitrary character-string labels (e.g., "homework"); after merging, they may look like "homework_1" and "homework_2".

Should I reshape before merging, or merge and then reshape? Any assistance is appreciated.

scottyd22 · August 7, 2023, 4:21pm

Welcome to the community @mar6567! Below is one approach, using the pivot_longer() function, that I believe gets to your desired outcome. It reshapes each data set first and then joins them. Each sample data set has 5 observations and 4 columns (an id plus 3 items), but the approach should extend to a case with more columns.

library(tidyverse)
set.seed(22)

# sample data: one data set per rater, same ids and same item columns
data1 = data.frame(id = 1:5,
                    homework1 = sample(c(0, 1), 5, replace = TRUE),
                    homework2 = sample(c(0, 1), 5, replace = TRUE),
                    homework3 = sample(c(0, 1), 5, replace = TRUE)
                    )

data2 = data.frame(id = 1:5,
                    homework1 = sample(c(0, 1), 5, replace = TRUE),
                    homework2 = sample(c(0, 1), 5, replace = TRUE),
                    homework3 = sample(c(0, 1), 5, replace = TRUE)
                    )

# transform: reshape each data set to long format (one row per id/item pair),
# naming the value column after the rater
rater1 = data1 |>
  pivot_longer(cols = -id,
               names_to = 'item',
               values_to = 'rater1')

rater2 = data2 |>
  pivot_longer(cols = -id,
               names_to = 'item',
               values_to = 'rater2')


# combine
out = left_join(rater1, rater2, by = c('id', 'item'))

out
#> # A tibble: 15 × 4
#>       id item      rater1 rater2
#>    <int> <chr>      <dbl>  <dbl>
#>  1     1 homework1      1      1
#>  2     1 homework2      1      1
#>  3     1 homework3      0      0
#>  4     2 homework1      0      1
#>  5     2 homework2      1      0
#>  6     2 homework3      1      1
#>  7     3 homework1      1      1
#>  8     3 homework2      0      0
#>  9     3 homework3      0      0
#> 10     4 homework1      1      1
#> 11     4 homework2      0      1
#> 12     4 homework3      1      1
#> 13     5 homework1      1      0
#> 14     5 homework2      0      1
#> 15     5 homework3      1      0

Created on 2023-08-07 with reprex v2.0.2
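
If you would rather merge first and then reshape, here is a minimal sketch of that route. It assumes the join is given explicit suffixes ("_rater1" and "_rater2", my choice rather than the "_1"/"_2" mentioned in the question) and uses the special ".value" name in pivot_longer() to split each column name into an item part and a rater part. The data frames child and parent and their columns are made up for illustration.

```r
library(dplyr)
library(tidyr)

# hypothetical data: same ids and same item columns in both data sets
child  <- data.frame(id = 1:3, homework = c(1, 0, 1), chores = c(0, 0, 1))
parent <- data.frame(id = 1:3, homework = c(1, 1, 1), chores = c(0, 1, 1))

# merge first; the suffixes mark which rater each column came from
merged <- inner_join(child, parent, by = "id",
                     suffix = c("_rater1", "_rater2"))

# then reshape: the text before the final "_rater1"/"_rater2" becomes `item`,
# and the suffix itself (".value") becomes the name of the value column
long <- merged |>
  pivot_longer(cols = -id,
               names_to = c("item", ".value"),
               names_pattern = "(.*)_(rater[12])$")

long
```

The result has the four columns id, item, rater1, rater2, with one row per id/item pair. Because the pairing is done by column name rather than by position, it does not depend on columns 2:28 and 29:55 being in the same order.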
