Hi, I'm looking to find the total number of unique combinations of 3 diseases within a group of 20 conditions (20 choose 3). I have code from a reprex that works, and I've made my CSV the same shape (diseases begin from column 6 onwards), but it throws an error message when I use the real file. I want to find all possible combinations and calculate the prevalence of each combination, to then plot them as a mean and SD. What is the difference between the CSV and the reprex? (The reprex is at the bottom.)

Thanks
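For reference, the number of unique 3-condition combinations is the binomial coefficient "n choose 3", n! / (3! (n - 3)!), rather than n! itself. In base R, `choose()` gives the count and `combn()` enumerates the combinations, one per column. A minimal sketch:

```r
# Number of unordered 3-condition combinations from 20 conditions:
# 20! / (3! * 17!) = 1140
n_conditions <- 20
choose(n_conditions, 3)

# combn() returns one combination per column, so its column count matches choose()
ncol(combn(n_conditions, 3))

# The code below uses columns 7:20, which is 14 conditions, not 20
ncol(combn(7:20, 3))
```

The three calls return 1140, 1140 and 364, so it is worth confirming that `7:20` really covers all the condition columns in the real file.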

Code:

library(tidyverse)
library(utils)

dat <- read_csv("005_trimmed_spice.csv")
dat <- dat[,-c(3,20)]
dat$comorbid <- FALSE
comorbids <- dat[which(rowSums(dat[,7:20]) > 2),1]
dat[comorbids,"comorbid"] <- TRUE

cases <- combn(7:20,3)
dat[,cases[,1]]

make_comb <- function(x) dat[which(rowSums(dat[,cases[,x]]) > 2),1]
show_result <- function(x) dat[dat[make_comb(x)][which(rowSums(dat[,cases[,1]]) > 2),1],]

show_result(1)
show_result(2)
apply(cases, 2, show_result)

Console:

dat <- read_csv("005_trimmed_spice.csv")
New names: 0s
- `` -> ...47
- `` -> ...48
- `` -> ...49
- `` -> ...50
- `` -> ...51
- ...
Rows: 65534 Columns: 86
── Column specification ─────────────────────────────────────────────
Delimiter: ","
chr  (1): age_group
dbl (45): UniquePatientID, Age, Sex, CarstairsQuintile, Carstairs...
lgl (40): ...47, ...48, ...49, ...50, ...51, ...52, ...53, ...54,...
Use `spec()` to retrieve the full column specification for this data.
Specify the column types or set `show_col_types = FALSE` to quiet this message.
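The column specification shows 86 columns, 40 of which (`...47` onward) were read as logical with auto-generated names. This usually means the CSV has trailing empty columns, which the reprex does not have, and it makes it worth double-checking what positional indexes such as `-c(3,20)` and `7:20` actually select. A minimal sketch, assuming those extra columns are entirely empty (all `NA`), that drops every all-`NA` column; the toy data is hypothetical:

```r
# Toy data standing in for the real file: two real columns plus one empty one
dat <- data.frame(UniquePatientID = c(101, 102, 103),
                  Depression      = c(0, 1, 0),
                  blank           = NA)
names(dat)[3] <- "...47"  # the kind of name read_csv() gives an unnamed column

# Keep only columns that contain at least one non-missing value
keep <- colSums(!is.na(dat)) > 0
dat <- dat[, keep, drop = FALSE]

names(dat)
```

With dplyr loaded, `select(dat, where(~ !all(is.na(.x))))` should do the same thing. After dropping the empty columns, `names(dat)` is a quick way to confirm which positions hold the conditions.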

dat <- dat[,-c(3,20)]
dat$comorbid <- FALSE
comorbids <- dat[which(rowSums(dat[,7:20]) > 2),1]
dat[comorbids,"comorbid"] <- TRUE
Error: Must assign to rows with a valid subscript vector.
x Subscript `comorbids` has the wrong type `tbl_df<UniquePatientID:double>`.
It must be logical, numeric, or character.
Run `rlang::last_error()` to see where the error occurred.
cases <- combn(7:20,3)
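The error reports that `comorbids` is a tibble (`tbl_df`), not a vector. `read_csv()` returns a tibble, and unlike a base `data.frame`, a tibble does not drop `dat[rows, 1]` to a vector; it stays a one-column tibble, which cannot be used as a row subscript. Separately, `dat[comorbids, "comorbid"]` treats patient IDs as row numbers, which only works when the IDs run 1, 2, 3, ... in row order. A minimal sketch that sidesteps both issues by building the flag directly as a logical vector; the toy data and the `disease_cols` range are hypothetical stand-ins for the real file:

```r
# Toy data: patient IDs deliberately do not match row numbers
dat <- data.frame(
  UniquePatientID = c(5001, 5002, 5003, 5004),
  Diabetes = c(1, 1, 0, 1),
  Asthma   = c(1, 0, 0, 1),
  Stroke   = c(1, 0, 1, 1),
  COPD     = c(0, 0, 0, 1)
)
disease_cols <- 2:5  # hypothetical; in the real file this would be 7:20

# TRUE for every row with more than 2 of the listed conditions.
# rowSums() returns a plain numeric vector, so this behaves the same
# whether dat is a tibble or a data.frame.
dat$comorbid <- rowSums(dat[, disease_cols]) > 2

# If the IDs themselves are needed, extract the column as a vector with [[ ]]
comorbid_ids <- dat[["UniquePatientID"]][dat$comorbid]
comorbid_ids
```

Here `comorbid_ids` is `5001 5004`, and no ID is ever used as a row number.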

dat[,cases[,1]]
# A tibble: 65,534 x 3
   Depression PainfulCondition ActiveAsthma
 1          0                0            0
 2          0                0            1
 3          0                0            0
 4          0                0            0
 5          0                0            0
 6          0                0            0
 7          0                0            0
 8          0                0            0
 9          0                1            0
10          0                0            0
# … with 65,524 more rows
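`cases` is a matrix with one combination per column: each column holds three column positions of `dat`, which is why `dat[, cases[, 1]]` selects a single trio of conditions. A quick check of its shape:

```r
cases <- combn(7:20, 3)

dim(cases)    # 3 rows (positions within a trio) x 364 columns (one per trio)
cases[, 1]    # 7 8 9: the first trio of column positions
cases[, 364]  # 18 19 20: the last trio
```

Because combinations are stored as columns, code that loops over them needs the column number (`cases[, i]`), not the column contents.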

make_comb <- function(x) dat[which(rowSums(dat[,cases[,x]]) > 2),1]
show_result <- function(x) dat[dat[make_comb(x)][which(rowSums(dat[,cases[,1]]) > 2),1],]
show_result(1)
Error: Must subset columns with a valid subscript vector.
x Subscript `make_comb(x)` has the wrong type `tbl_df<UniquePatientID:double>`.
It must be logical, numeric, or character.
Run `rlang::last_error()` to see where the error occurred.

show_result(2)
Error: Must subset columns with a valid subscript vector.
x Subscript `make_comb(x)` has the wrong type `tbl_df<UniquePatientID:double>`.
It must be logical, numeric, or character.
Run `rlang::last_error()` to see where the error occurred.
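`show_result()` fails for the same reason: `make_comb(x)` returns a one-column tibble, which is then used as a column subscript. It also always tests `cases[, 1]` regardless of `x`. A minimal sketch of what it appears intended to do, returning the rows that have all three conditions in combination `x`; the helper name `rows_with_combo` and the toy data are hypothetical:

```r
# Toy 0/1 data; in the real file the condition columns would be 7:20
dat <- data.frame(
  UniquePatientID = c(5001, 5002, 5003, 5004),
  Diabetes = c(1, 1, 0, 1),
  Asthma   = c(1, 0, 0, 1),
  Stroke   = c(1, 0, 1, 1),
  COPD     = c(0, 0, 0, 1)
)
cases <- combn(2:5, 3)

# Rows of dat that have every condition in combination x (x = column number of cases).
# With 0/1 columns, a row sum equal to 3 over the trio means all three are present.
rows_with_combo <- function(x) {
  trio <- cases[, x]
  dat[rowSums(dat[, trio]) == length(trio), , drop = FALSE]
}

rows_with_combo(1)  # Diabetes + Asthma + Stroke
```

The subscript here is a logical vector computed per row, so it works unchanged on a tibble.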

apply(cases, 2, show_result)
Error: Must subset columns with a valid subscript vector.
x Subscript `cases[, x]` must be a simple vector, not a matrix.
Run `rlang::last_error()` to see where the error occurred.
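The last error comes from `apply(cases, 2, show_result)`: `apply()` passes each column of `cases` (a vector of three positions) as `x`, so `cases[, x]` becomes a matrix. Iterating over the column numbers avoids that. A minimal sketch of the stated goal, the prevalence of each 3-condition combination and then its mean and SD across combinations, assuming "prevalence" means the proportion of all patients who have all three conditions; the toy data is hypothetical:

```r
dat <- data.frame(
  UniquePatientID = c(5001, 5002, 5003, 5004),
  Diabetes = c(1, 1, 0, 1),
  Asthma   = c(1, 0, 0, 1),
  Stroke   = c(1, 0, 1, 1),
  COPD     = c(0, 0, 0, 1)
)
disease_cols <- 2:5  # hypothetical; 7:20 in the real file
cases <- combn(disease_cols, 3)

# Prevalence of combination i: share of patients with all three conditions
prevalence <- vapply(seq_len(ncol(cases)), function(i) {
  trio <- cases[, i]
  mean(rowSums(dat[, trio]) == length(trio))
}, numeric(1))

# Label each combination by its condition names, e.g. "Diabetes+Asthma+Stroke".
# Here apply() is used as intended: it receives the column contents.
names(prevalence) <- apply(cases, 2, function(cols) paste(names(dat)[cols], collapse = "+"))

prevalence
mean(prevalence)
sd(prevalence)
```

`vapply(..., numeric(1))` guarantees a plain numeric vector with one value per combination, which can then be summarised or passed to a plotting function.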

Practice reprex where the code above worked:

ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
Age = c(18, 77, 25, 30, 54, 78, 69, 62, 68, 63),
Sex = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
CarsQuintie = c(2, 1, 3, 1, 1, 5, 1, 1, 5, 1),
age_group = c("18 - 24", "65 - 74", "25 - 34", "25 - 34", "55 - 64", "75 - 84", "65 - 74", "55 - 64", "55 - 64", "55 - 64"),
CarsQuintie_group = c(3, 1, 4, 3, 1, 5, 1, 2, 1, 3),
Diabetes = c(1, 0, 0, 0, 0, 1, 1, 0, 1, 1),
Asthma = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0),
Stroke = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
Heart.attack = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 1),
COPD = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
Hypertension = c(0, 0, 1, 0, 1, 0, 1, 0, 0, 0),
Eczema = c(0, 1, 0, 0, 1, 0, 0, 0, 1, 0),
Depression = c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0))
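The opening line of the reprex is not shown, but its closing `))` suggests the columns were wrapped in a call such as `data.frame()`. If so, that is the likely key difference from the CSV: a base `data.frame` drops `dat[rows, 1]` to a plain vector, whereas `read_csv()` returns a tibble, which does not. The reprex IDs also equal the row numbers (1 to 10), so using them as row indices happened to work. A minimal sketch comparing the two, assuming the reprex was a `data.frame` and using a trimmed copy of its data:

```r
library(tibble)

# Trimmed copy of the reprex: ID plus three of its condition columns
df <- data.frame(
  ID           = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Diabetes     = c(1, 0, 0, 0, 0, 1, 1, 0, 1, 1),
  Asthma       = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0),
  Heart.attack = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 1)
)
tb <- as_tibble(df)  # what read_csv() would give for the same data

rows <- which(rowSums(df[, 2:4]) > 2)

df[rows, 1]    # data.frame: a numeric vector of IDs
tb[rows, 1]    # tibble: still a 1-column tibble, which errors when used as a subscript
tb[[1]][rows]  # extracting the column with [[ ]] gives a vector in both cases
```

Wrapping the import as `as.data.frame(read_csv(...))` would make the original code subset the way the reprex did, but the IDs-as-row-numbers assumption would still need checking against the real `UniquePatientID` values.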