Hi, I'm trying to find the total number of unique combinations of 3 diseases within a group of 20 conditions (i.e. 20 choose 3). I have code from a reprex that works, and I've made my csv the same shape (disease columns begin from column 6 onwards), but it throws an error message when I use the real file. I want to find all possible combinations and calculate the prevalence of each combination, to then plot as mean and SD. What is the difference between the csv and the reprex? (The reprex is right at the bottom.)
Thanks
Code:
library(tidyverse)
library(utils)
dat <- read_csv("005_trimmed_spice.csv")
dat <- dat[,-c(3,20)]
dat$comorbid <- FALSE
comorbids <- dat[which(rowSums(dat[,7:20]) > 2),1]
dat[comorbids,"comorbid"] <- TRUE
cases <- combn(7:20,3)
dat[,cases[,1]]
make_comb <- function(x) dat[which(rowSums(dat[,cases[,x]]) > 2),1]
show_result <- function(x) dat[dat[make_comb(x)][which(rowSums(dat[,cases[,1]]) > 2),1],]
show_result(1)
show_result(2)
apply(cases, 2, show_result)
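As a quick sanity check on how many combinations to expect, here is a minimal base R sketch. One thing it shows: `7:20` spans 14 columns, so `combn(7:20, 3)` enumerates triples of 14 columns rather than 20. Whether 14 or 20 is the right count for the real file is an assumption to check against the actual disease columns.

```r
# Number of unordered triples from 20 conditions (a binomial coefficient, not a factorial)
n_triples_20 <- choose(20, 3)

# combn() returns one combination per column, so ncol() counts the triples
cases <- combn(7:20, 3)
n_triples_cols <- ncol(cases)

print(n_triples_20)    # 1140
print(n_triples_cols)  # 364
```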
Console:
dat <- read_csv("005_trimmed_spice.csv")
New names:
- `` -> ...47
- `` -> ...48
- `` -> ...49
- `` -> ...50
- `` -> ...51
- ...
Rows: 65534 Columns: 86
── Column specification ─────────────────────────────────────────────
Delimiter: ","
chr (1): age_group
dbl (45): UniquePatientID, Age, Sex, CarstairsQuintile, Carstairs...
lgl (40): ...47, ...48, ...49, ...50, ...51, ...52, ...53, ...54,...
Use `spec()` to retrieve the full column specification for this data.
Specify the column types or set `show_col_types = FALSE` to quiet this message.
dat <- dat[,-c(3,20)]
dat$comorbid <- FALSE
comorbids <- dat[which(rowSums(dat[,7:20]) > 2),1]
dat[comorbids,"comorbid"] <- TRUE
Error: Must assign to rows with a valid subscript vector.
x Subscript `comorbids` has the wrong type `tbl_df<UniquePatientID:double>`.
It must be logical, numeric, or character.
Run `rlang::last_error()` to see where the error occurred.
cases <- combn(7:20,3)
dat[,cases[,1]]
# A tibble: 65,534 x 3
   Depression PainfulCondition ActiveAsthma
 1          0                0            0
 2          0                0            1
 3          0                0            0
 4          0                0            0
 5          0                0            0
 6          0                0            0
 7          0                0            0
 8          0                0            0
 9          0                1            0
10          0                0            0
# … with 65,524 more rows
make_comb <- function(x) dat[which(rowSums(dat[,cases[,x]]) > 2),1]
show_result <- function(x) dat[dat[make_comb(x)][which(rowSums(dat[,cases[,1]]) > 2),1],]
show_result(1)
Error: Must subset columns with a valid subscript vector.
x Subscript `make_comb(x)` has the wrong type `tbl_df<UniquePatientID:double>`.
It must be logical, numeric, or character.
Run `rlang::last_error()` to see where the error occurred.
show_result(2)
Error: Must subset columns with a valid subscript vector.
x Subscript `make_comb(x)` has the wrong type `tbl_df<UniquePatientID:double>`.
It must be logical, numeric, or character.
Run `rlang::last_error()` to see where the error occurred.
apply(cases, 2, show_result)
Error: Must subset columns with a valid subscript vector.
x Subscript `cases[, x]` must be a simple vector, not a matrix.
Run `rlang::last_error()` to see where the error occurred.
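The `tbl_df<UniquePatientID:double>` in these errors suggests that `dat[rows, 1]` is returning a one-column tibble rather than a plain vector, since `read_csv()` returns a tibble. Below is a minimal sketch of that difference on a small made-up table. The names `df`, `tb`, `id`, `a`, `b` and `c` are hypothetical, and it assumes the reprex was built with `data.frame()`, which the post does not show.

```r
library(tibble)

# Hypothetical toy data: one ID column and three 0/1 condition columns
df <- data.frame(id = c(101, 102, 103), a = c(1, 0, 1), b = c(1, 1, 1), c = c(1, 0, 1))
tb <- as_tibble(df)

# Row positions where all three conditions are present
rows <- which(rowSums(df[, 2:4]) > 2)

# data.frame: selecting a single column with [ drops to a plain vector
ids_df <- df[rows, 1]
is.numeric(ids_df)        # TRUE

# tibble: [ always returns a tibble, even for a single column
ids_tb <- tb[rows, 1]
is.data.frame(ids_tb)     # TRUE

# [[ extracts the column as a vector from either class
ids_vec <- tb[[1]][rows]

# Flag the matching rows by position rather than indexing rows by ID values
tb$comorbid <- FALSE
tb$comorbid[rows] <- TRUE
```

Indexing rows by ID values, as in `dat[comorbids, "comorbid"]`, only lines up when the IDs happen to equal the row numbers (as with IDs 1 to 10 in the practice data), so flagging by row position or by a logical vector is likely safer on the real file.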
Practice reprex where code above worked:
ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
Age = c(18, 77, 25, 30, 54, 78, 69, 62, 68, 63),
Sex = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
CarsQuintie = c(2, 1, 3, 1, 1, 5, 1, 1, 5, 1),
age_group = c("18 - 24", "65 - 74", "25 - 34", "25 - 34", "55 - 64", "75 - 84", "65 - 74", "55 - 64", "55 - 64", "55 - 64"),
CarsQuintie_group = c(3, 1, 4, 3, 1, 5, 1, 2, 1, 3),
Diabetes = c(1, 0, 0, 0, 0, 1, 1, 0, 1, 1),
Asthma = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0),
Stroke = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
Heart.attack = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 1),
COPD = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
Hypertension = c(0, 0, 1, 0, 1, 0, 1, 0, 0, 0),
Eczema = c(0, 1, 0, 0, 1, 0, 0, 0, 1, 0),
Depression = c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0))
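To make the goal concrete, here is a rough sketch of the prevalence calculation on four of the disease columns from the practice data. It selects columns by name rather than by position, and it assumes "prevalence" means the share of all patients who have all three conditions, which may not be the definition intended. The names `dis`, `triples` and `prevalence` are made up for illustration.

```r
# Four of the disease columns from the practice data, as a base data.frame
dis <- data.frame(
  Diabetes     = c(1, 0, 0, 0, 0, 1, 1, 0, 1, 1),
  Asthma       = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0),
  Stroke       = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  Heart.attack = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 1)
)

# Every unordered triple of column names; one triple per column of the matrix
triples <- combn(names(dis), 3)

# Assumed definition: prevalence = share of rows where all three conditions are 1
prevalence <- apply(triples, 2, function(cols) mean(rowSums(dis[, cols]) == 3))
names(prevalence) <- apply(triples, 2, paste, collapse = " + ")

print(prevalence)
print(mean(prevalence))
print(sd(prevalence))
```

Because the columns are picked by name, the same pattern should also work when `dis` is a tibble, since `dis[, cols]` with a character vector is valid for both classes.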