I'm trying to figure out the simplest way to do the following. I have a data frame df
with colnames(df) <- c("A", "B", "C", "D", "E")
where all the variables are encoded as factors. I want to find all possible combinations of the columns of this data frame. For example, "df" has 2^5 - 1 = 31 non-empty column combinations, which include data frames with the following column names
"A", "B"
"A", "C"
"A", "D"
"A", "E"
"A", "B", "C"
etc., with the last data frame being "A", "B", "C", "D", "E"
Is there a package to combine all these subsets in one large data frame?
Thanks
I don't understand how you want to combine data frames with varying numbers of columns. I think the following code makes a list of the 31 individual data frames that you want to get.
Combinations <- expand.grid(A=0:1,B=0:1,C=0:1,D=0:1,E=0:1)
Combinations <- sapply(Combinations, as.logical)
Combinations <- Combinations[2:32,] #drop the first row that is all FALSE
DF <- data.frame(A=1:3,B=11:13,C=21:23, D=31:33, E=41:43) #invent data
DF_List <- vector(mode = "list", length = 31) #Make list to store results
for(i in 1:31) {
  DF_List[[i]] <- DF[, Combinations[i, ], drop = FALSE] #drop = FALSE keeps one-column subsets as data frames
}
Combinations[6,]
#> A B C D E
#> FALSE TRUE TRUE FALSE FALSE
DF_List[[6]]
#> B C
#> 1 11 21
#> 2 12 22
#> 3 13 23
Created on 2024-02-04 with reprex v2.0.2
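If you want specific combinations based on conditions on factor levels or interactions, you might need to loop, or build the terms with R's own formula tools such as model.matrix() (patsy is a Python library that implements R-style formulas, not an R package).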
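No extra package is needed just to enumerate the subsets. As an alternative to the expand.grid() approach, base R's combn() can list every choice of k column names for k = 1 to 5. A minimal sketch, assuming a small made-up data frame of five factor columns (the values are invented for illustration):

```r
# Hypothetical example data: five factor columns named A to E
df <- data.frame(
  A = factor(c("x", "y", "x")),
  B = factor(c("u", "u", "v")),
  C = factor(c("p", "q", "q")),
  D = factor(c("m", "m", "n")),
  E = factor(c("s", "t", "t"))
)

# For each subset size k = 1..5, combn() returns every k-column choice;
# simplify = FALSE keeps each choice as a character vector in a list
col_sets <- unlist(
  lapply(seq_along(df), function(k) combn(names(df), k, simplify = FALSE)),
  recursive = FALSE
)

# One data frame per subset; drop = FALSE keeps single columns as data frames
df_subsets <- lapply(col_sets, function(cols) df[, cols, drop = FALSE])
names(df_subsets) <- vapply(col_sets, paste, character(1), collapse = "_")

length(df_subsets)              # 31 = 2^5 - 1
names(df_subsets)[c(1, 6, 31)]  # "A" "A_B" "A_B_C_D_E"
```

Each list element is named after its columns, so a subset can be pulled out with, e.g., df_subsets[["A_B"]].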
Thanks so much. This solves my problem. One more question: is there a way to include the full data frame with all variables A B C D E?
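Isn't that what you begin with? (The last row of Combinations is all TRUE, so DF_List[[31]] already holds every column.)
If you want it as a separate entry anyway, make the list length 32
and, after looping over 1:31, add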
DF_List[[32]] <- DF
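This can be checked in a few lines, reusing the invented DF from the answer above: the last row kept in Combinations is all TRUE, so the 31st list element already contains every column, and a 32nd copy would be a duplicate.

```r
# Same setup as the answer above
Combinations <- expand.grid(A = 0:1, B = 0:1, C = 0:1, D = 0:1, E = 0:1)
Combinations <- sapply(Combinations, as.logical)
Combinations <- Combinations[2:32, ]  # drop the all-FALSE first row
DF <- data.frame(A = 1:3, B = 11:13, C = 21:23, D = 31:33, E = 41:43)

# The last of the 31 rows selects every column
all(Combinations[31, ])  # TRUE

DF_List <- vector(mode = "list", length = 31)
for (i in 1:31) {
  DF_List[[i]] <- DF[, Combinations[i, ], drop = FALSE]
}
identical(names(DF_List[[31]]), names(DF))  # TRUE
```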
Thanks. One more question: could I fit 32 different logistic models (with an outcome that is not in that list) using each data frame?
Thanks
map() from purrr (tidyverse) is a general approach to iteration.
#example of a list of 3 datasets
list_of_3 <- split(iris,~Species)
(some_outcome_vec <- rep(0:1,25))
library(tidyverse)
#fit all the lm's for each of the list of 3
(list_of_models <- map(list_of_3, \(df_){
  temp_df <- bind_cols(enframe(name = NULL,
                               value = "outcome",
                               x = some_outcome_vec), df_) |> select(-Species)
  lm(outcome ~ ., data = temp_df)
}))
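Since the follow-up asked about logistic rather than linear models, the same pattern works with glm(..., family = binomial) in place of lm(). A minimal sketch using base R's lapply(), which iterates over a list the same way purrr::map() does here, so it runs without the tidyverse; the outcome is the same made-up rep(0:1, 25) vector, and split(iris, iris$Species) is used in place of the formula form:

```r
# Example list of 3 data sets, one per species (50 rows each)
list_of_3 <- split(iris, iris$Species)
some_outcome_vec <- rep(0:1, 25)  # made-up 0/1 outcome, one value per row

# Fit one logistic regression per data set
list_of_logit <- lapply(list_of_3, function(df_) {
  # prepend the outcome and drop the Species column
  temp_df <- cbind(outcome = some_outcome_vec, df_[, names(df_) != "Species"])
  glm(outcome ~ ., data = temp_df, family = binomial)
})

# Log-odds coefficients of the model for the first data set
coef(list_of_logit[["setosa"]])
```

Replacing lapply() with purrr::map() returns the same named list of models.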
Thanks so much. Is it also possible to use the previous code to fit a logistic model on all possible combinations of the variables in the iris data, e.g. a model with
Sepal.Length only
Sepal.Width only
Petal.Length only
Petal.Width only
Sepal.Length and Sepal.Width, etc.?
Yes, I encourage you to try to do so.
I am having difficulties modifying the code, please help!
I expect I can find some time to review code that you share on this.
See the code I wrote below. I get an error saying that the data must be a data.frame, plus some additional warnings.
Thanks
library(tidyverse)
x_dat<- subset(iris, select=c(Sepal.Length,Sepal.Width, Petal.Length,Petal.Width))
Combinations <- expand.grid(Sepal.Length =0:1,Sepal.Width=0:1, Petal.Length=0:1,Petal.Width=0:1)
Combinations <- sapply(Combinations, as.logical)
Combinations <- Combinations[2:15,] #drop the first row that is all FALSE
x_List <- vector(mode = "list", length = 14) #Make list to store results
(some_outcome_vec <- rep(0:1,75))
for(i in 1:14) {
x_List[[i]] <- x_dat[, Combinations[i, ]]
#fit all the lm's for each of the list of 3
(list_of_models <- map(x_List[[i]],\(df_){
x_List[[i]] <- bind_cols(enframe(name=NULL,
value="outcome",
x=some_outcome_vec),df_) |> #select(-Species)
lm(outcome ~ .,
data=x_List[[i]])
}))
}
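One likely source of the "data must be a data.frame" error is how single-column subsets behave. The first row kept in Combinations selects only Sepal.Length, and `[` on a data frame drops a one-column result to a plain vector by default, so code that later passes it as data = to a model function receives a vector instead of a data frame. A minimal illustration; drop = FALSE keeps the data frame:

```r
# The four iris measurements, as in the code above
x_dat <- subset(iris, select = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width))

keep <- c(TRUE, FALSE, FALSE, FALSE)  # a subset containing a single column

one_col <- x_dat[, keep]              # default drop = TRUE
class(one_col)                        # "numeric": no longer a data frame

one_col_df <- x_dat[, keep, drop = FALSE]
class(one_col_df)                     # "data.frame"
names(one_col_df)                     # "Sepal.Length"
```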
Sorry guys, I was able to make the code work; I pasted it below. Is it possible to get the overall minimum of the model BIC, as well as the position of this minimum, from all the iterations?
Combinations <- expand.grid(Sepal.Length=0:1,Sepal.Width=0:1,Petal.Length=0:1,Petal.Width=0:1)
Combinations <- sapply(Combinations, as.logical)
Combinations <- Combinations[2:16,] #drop the all-FALSE first row, leaving 15 subsets
x_dat <- iris[, 1:4] #the four numeric measurements
x_List <- vector(mode = "list", length = 15) #Make list to store results
(some_outcome_vec <- rep(0:1,75)) #one 0/1 value per iris row (150)
for(i in 1:15) {
  x_List[[i]] <- x_dat[, Combinations[i, ], drop = FALSE]
}
bic <- numeric(15) #pre-allocate before filling by index
for(i in 1:15) {
  bic[i] <- BIC(glm(some_outcome_vec ~ ., data = x_List[[i]], family = "binomial"))
}
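Once the BIC values are stored in a numeric vector, min() returns the overall minimum and which.min() returns its position (the first one, in case of ties). A self-contained sketch that rebuilds the 15 logistic models with the same made-up alternating 0/1 outcome and then looks up the predictors of the best one:

```r
# The 15 non-empty subsets of the four iris measurements
Combinations <- expand.grid(Sepal.Length = 0:1, Sepal.Width = 0:1,
                            Petal.Length = 0:1, Petal.Width = 0:1)
Combinations <- sapply(Combinations, as.logical)[-1, ]  # drop the all-FALSE row

x_dat <- iris[, 1:4]
some_outcome_vec <- rep(0:1, 75)  # made-up 0/1 outcome, one value per row

bic <- numeric(nrow(Combinations))
for (i in seq_len(nrow(Combinations))) {
  # outcome plus the predictors selected in row i
  sub_dat <- cbind(outcome = some_outcome_vec,
                   x_dat[, Combinations[i, ], drop = FALSE])
  bic[i] <- BIC(glm(outcome ~ ., data = sub_dat, family = binomial))
}

# Overall minimum and the iteration where it occurs
min(bic)
best_pos <- which.min(bic)
best_pos
colnames(Combinations)[Combinations[best_pos, ]]  # predictors in that model
```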
library(tidyverse)
Combinations <- expand.grid(Sepal.Length=0:1,Sepal.Width=0:1,Petal.Length=0:1,Petal.Width=0:1)
Combinations <- sapply(Combinations, as.logical)
max_comb_len <- 15
Combinations <- Combinations[2:(max_comb_len+1),] #drop the first row that is all FALSE
DF_List <- vector(mode = "list", length = max_comb_len) #Make list to store results
lm_list <- vector(mode = "list", length = max_comb_len) #Make list to store results
BIC_list <- vector(mode = "list", length = max_comb_len) #Make list to store results
(DF <- tibble(iris) |> mutate(outcome= Species=="virginica"))
min_bic <- min_bic_pos <- Inf
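for(i in 1:max_comb_len) {
  vars_in_subset <- names(which(Combinations[i,]))
  print(vars_in_subset)
  DF_List[[i]] <- DF[, c("outcome", vars_in_subset)]
  #lm() fits a linear probability model; for a logistic model use
  #glm(outcome ~ ., data = DF_List[[i]], family = binomial)
  lm_list[[i]] <- lm(outcome ~ . , data=DF_List[[i]])
  BIC_list[[i]] <- BIC(lm_list[[i]])
  if(BIC_list[[i]] < min_bic){ #keep the smallest BIC so far and its position
    min_bic <- BIC_list[[i]]
    min_bic_pos <- i
  }
}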
min_bic
min_bic_pos
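The loop above fits linear models with lm(), while the question was about logistic regression. A hedged sketch of a logistic variant, assuming the same outcome (Species == "virginica", stored as 0/1) and using base R's reformulate() to build each formula from a character vector of predictor names instead of subsetting the data. It ranks all 15 models by BIC. glm() may warn that fitted probabilities are numerically 0 or 1 for some subsets, since setosa is far from virginica on the petal measurements.

```r
# Assumed outcome: is the flower virginica? (0/1)
dat <- iris
dat$outcome <- as.integer(dat$Species == "virginica")
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# All 15 non-empty predictor subsets, via combn()
subsets <- unlist(lapply(1:4, function(k) combn(predictors, k, simplify = FALSE)),
                  recursive = FALSE)

# reformulate() builds outcome ~ x1 + x2 + ... from a character vector
bic_values <- vapply(subsets, function(vars) {
  fit <- glm(reformulate(vars, response = "outcome"),
             data = dat, family = binomial)
  BIC(fit)
}, numeric(1))

# One row per model, smallest BIC first
ranking <- data.frame(
  model = vapply(subsets, paste, character(1), collapse = " + "),
  BIC = bic_values
)
ranking <- ranking[order(ranking$BIC), ]
head(ranking, 3)  # the three best subsets by BIC
```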
This is wonderful. It gives exactly what I needed
Thanks
I am getting errors when grouping this sample data into percentile groups and adding the group to the data frame. Both of the following snippets give me errors. Is there something I am not doing right?
Thanks much
df <- data.frame(ID = 1:10, Score1 = c(78, 82, 65, 90, 72, 88, 55, 67, 92, 81),
Score2 = c(89, 95, 76, 82, 91, 85, 72, 68, 97, 88))
# Calculating percentiles
df$quartile <- with(df, factor(
findInterval (score1, c(-Inf,
quantile(score1, probs=c(.2,.4,.6,.8)),Inf),na.rm=TRUE),
labels=c('Q1','q2','Q3','Q4'.'Q5')
))
df$quartile <- with(df, cut(score1, breaks=quantile(score1, probs=seq(0,1, by .2),na.rm=TRUE), include.lowest=TRUE
))
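The errors most likely come from a few small things in the two snippets: R is case-sensitive, so score1 should be Score1; na.rm is an argument of quantile(), not findInterval(); 'Q4'.'Q5' needs a comma; seq(0,1, by .2) needs by = .2; and cut points at 20% steps give five groups (quintiles), with 'q2' presumably meant as 'Q2'. A corrected sketch of both approaches on the same sample data:

```r
df <- data.frame(ID = 1:10,
                 Score1 = c(78, 82, 65, 90, 72, 88, 55, 67, 92, 81),
                 Score2 = c(89, 95, 76, 82, 91, 85, 72, 68, 97, 88))

# Quintile breaks (0%, 20%, ..., 100%) of Score1; na.rm belongs to quantile()
breaks <- quantile(df$Score1, probs = seq(0, 1, by = 0.2), na.rm = TRUE)

# Option 1: cut(); include.lowest = TRUE puts the minimum in the first group
df$quintile <- cut(df$Score1, breaks = breaks, include.lowest = TRUE,
                   labels = c("Q1", "Q2", "Q3", "Q4", "Q5"))

# Option 2: findInterval() on the four inner breaks gives 0..4; add 1 for 1..5
inner <- quantile(df$Score1, probs = c(0.2, 0.4, 0.6, 0.8), na.rm = TRUE)
df$quintile2 <- factor(findInterval(df$Score1, inner) + 1,
                       levels = 1:5, labels = c("Q1", "Q2", "Q3", "Q4", "Q5"))

df
```

With this data no score sits exactly on an inner break, so both columns agree. In general they can differ at ties, because cut() uses right-closed intervals by default while findInterval() uses left-closed ones.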
Is it possible to keep other variables that will not be used, for example the ID number, in every combination?
Thanks
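One way to carry a column such as ID along without including it in the combinations is to combine only the other columns and prepend ID to every subset. A minimal sketch with made-up data; id_cols and vars are illustrative names:

```r
DF <- data.frame(ID = 101:103, A = 1:3, B = 11:13, C = 21:23)  # made-up data
id_cols <- "ID"                      # columns carried along unchanged
vars <- setdiff(names(DF), id_cols)  # columns to combine

# All non-empty subsets of the non-ID columns
var_sets <- unlist(lapply(seq_along(vars), function(k) combn(vars, k, simplify = FALSE)),
                   recursive = FALSE)

# Each subset keeps ID first, followed by the chosen variables
DF_List <- lapply(var_sets, function(v) DF[, c(id_cols, v), drop = FALSE])
length(DF_List)      # 7 = 2^3 - 1
names(DF_List[[4]])  # "ID" "A" "B"
```

When fitting models on these subsets, ID can be kept out of the predictors with a formula such as outcome ~ . - ID.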
I'm lost. It seems the original issue was addressed, and you are on to other, unrelated issues?
Or is there something to vary in the original request?
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.