Hi R community,
So below I've got two bits of code for the same problem. The first is the code I'm actually working on (which won't run for you, since it depends on my own data) and the second is my attempt at a reprex (I'm new to this, so forgive me if I've done it wrong).
Ok, so here's my actual code:
```r
library(stringr)  # for str_split()
library(purrr)    # for flatten()

#start from an empty list variable 'v'
v <- list()
for (i in 1:nrow(enrichResList[[1]])) {
  #get the names of each ontology and the analysis and put them into the list variable 'v'
  v <- append(v, as.name(paste(names(enrichResList)[[1]], "_", enrichResList[[1]]@result[i,]$Description, sep = "")))
  #get the geneIDs for each and put them in the list variable as well
  v[[i]] <- append(v[[i]], enrichResList[[1]]@result[i,]$geneID)
  #split the geneIDs from 1 string into multiple strings
  v[[i]][[2]] <- str_split(v[[i]][[2]], "/")
  #set the names of each element in the vector to be the name of the analysis/ontology from the first step
  v[[i]] <- setNames(v[[i]][[2]], as.character(v[[i]][[1]]))
}
#remove the top level from the list of lists so that it is a list of vectors with names corresponding to experiments
v <- flatten(v)
#for each of the vectors in the list, v, subset the FTD_Mouse_Ontology dataframe by the entrez IDs (individual vector elements)
#is there a way to keep these subset dataframes in a list of dataframes?
for (i in 1:length(v)) {
  #this splits the elements into separate dataframes with proper naming
  assign(names(v)[[i]], subset(FTD_Mouse_Ontology, entrez %in% v[[i]]))
}
```
So this works fine. Basically, I start from an S4 object which contains a dataframe, `enrichResList[[1]]@result`. Ultimately, I will be trying to convert all of this to a function, so the `[[1]]` index is just a placeholder: I actually have 4 elements which I want to process in this way.
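Since the goal is to run the same steps over all 4 elements, here is a minimal sketch of that shape, under stated assumptions: `results` is a hypothetical named list of plain data frames standing in for the `@result` slots (reduced to the `Description` and `geneID` columns), `lookup` is a hypothetical stand-in for `FTD_Mouse_Ontology`, and `process_result()` is a made-up helper name. Instead of creating objects with `assign()`, it returns a nested named list.

```r
# Hypothetical stand-ins: `results` mimics the Description/geneID columns
# of each enrichResList[[k]]@result, and `lookup` mimics FTD_Mouse_Ontology.
results <- list(
  clusterGO = data.frame(Description = c("CellCycle", "Apoptosis"),
                         geneID = c("101/102", "103"),
                         stringsAsFactors = FALSE),
  clusterKEGG = data.frame(Description = "Signalling",
                           geneID = "102/104",
                           stringsAsFactors = FALSE)
)
lookup <- data.frame(entrez = c("101", "102", "103", "104"),
                     symbol = c("A", "B", "C", "D"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: split the IDs of one result table, name each ID
# vector "<prefix>_<Description>", then subset `lookup` once per vector.
process_result <- function(res_df, prefix, lookup) {
  ids <- strsplit(res_df$geneID, "/", fixed = TRUE)
  names(ids) <- paste(prefix, res_df$Description, sep = "_")
  lapply(ids, function(gene_ids) subset(lookup, entrez %in% gene_ids))
}

# Map() runs the helper on every element; the outer list keeps the element
# names, giving e.g. all_dfs$clusterGO$clusterGO_CellCycle.
all_dfs <- Map(process_result, results, names(results),
               MoreArgs = list(lookup = lookup))

names(all_dfs)
names(all_dfs$clusterGO)
```

Here the outer names (`clusterGO`, `clusterKEGG`) play the role that the pasted `_dfList` names would otherwise play.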
Each row of the dataframe contains a few bits of information, and I want to extract two of the columns (one of which holds a single string that actually contains multiple IDs that need to be split). This is what I'm doing in the first for loop: extracting the data and splitting the character strings into separate elements rather than keeping them as one string.
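The extraction and splitting step can also be sketched without a loop, assuming a small hypothetical data frame `res_df` that has only the `Description` and `geneID` columns of `@result`, with `"clusterGO"` as an illustrative prefix. Base R's `strsplit()` is vectorised, so one call returns one character vector per row (`stringr::str_split()` behaves the same way).

```r
# Hypothetical stand-in for enrichResList[[1]]@result, reduced to the two
# columns the first loop uses; "clusterGO" is an illustrative prefix.
res_df <- data.frame(
  Description = c("CellCycle", "Apoptosis"),
  geneID = c("12345/23456/34567", "45678/56789"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row ...
id_list <- strsplit(res_df$geneID, "/", fixed = TRUE)
# ... and naming that list directly gives a named list of ID vectors,
# the same shape as `v` after flatten(), without the intermediate nesting.
names(id_list) <- paste("clusterGO", res_df$Description, sep = "_")

str(id_list)
```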
From there, I flatten the resulting list so that it is a list of vectors, each of which has a name that describes what it is, e.g. `v[[1]]` is a vector called `clusterGO_CellCycle` which contains a bunch of ID numbers (entrez IDs, for those of you in bioinformatics). Simple enough so far. (P.S. these IDs are what was contained in the string that was split.)
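For reference, the flattening step can also be done in base R with `unlist(..., recursive = FALSE)`, which removes one level of nesting and keeps the inner names. A minimal sketch, using a hypothetical nested list shaped like `v` before `flatten()`:

```r
# Hypothetical nested list shaped like `v` before flatten(): each element
# is a one-element list holding a named character vector of IDs.
nested <- list(
  list(clusterGO_CellCycle = c("12345", "23456")),
  list(clusterGO_Apoptosis = "45678")
)

# recursive = FALSE removes only the top level; the inner names are kept.
flat <- unlist(nested, recursive = FALSE)

names(flat)
```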
I then want to use these IDs to subset a dataframe. It's important to iterate through each vector, subset the dataframe, and give the resulting subset dataframe a name that corresponds to the vector it came from; this is what I'm doing with the second for loop.
However, this just creates a bunch of separate dataframes in my environment. Because I am going to be doing this for many lists of vectors, and some of them will be quite long, I want to group the dataframes into a list according to which list of vectors they came from. So in the example above, I would want to name the new list `paste(names(enrichResList)[[1]], "_dfList", sep = "")`.
But as I'm sure you all know, I can't create an empty list with a name that is produced by `paste()`, so I would have to use `assign()`. This would look something like:
```r
for (i in 1:length(v)) {
  #this splits the elements into separate dataframes with proper naming
  assign(paste(names(enrichResList)[[1]], "_dfList", sep = ""), assign(names(v)[[i]], subset(FTD_Mouse_Ontology, entrez %in% v[[i]])))
}
```
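But this doesn't work as I would like: each pass through the loop overwrites the object, so I only end up with the dataframe produced last.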
So my problem is: how do I make an empty list with a name that is produced by `paste()`, to which I can assign dataframes as I iteratively create them by subsetting a larger dataframe?
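One possible approach, sketched under assumptions: build an ordinary list under a fixed name inside the loop, and only give it the `paste()`-generated name with a single `assign()` after the loop. The inputs below are small hypothetical versions of the reprex objects, and `"item1"` stands in for `names(experiment)[[1]]`.

```r
# Hypothetical small versions of the reprex objects; "item1" stands in
# for names(experiment)[[1]].
start_df <- data.frame(id = 1:10, ratio = seq(0.1, 1, by = 0.1))
v_list <- list(res1 = c(1, 2), res2 = c(5, 6, 7), res3 = 10)
list_name <- paste("item1", "_dfList", sep = "")

# Pre-allocate an ordinary list with the same names as v_list ...
df_list <- vector("list", length(v_list))
names(df_list) <- names(v_list)
# ... and fill one slot per vector inside the loop.
for (i in seq_along(v_list)) {
  df_list[[i]] <- subset(start_df, id %in% v_list[[i]])
}

# Only now give the finished list its paste()-generated name.
assign(list_name, df_list)

names(item1_dfList)
```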
OK, that's a lot of information, sorry! Below is my attempt at a reprex:
```r
#make the random sampling reproducible
set.seed(1)
#start with a dataframe containing some information (e.g. numeric IDs)
id <- sample(1:100, 50, replace = TRUE)
n <- sample(1:100, 50, replace = TRUE)
ratio <- rnorm(50, mean = 0.5, sd = 0.1)
start_df <- data.frame(id, n, ratio)
#have some other data structure which contains named elements
experiment <- list("item1" = sample(1:100, 100), "item2" = sample(1:100, 100))
#create a list of vectors
v1 <- sample(1:100, 10, replace = TRUE)
v2 <- sample(1:100, 10, replace = TRUE)
v3 <- sample(1:100, 10, replace = TRUE)
v_list <- list("res1" = v1, "res2" = v2, "res3" = v3)
#for each vector in the list, subset the original dataframe by the elements of the vector
for (i in 1:length(v_list)) {
  assign(names(v_list)[[i]], subset(start_df, id %in% v_list[[i]]))
}
#NEXT STEP: assign the resulting dataframes to a list which is named based upon some variables
#e.g:
nameOfDfList <- paste(names(experiment)[[1]], "_dfList", sep = "")
#So what I'm trying to achieve, instead of the above for loop, would be something like:
for (i in 1:length(v_list)) {
  assign(paste(names(experiment)[[1]], "_dfList", sep = ""), assign(names(v_list)[[i]], subset(start_df, id %in% v_list[[i]])))
}
#however this overwrites the list of dataframes so that I end up with only the dataframe produced last from the for loop
```
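A minimal sketch applied to the reprex that avoids `assign()` entirely: keep one container list and use the pasted string as an element name rather than as a variable name. It rebuilds the reprex inputs with `set.seed(42)`, and `all_results` is a hypothetical name. `lapply()` keeps the names of `v_list`, so each subset dataframe is named after the vector it came from.

```r
# Rebuild the reprex inputs (seed added so the example is repeatable).
set.seed(42)
start_df <- data.frame(id = sample(1:100, 50, replace = TRUE),
                       n = sample(1:100, 50, replace = TRUE),
                       ratio = rnorm(50, mean = 0.5, sd = 0.1))
experiment <- list(item1 = sample(1:100, 100), item2 = sample(1:100, 100))
v_list <- list(res1 = sample(1:100, 10, replace = TRUE),
               res2 = sample(1:100, 10, replace = TRUE),
               res3 = sample(1:100, 10, replace = TRUE))

# Hypothetical container: the pasted string becomes an element name,
# not a variable name, so no assign() is needed.
all_results <- list()
nameOfDfList <- paste(names(experiment)[[1]], "_dfList", sep = "")
# lapply() keeps the names of v_list, so the subsets are named res1..res3.
all_results[[nameOfDfList]] <- lapply(v_list, function(ids) subset(start_df, id %in% ids))

names(all_results)
names(all_results[[nameOfDfList]])
```

Each further list of vectors would then get its own entry in `all_results` rather than its own variable.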
So the problem is: how do I get `res1`, `res2` and `res3` into a list which is named based upon the named elements of `v_list`?
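If the loose `res1`, `res2` and `res3` objects already exist, base R's `mget()` can gather objects by name into a named list. A minimal sketch with small hypothetical stand-ins for those dataframes:

```r
# Hypothetical stand-ins for the res1/res2/res3 dataframes that the
# reprex's assign() loop leaves in the global environment.
res1 <- data.frame(id = 1:2)
res2 <- data.frame(id = 3:5)
res3 <- data.frame(id = 6L)
v_list <- list(res1 = 1:2, res2 = 3:5, res3 = 6L)

# mget() fetches several objects by name and returns a named list.
collected <- mget(names(v_list))

names(collected)
```

The collected list could then be stored under its pasted name with one call, e.g. `assign(nameOfDfList, collected)`, and the loose objects removed with `rm(list = names(v_list))`.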
I have also had a go at experimenting with `as.name()`, but this doesn't seem to work well either.
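A short illustration of why these attempts behave as they do, assuming a hypothetical target name `"item1_dfList"`: `as.name()` only builds a symbol object, while `assign()` always replaces the whole object bound to a name. Adding one dataframe at a time by name therefore means fetching the current list with `get()`, modifying it, and assigning it back. This works, but building the list first (or keeping a container list) is arguably simpler.

```r
# Hypothetical inputs; "item1_dfList" stands in for the pasted name.
new_dfs <- list(res1 = data.frame(id = 1:2), res2 = data.frame(id = 3:4))
list_name <- paste("item1", "_dfList", sep = "")

# as.name() only builds a symbol (class "name"); it does not create or
# fetch a variable by itself.
sym <- as.name(list_name)
class(sym)

# assign() replaces the whole object bound to a name, which is why the
# reprex's loop kept only the last dataframe. To add one element at a
# time: start with an empty list, then get() it, modify it, assign() it back.
assign(list_name, list())
for (nm in names(new_dfs)) {
  current <- get(list_name)
  current[[nm]] <- new_dfs[[nm]]
  assign(list_name, current)
}

names(item1_dfList)
```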
Thanks for any help or advice you can give, and sorry if the first chunk of code causes any confusion!