Save plots to individual folders in for-loop

Nate_L · July 11, 2022, 3:12pm

I made a for-loop that automatically makes multiple heatmaps and places them all in a single folder on my PC. Can anyone help me change it so that it creates a new folder for each new map that gets made (plot "A" goes in folder "A", plot "B" in folder "B", etc.)?

My code is very complicated, so making this a simple watered-down reproducible example isn't possible but hopefully this can be solved conceptually. Here are the definitions of all my variables and functions I've defined:

"dat" is my dataframe with columns "common_name", "site", "year_season", and "num"
common_name = species name, site = location of catch, year_season = time of catch, num = count data
"splist" is a character string of each unique name from the common_name column in dat
"GetMatrix" is a function that makes a matrix for every name in splist to be used for creating a heatmap
"PlotHeatMap" is a function that makes both clustered and unclustered heatmaps for each name in splist

dataList<-list()
filenameVal<-paste0(1:length(splist),splist," - NotClustered.png")
doClusterVal<-FALSE

setwd('C:.../Not clustered')

for(run in 1:length(splist)) {
 dataList[[run]]<-GetMatrix(dat,splist[run])
 PlotHeatMap(data=dataList[[run]],splist[run],fileName=filenameVal[run],doCluster=doClusterVal)
 print(paste(run,splist[run]))
}

My attempt creates the folders and the maps, but it doesn't put the maps in the correct folders:

for(run in 1:length(splist)) {
 dataList[[run]]<-GetMatrix(filter(dat,CYR>=2005),splist[run])  
 PlotHeatMap(data=dataList[[run]],splist[run],fileName=filenameVal[run],doCluster=doClusterVal)
 dir.create(file.path(splist[run]), recursive = TRUE)
 print(paste(run,splist[run]))
}

Reference:

GetMatrix=function(data,spToSum) {
 newdat<-filter(data,common_name %in% spToSum)
 matrixdat<-dcast(newdat, site ~ year_season, value.var="num", fun.aggregate = sum)
 matrixdat$site <- as.character(matrixdat$site)
 matrixdat <- matrixdat[order(nchar(matrixdat$site), matrixdat$site)]
 matrixdat <- matrixdat %>% remove_rownames %>% column_to_rownames(var="site") # col to row names
 as.matrix(matrixdat)
}

spSummary<- dat %>% group_by(common_name) %>%
  summarize(total.number=sum(num,na.rm=TRUE),
    observations=length(num[num>0 &!is.na(num)])) %>%
  arrange(-total.number)

splist<-spSummary$common_name[spSummary$observations>=1]

thomascf · July 11, 2022, 7:40pm

I've gone through your code. Here's what I found

General feedback

You're using setwd(), when you could just directly construct the path using file.path. Also, I'm somewhat wary of paths with white spaces, but it's not necessarily going to break anything.
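
To illustrate the first point, here is a minimal sketch of building the output path explicitly instead of changing the working directory. The temporary directory and the species name "Bluegill" are placeholders I made up, not values from your data.

```r
# Hypothetical base folder; tempdir() stands in for the real location on disk
out_dir <- file.path(tempdir(), "Not clustered")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Build the full output path instead of relying on setwd()
# ("Bluegill" is a made-up species name used only for illustration)
out_file <- file.path(out_dir, "1Bluegill - NotClustered.png")
print(out_file)
```

Because the path is assembled with file.path(), the working directory never changes, so the rest of the script is unaffected by where the plots go.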

The code structure is fine, and it's great that you included the function definitions as a reference. The only thing that's missing here is the definition of PlotHeatMap or which package it comes from. The GetMatrix function could be cleaned up a little using dplyr pipe operators, for example as follows:

GetMatrix <- function(data, spToSum){
  matrixdat <- data %>%
    filter(common_name %in% spToSum) %>% 
    dcast(formula = site ~ year_season, fun.aggregate = sum,
          value.var = "num") %>%
    mutate(site = as.character(site)) %>%
    arrange(nchar(site), site) %>%
    remove_rownames() %>% 
    column_to_rownames(var="site") %>% 
    as.matrix()

  return(matrixdat)
}

Handling of file paths

I'm not quite sure what the intended folder structure is supposed to be. In any case, your call to dir.create() correctly created the directories, but you never adjusted the fileName argument in the second for-loop. Assuming that fileName is the output path of the maps, the second loop therefore wrote to the same location as the first and overwrote its results.

Here's how I'd modify your code:

First loop

Here I assumed that you meant to output these maps to the "Not clustered" folder.

foldername <- "Not clustered"

if (!dir.exists(file.path(foldername))) {
    dir.create(file.path(foldername))
}

for(run in 1:length(splist)){ 
  dataList[[run]] <- dat %>% 
    GetMatrix(splist[run])
  
  PlotHeatMap(data = dataList[[run]], splist[run], 
    fileName = file.path(foldername, filenameVal[run]), 
    doCluster = doClusterVal)
}

Second loop

Here I assumed that you wanted the maps to be generated in the subdirectory "~/Not clustered/some value from splist". In your code, you created the directory after your call to PlotHeatMap. Instead, I'd make sure that the directories exist ahead of generating the plot.

foldername <- "Not clustered"

for(run in 1:length(splist)){
  if (!dir.exists(file.path(foldername, splist[run]))) {
    dir.create(file.path(foldername, splist[run]))
  }
  
  dataList[[run]] <- dat %>% 
    filter(CYR >= 2005) %>% 
    GetMatrix(splist[run])
      
  PlotHeatMap(data = dataList[[run]], splist[run], 
    fileName = file.path(foldername, splist[run], filenameVal[run]), #Changed fileName 
    doCluster = doClusterVal)  
}
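
As a side note, the explicit dir.exists() check can be folded into dir.create() itself. The sketch below assumes made-up species names in place of splist and uses a temporary directory as the root, so treat it as an illustration rather than a drop-in replacement.

```r
root <- file.path(tempdir(), "Not clustered")  # stand-in for the real output folder
species <- c("Species A", "Species B")         # made-up values standing in for splist

# seq_along() is a safer loop index than 1:length() if the vector is ever empty
for (run in seq_along(species)) {
  species_dir <- file.path(root, species[run])
  # recursive = TRUE also creates `root` if needed;
  # showWarnings = FALSE keeps reruns quiet when the folder already exists
  dir.create(species_dir, recursive = TRUE, showWarnings = FALSE)
}

list.files(root)
```

Running the loop a second time leaves the existing folders untouched, so it is safe to rerun the whole script.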

If this doesn't solve your issue, we'll next need to look at what PlotHeatMap is doing to save plots to disk. For that, it would be great if you could either post the function definition or, if it's from an R package, a link to the repository.

Nate_L · July 11, 2022, 7:52pm

I'll try this out, thank you! Yes, the subdirectory "~/Not clustered/some value from splist" is correct: one folder per value in splist, each holding 2 plots (clustered and unclustered), for 154 folders total, all in "~/Not clustered".

Ya, I thought I could get away with not posting the PlotHeatMap function because 1. it's probably too long for my short question, 2. A colleague wrote it so I couldn't answer questions about it except that it helps build the maps, and 3. probably distracts the reader, but it seems necessary, so here it is anyway.

PlotHeatMap<-function(data,common_name,fileName=NULL,doCluster=FALSE,redCols=c("2021_WET")) {
  printPlot<-TRUE
  #Check if there are any non-zero values. 
  if(sum(data,na.rm=TRUE)==0) {
   print(paste("Not enough data for heatmap for",common_name))
   printPlot<-FALSE
  } else { #If there is enough data carry on
   #If clustering throw out data until all rows and columns have non-zero data
   #and all year_seasons have at least one sample in common with all others
   if(doCluster) {
    data<-data[rowSums(data,na.rm=TRUE)>0,colSums(data,na.rm=TRUE)>0] #Exclude sites or time periods with no non-zero observations
    if(is.null(dim(data)) | sum(dim(data))<5)  {
       print(paste("Not enough observations to cluster",common_name))
       printPlot<-FALSE
    } else {   
     colCluster<-suppressWarnings(vegdist(t(log(data+1)),na.rm=TRUE)) #Check if both clusters work
     rowCluster<-suppressWarnings(vegdist(log(data+1),na.rm=TRUE))
     if(any(is.na(rowCluster)) | any(is.na(colCluster)) )  { 
      z<-as.matrix(colCluster)
      badCols<-dimnames(z)[[2]][is.na(colSums(z))]
      data<-data[,!dimnames(data)[[2]] %in% badCols]
      z<-as.matrix(rowCluster)
      badRows<-dimnames(z)[[2]][is.na(colSums(z))]
      data<-data[!dimnames(data)[[1]] %in% badRows,]
      data<-data[rowSums(data,na.rm=TRUE)>0,colSums(data,na.rm=TRUE)>0]
      if(is.null(dim(data)))  {
       print(paste("Not enough observations to cluster",common_name))
       printPlot<-FALSE
      }
     }
    }
   } 
  if(printPlot) {
   season_mean <- colMeans(data,na.rm=TRUE)
   site_mean <- rowMeans(data,na.rm=TRUE)
   ha <- try(HeatmapAnnotation(Ave=anno_points(season_mean)))
   ha2 <- try(rowAnnotation(Ave= anno_points(site_mean)),silent=TRUE)
   if(inherits(ha,"try-error") | inherits(ha2,"try-error")) {
     print(paste("Not enough observations to cluster",common_name))
     printPlot<-FALSE
   }
  } 
  if(printPlot) {
   cols <- rep('black', ncol(data))
   cols[colnames(data) %in% redCols] <- 'red'
   rg<-brewer.pal(n = 8, name = "RdYlBu")  #originally 8
   if(!is.null(fileName)) png(fileName, units="in", width=5, height=5, res=700)
   plot1 <- getPlot1(data,doCluster,rg,cols,ha,ha2)
   if(any(is.na(plot1@matrix_color_mapping@levels))) {
    rg <- brewer.pal(n = 7, name = "RdYlBu")  #Change to 7 if plot doesn't work
    plot1 <- getPlot1(data, doCluster,rg,cols,ha,ha2)
   }
   try(print(plot1))
   if(!is.null(fileName)) dev.off()
  }  
  }    
}

#This is now a standalone function so it can be called from other functions
getPlot1<-function(data,doCluster,rg,cols,ha,ha2)  {
 if(doCluster) {
  plot1<-Heatmap(as.matrix(round(log(data+1), digits = 1)),
        cluster_rows = TRUE,
        cluster_columns = TRUE,
        col = rev(rg),
        na_col = "black",
        row_names_side = "left", # Row names move to left side
        row_dend_side = "left",
        column_names_side = "top", # Column names move to top
        # row_names_gp = gpar(cex=0.2, fontface = "bold"),
        row_names_gp = gpar(cex=0.4, fontface = "bold"), # L-sites
        column_names_gp = gpar(cex=0.4, fontface = "bold", col = cols), # change font size
        row_dend_width = unit(2, "cm"),
        column_dend_height = unit(1, "cm"),
        rect_gp = gpar(col = "grey"),
        column_title = "Year/Season",
        column_title_gp = gpar(fontsize = 7),
        # column_names_rot = 70,
        row_title = "Site",
        row_title_gp = gpar(fontsize = 7),
        heatmap_legend_param = list(title = "ln(x+1)"), # ..."ln(x+1)", color_bar = "discrete"),
        clustering_distance_rows =  function(x) {vegan::vegdist(x, method = "euclidean",na.rm=TRUE)},
        clustering_distance_columns = function(x) {vegan::vegdist(x, method = "euclidean",na.rm=TRUE)},
        clustering_method_rows = "ward.D2",
        clustering_method_columns = "ward.D2",
        row_dend_reorder = FALSE,
        column_dend_reorder = FALSE,
        column_split = 4,
        row_split = 3,
        top_annotation = ha,
        left_anno = ha2)
 } else {
   plot1<-Heatmap(as.matrix(round(log(data+1), digits = 1)),
        cluster_rows = FALSE,
        cluster_columns = FALSE,
        col = rev(rg),
        na_col = "black",
        # row_names_side = "left", # Row names move to left side
        # row_dend_side = "left",
        # row_names_gp = gpar(cex=0.2, fontface = "bold"),
        row_names_gp = gpar(cex=0.4, fontface = "bold"), # L-sites
        row_names_side = "left",
        column_names_gp = gpar(cex=0.4, fontface = "bold"), # change font size
        column_names_side = "top", # Column names move to top
        # row_dend_width = unit(2, "cm"),
        # column_dend_height = unit(1, "cm"),
        rect_gp = gpar(col = "grey"),
        column_title = "Year/Season",
        column_title_gp = gpar(fontsize = 7),
        # column_names_rot = 70,
        row_title = "Site",
        row_title_gp = gpar(fontsize = 7),
        heatmap_legend_param = list(title = "ln(x+1)"),
        # clustering_distance_rows =  function(x) {vegan::vegdist(x, method = "bray")},
        # clustering_distance_columns = function(x) {vegan::vegdist(x, method = "bray")},
        # clustering_method_rows = "ward.D2",
        # clustering_method_columns = "ward.D2",
        # row_dend_reorder = FALSE,
        # column_dend_reorder = FALSE,
        # column_split = 4,
        # row_split = 3,
        top_annotation = ha,
        left_anno = ha2)
 }
   plot1
} 



# STOP
# Skip when not clustering


# I couldn't get this to work without deleting the NA rows. 
# dcast can automatically give NAs where there are no observations
datlong<-dat
dat<-dat[!is.na(dat$num),]

#Check if there is overlap in sites sampled by time
temp<-datlong %>% group_by(site,year_season) %>%
  summarize(samples=length(num[!is.na(num)])) %>%
  pivot_wider(names_from = year_season,values_from = samples)
write.csv(temp,"temp.csv")
#The above prints out a table with the sites sampled each year.season
#Below, calculate the number of sites that are sampled in both of two year.seasons
year.season<-sort(unique(dat$year_season))
sites<-sort(unique(dat$site))
SampleMatrix<-matrix(0,length(year.season),length(year.season),dimnames=list(year.season,year.season))
for(i in 1:length(year.season))
  for(j in 1:length(year.season)) 
    SampleMatrix[i,j]<-length(unique(sites[sites %in% dat$site[dat$year_season==year.season[i]] & sites %in% dat$site[dat$year_season==year.season[j]]]))
write.csv(SampleMatrix,"SampleMatrix.csv")
#Note that in the early years, there were some time periods that had no overlap 
# in which sites were sampled. It isn't possible to cluster two time periods
# with no shared sample sites, because there is no data to compare. Doing the clustering
# from 2005 onward only solves this problem for many species. For some species, 
# deleting the sites with only zero observations makes some periods not viable. 

#Make summary of number of non-zero observations and total count
# and sort by decreasing total count

spSummary<- dat %>% group_by(common_name) %>%
  summarize(total.number=sum(num,na.rm=TRUE),
    observations=length(num[num>0 &!is.na(num)])) %>%
  arrange(-total.number)
spSummary
summary(spSummary)
dim(spSummary)
tail(spSummary)
# There are species in the data frame that have no observations, so they
# can't be used in analysis. 
# Make a splist of only species with at least 1 observation


splist<-spSummary$common_name[spSummary$observations>=1]
length(splist)

thomascf · July 11, 2022, 9:29pm

Thanks. I see why you thought so, but it was important because of this snippet from the PlotHeatMap function:

Nate_L:

# ...
if(!is.null(fileName)) png(fileName, units="in", width=5, height=5, res=700)
plot1 <- getPlot1(data, doCluster, rg, cols, ha, ha2)
#...
try(print(plot1))
if (!is.null(fileName)) dev.off()
#...

What the code does

The preceding code essentially preprocesses the data. If no failure conditions are encountered, printPlot is TRUE and getPlot1 is called. If fileName was specified, the code also opens a png() device, which is what you'd use to save plots. The Heatmap itself is first stored as a variable, and then turned into a plot via the print method. Finally, the device is closed, after which the map should have been stored to disk at the path specified in fileName. I don't see any glaring problems here.
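
The open/draw/close pattern can be tried in isolation. This is a minimal sketch using a base plot and a hypothetical file in a temporary directory, not the actual heatmap code.

```r
out_file <- file.path(tempdir(), "device-demo.png")  # hypothetical output path

# Open the png device; everything drawn from here on goes to the file
png(out_file, units = "in", width = 5, height = 5, res = 100)
plot(1:10, main = "Written to the png device")
# Close the device so the file is finalized on disk
dev.off()

file.exists(out_file)
```

As far as I know, png() does not create missing folders, which is why the target directory has to exist before PlotHeatMap is called.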

Conclusion

I'm still a bit confused about the folder structure, but it shouldn't be too difficult to adapt the modified code from earlier. Here's another modification which places the maps in species-specific folders:

for(run in 1:length(splist)){
    # Allocate folders for individual species in current working directory
  if (!dir.exists(file.path(splist[run]))) {
    dir.create(file.path(splist[run]))
  }
  
  dataList[[run]] <- dat %>% 
    filter(CYR >= 2005) %>% 
    GetMatrix(splist[run])
      
  PlotHeatMap(data = dataList[[run]], splist[run], 
    fileName = file.path(splist[run], ...), # Complete the missing code at the ellipsis.
    doCluster = doClusterVal)  
}

There's still some work left to do, though. You'll need to think of a way to incorporate the value of doClusterVal in the variable filenameVal, since it's currently hard-coded to NotClustered. If you fix these things, I'm fairly confident you'll have resolved your problem.
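
One possible way to do that is a small helper that derives the label from the clustering flag. The function name make_heatmap_path, its arguments, and the "Clustered" label are my own assumptions for illustration; adapt them to whatever naming you prefer.

```r
# Hypothetical helper: builds "<root>/<species>/<index><species> - <label>.png"
make_heatmap_path <- function(root, species, index, do_cluster) {
  # Pick the label from the clustering flag instead of hard-coding it
  label <- if (do_cluster) "Clustered" else "NotClustered"
  file.path(root, species, paste0(index, species, " - ", label, ".png"))
}

# "Species A" is a made-up name standing in for a value from splist
make_heatmap_path("Not clustered", "Species A", 1, FALSE)
# "Not clustered/Species A/1Species A - NotClustered.png"
make_heatmap_path("Not clustered", "Species A", 1, TRUE)
# "Not clustered/Species A/1Species A - Clustered.png"
```

Inside the loop, the result of a call like make_heatmap_path(foldername, splist[run], run, doClusterVal) could then be passed as fileName, provided the species folder has been created first.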
