How to generate separate heatmap images in R

SLP · September 20, 2023, 3:26pm

Hello, I am trying to create separate heatmap images in r. The below script will create only one image. The 'test_name' in the script has multiple tests and I would like to create separate images for each test. What adjustments should I make in the script to make this possible?
'sql1' is my connection string and 'query' is the query used to pull data from DB

test_name = '{TEST_NAME}'
res = dbGetQuery(sql1, gsub("\\{TEST_NAME\\}", test_name, gsub("\\n\\s+", " ", query)))

res[['status']] = ifelse(res[['line']] == 'TRUE', 'N', 'T')
res[is.na(res[['spec_name']]),][['spec_name']] = res[is.na(res[['spec_name']]),][['number']]
res[['spec_desc']] = paste0(res[['last_name']], '__', res[['spec_name']])
variants = unique(res[['hash']])
samples = unique(res[['spec_desc']])

dat = res[,c('hash','spec_desc', 'varfreq')] %>%
      group_by(hash,spec_desc) %>%
      summarize(mean_varfreq = mean(varfreq))

df = data.frame(variant = variants)
for (sample in samples) {
  df = merge(df, dat[dat[['sample_desc']] == sample, c('hash','mean_varfreq')], by.x=c('variant'), by.y=c('hash'), all.x=TRUE)
  names(df)[names(df) == 'mean_varfreq'] = sample
}

rownames(df) = df[['variant']]
df_binary = df
df_binary[['variant']] = NULL
df_binary[is.na(df_binary)] = 0
df_binary = df_binary[apply(df_binary, 1, max, na.rm=TRUE) >= 40,]
df_binary[df_binary > 0] = 1
cormat = cor(df_binary)
cormat = cormat[order(rownames(cormat)), order(colnames(cormat))]

h_map = Heatmap(as.matrix(cormat), #dat_row_scaled, 
                name = "spec info",
                col = colorRamp2(c(0,0.5,1), c(colors[2], "white", colors[1])),
                cluster_columns=TRUE, 
                cluster_rows=TRUE,
                show_column_dend=FALSE,
                show_row_dend=FALSE,
                show_column_names = TRUE,
                show_row_names = TRUE,
                column_names_gp = gpar(fontsize = font_size * 0.8),
                row_names_gp = gpar(fontsize = font_size * 0.8),
                row_names_side = "left",
                column_title = "", 
                column_title_gp = gpar(fontsize = font_size * 0.9),
                heatmap_legend_param = list(direction = "vertical"))

ht_list = h_map
png("res[['spec_name']].png")
draw(ht_list, heatmap_legend_side = "right", annotation_legend_side = "bottom")

AlexisW · September 20, 2023, 8:23pm

The very simplest way is with a for loop:

all_test_names = c("{TEST_NAME1}", "{TEST_NAME2}")

for( test_name in all_test_names){
  res = dbGetQuery(sql1, gsub("\\{TEST_NAME\\}", test_name, gsub("\\n\\s+", " ", query)))
  [...]
  png(paste0(res[['spec_name']], ".png")
  draw(ht_list, heatmap_legend_side = "right", annotation_legend_side = "bottom")
}

just make sure you are using the correct file name to save: in your example code, you are literally saving a file "res[['spec_name']].png", which will be overwritten by each iteration of the loop. By using paste0() you can keep the content of res[["spec_name"]] as a file name (I'm assuming it's different for each value of test_name).

I don't see any other obvious required change, but maybe they'll come up once you try running the loop.

SLP · September 20, 2023, 9:46pm

Really appreciate your reply. But I am getting an error when I run the first chunk. I close the loop at the end of first chunk, is that okay? Here is error I have got after running the first chunk.

summarise() has grouped output by 'hash'. You can override using the .groups argument.summarise() has grouped output by 'hash'. You can override using the .groups argument.

How can I get my file name as '220107' and '220103' as these are my test names? I also got an error after running the second chunk too. Please see below

Error in switch(units, in = res, cm = res/2.54, mm = res/25.4, px = 1) * :
non-numeric argument to binary operator

updated script is below, please let me know what I am doing wrong here

all_test_names = c("{220107}","{220103}")
for (test_name in all_test_names) {
res = dbGetQuery(sql1, gsub("\\{TEST_NAME\\}", test_name, gsub("\\n\\s+", " ", query)))

res[['status']] = ifelse(res[['line']] == 'TRUE', 'N', 'T')
res[is.na(res[['spec_name']]),][['spec_name']] = res[is.na(res[['spec_name']]),][['number']]
res[['spec_desc']] = paste0(res[['last_name']], '__', res[['spec_name']])
variants = unique(res[['hash']])
samples = unique(res[['spec_desc']])

dat = res[,c('hash','spec_desc', 'varfreq')] %>%
      group_by(hash,spec_desc) %>%
      summarize(mean_varfreq = mean(varfreq))

df = data.frame(variant = variants)
for (sample in samples) {
  df = merge(df, dat[dat[['sample_desc']] == sample, c('hash','mean_varfreq')], by.x=c('variant'), by.y=c('hash'), all.x=TRUE)
  names(df)[names(df) == 'mean_varfreq'] = sample
}

rownames(df) = df[['variant']]
df_binary = df
df_binary[['variant']] = NULL
df_binary[is.na(df_binary)] = 0
df_binary = df_binary[apply(df_binary, 1, max, na.rm=TRUE) >= 40,]
df_binary[df_binary > 0] = 1
cormat = cor(df_binary)
cormat = cormat[order(rownames(cormat)), order(colnames(cormat))]
}

h_map = Heatmap(as.matrix(cormat), #dat_row_scaled, 
                name = "spec info",
                col = colorRamp2(c(0,0.5,1), c(colors[2], "white", colors[1])),
                cluster_columns=TRUE, 
                cluster_rows=TRUE,
                show_column_dend=FALSE,
                show_row_dend=FALSE,
                show_column_names = TRUE,
                show_row_names = TRUE,
                column_names_gp = gpar(fontsize = font_size * 0.8),
                row_names_gp = gpar(fontsize = font_size * 0.8),
                row_names_side = "left",
                column_title = "", 
                column_title_gp = gpar(fontsize = font_size * 0.9),
                heatmap_legend_param = list(direction = "vertical"))

ht_list = h_map
png(paste0(res[['test_name']]), ".png"))
draw(ht_list, heatmap_legend_side = "right", annotation_legend_side = "bottom")

AlexisW · September 20, 2023, 10:01pm

That isn't an error, just an information message. Also, I expect this has nothing to do with the for loop and was already happening if you just run the code you shared.

It's "okay" in that the loop should run. But since it's not saving the result anywhere, you have no way of actually accessing it.

That is an error, just like that I have no idea where it's coming from (probably from inside Heatmap). Does it not happen when you run that code without the forloop?

Do run the entire script as shared before, does it run without error? If so, do try the entire script in the for loop.

Beware, you are closing the parenthesis too early: you want to paste(res, "png"). That should give you the desired file names with:

png(paste0(res[['test_name']], ".png"))

Or if you need to remove the braces:

filename <- gsub("[{}]", "", res[['test_name']])
png(paste0(filename, ".png"))

SLP · September 21, 2023, 12:28am

I don't get the ' summarise() has grouped output by 'hash'. You can override using the .groups argument.summarise() has grouped output by 'hash'. You can override using the .groups argument.' information when I don't use the for loop.

when I put both chunks in the for loop, the script stops at the 'summarise()' information line and doesn't not continue to the next line to finish the script.

If I run the last chunk separately it will save the png file without a file name like this .png. Also I have two testnames '220107' and '220103'. The image has the data only from the second test '220103'

AlexisW · September 21, 2023, 5:43pm

The only explanation I can think of is if some samples do not have several values of spec_desc (or maybe of hash).

That doesn't make sense to me, this is not an error message, it should not interrupt the loop. Try adding some print() statements within the loop to see what's happening (see below).

Is res[['test_name']] empty?

That suggests your loop is running fully, and everytime it runs it's overwriting the saved file. So at the end of the loop you only have the last testname. Actually it also seems you're not closing the png() you open, so that could be part of the problem: you need to call dev.off() after saving the image.

Try running this code changing only the all_test_names, and share the result:

all_test_names = c('{220107}', '{220103}')

cat("Starting for loop with ", length(all_test_names), "entries.\n")
for(test_name in all_test_names){
  cat("--- Starting iteration for test name: ", test_name, " ---\n")
  
  res = dbGetQuery(sql1, gsub("\\{TEST_NAME\\}", test_name, gsub("\\n\\s+", " ", query)))
  
  cat("Result of query: ", nrow(res), " rows\n")
  
  res[['status']] = ifelse(res[['line']] == 'TRUE', 'N', 'T')
  res[is.na(res[['spec_name']]),][['spec_name']] = res[is.na(res[['spec_name']]),][['number']]
  res[['spec_desc']] = paste0(res[['last_name']], '__', res[['spec_name']])
  variants = unique(res[['hash']])
  samples = unique(res[['spec_desc']])
  
  dat = res[,c('hash','spec_desc', 'varfreq')] %>%
    group_by(hash,spec_desc) %>%
    summarize(mean_varfreq = mean(varfreq),
              .groups = "drop")
  
  df = data.frame(variant = variants)
  
  cat("Number of samples: ", length(samples), "\n")
  for (sample in samples) {
    df = merge(df, dat[dat[['sample_desc']] == sample, c('hash','mean_varfreq')], by.x=c('variant'), by.y=c('hash'), all.x=TRUE)
    names(df)[names(df) == 'mean_varfreq'] = sample
  }
  
  rownames(df) = df[['variant']]
  df_binary = df
  df_binary[['variant']] = NULL
  df_binary[is.na(df_binary)] = 0
  df_binary = df_binary[apply(df_binary, 1, max, na.rm=TRUE) >= 40,]
  df_binary[df_binary > 0] = 1
  cormat = cor(df_binary)
  cormat = cormat[order(rownames(cormat)), order(colnames(cormat))]
  
  cat("Creating heatmap\n")
  h_map = Heatmap(as.matrix(cormat), #dat_row_scaled, 
                  name = "spec info",
                  col = colorRamp2(c(0,0.5,1), c(colors[2], "white", colors[1])),
                  cluster_columns=TRUE, 
                  cluster_rows=TRUE,
                  show_column_dend=FALSE,
                  show_row_dend=FALSE,
                  show_column_names = TRUE,
                  show_row_names = TRUE,
                  column_names_gp = gpar(fontsize = font_size * 0.8),
                  row_names_gp = gpar(fontsize = font_size * 0.8),
                  row_names_side = "left",
                  column_title = "", 
                  column_title_gp = gpar(fontsize = font_size * 0.9),
                  heatmap_legend_param = list(direction = "vertical"))
  
  ht_list = h_map
  filename = paste0(res[['spec_name']], ".png")
  cat("Heatmap created for ", res[['test_name']], ", saving it as: ", filename, "\n")
  png(filename)
  draw(ht_list, heatmap_legend_side = "right", annotation_legend_side = "bottom")
  dev.off()
  cat("Saved ", res[['test_name']], "as ", res[['spec_name']],"\n")
}

cat("All done.\n")

SLP · September 22, 2023, 2:12pm

Thank you so much for your help so far. I am almost there, not getting anymore error messages or warnings. The file is saving with the correct file name as 220103.png. But it is still saving only one file, the first test '220209' is not saving.

I have added one more field in the heatmap to display, which is my test name, in the database that column name is 'analysis', this column has the values stored as '220107' and '220103'. I have this field highlighted in yellow. Sharing the script again . Thanks again for your time.

all_test_names = c("{220107}","{220103}")
cat("Starting for loop with", length(all_test_names),"entries.\n")
for (test_name in all_test_names){
cat("--- Starting iteration for test name: ", test_name, " ---\n")
res = dbGetQuery(sql1, gsub("{TEST_NAME}", test_name, gsub("\n\s+", " ", query)))

cat("Result of query: ", nrow(res), " rows\n")

res[['status']] = ifelse(res[['line']] == 'TRUE', 'N', 'T')
res[is.na(res[['spec_name']]),][['spec_name']] = res[is.na(res[['spec_name']]),][['number']]
res[['spec_desc']] = paste0(res[['last_name']], '', res[['analysis']], '', res[['spec_name']])
variants = unique(res[['hash']])
samples = unique(res[['spec_desc']])

dat = res[,c('hash','spec_desc', 'varfreq')] %>%
group_by(hash,spec_desc) %>%
summarize(mean_varfreq = mean(varfreq),.groups = "drop")

df = data.frame(variant = variants)
cat("Number of samples: ", length(samples), "\n")
for (sample in samples) {
df = merge(df, dat[dat[['spec_desc']] == sample, c('hash','mean_varfreq')], by.x=c('variant'), by.y=c('hash'), all.x=TRUE)
names(df)[names(df) == 'mean_varfreq'] = sample
}

rownames(df) = df[['variant']]
df_binary = df
df_binary[['variant']] = NULL
df_binary[is.na(df_binary)] = 0
df_binary = df_binary[apply(df_binary, 1, max, na.rm=TRUE) >= 40,]
df_binary[df_binary > 0] = 1
cormat = cor(df_binary)
cormat = cormat[order(rownames(cormat)), order(colnames(cormat))]

cat("Creating heatmap\n")
h_map = Heatmap(as.matrix(cormat), #dat_row_scaled,
name = "spec info",
#col = colorRamp2(c(0, 10, 20), c("#EEEEEE", "cornflowerblue", "orange")),
col = colorRamp2(c(0,0.5,1), c(colors[2], "white", colors[1])),
cluster_columns=TRUE,
cluster_rows=TRUE,
show_column_dend=FALSE,
show_row_dend=FALSE,
show_column_names = TRUE,
show_row_names = TRUE,
column_names_gp = gpar(fontsize = font_size * 0.8),
row_names_gp = gpar(fontsize = font_size * 0.8),
row_names_side = "left",
column_title = "",
column_title_gp = gpar(fontsize = font_size * 0.9),
heatmap_legend_param = list(direction = "vertical"))

ht_list = h_map
filename = paste0(res[['analysis']], ".png")
cat("Heatmap created for ", res[['test_name']], ", saving it as: ", filename, "\n")
png(filename)
draw(ht_list, heatmap_legend_side = "right", annotation_legend_side = "bottom")
dev.off()
cat("Saved ", res[['test_name']], "as ", res[['analysis']],"\n")
}
cat("All done.\n")

AlexisW · September 22, 2023, 3:00pm

Please post the output of running this loop, as the answer might be explained by e.g. the value of filename before saving.

SLP · September 22, 2023, 4:38pm

I am sharing my output , part of file names are redacted due to sensitive information.

here you can see my out , says 'starting iteration for test name '220107' but saving the test '220103'

The loops runs again, here it shows 'starting iteration for test name '220103' but it is again saving the test '220103'

Here is the png image saved in the folder. There is only one file instead of two

AlexisW · September 22, 2023, 5:50pm

OK, we get some more information now!

We can see that res[['test_name']] is empty: we do define test_name (that's what we're looping on), but there is no such field in the results from the query. That's why in your previous version you had an empty filename.

Second, res[['analysis']] is defined, but it is the same for both iterations of the loop. There are 2 possibilities: either there is a problem with the database, and there is the same entry for both values of test_name, or we are sending the wrong query.

So let's focus on that line:

 res = dbGetQuery(sql1, gsub("{TEST_NAME}", test_name, gsub("\n\s+", " ", query)))

I assume the connection sql1 is correctly defined, that leaves us with the statement argument:

gsub("{TEST_NAME}", test_name, gsub("\n\s+", " ", query))

I can't fully interpret it without knowing what's in query, my recommendation is to set test_name <- "220103_xxxx" and run just this command, seeing what the SQL query that you're actually sending to the DB looks like.

In principle what it does is take query defined elsewhere, remove any newline and whitespace (and replace them with a single space), then find where in the query it's written {TEST_NAME} and replace it with the content of the variable test_name.

However, when trying to run that on my side I get two errors: first, "\n" and "\s" are not recognized, they need to be escaped twice as "\\n" and "\\s". Second, "{TEST_NAME}" is recognized as a regex, and gsub() tries interpreting the braces, so I have to tell gsub() this is not a regex with the argument fixed = TRUE. So this works as expected:

test_name <- "220301_xxxx"

query <- "SELECT *
    WHERE test = '{TEST_NAME}'
    ORDER BY data"

gsub("\\n\\s+", " ", query)
#> [1] "SELECT * WHERE test = '{TEST_NAME}' ORDER BY data"

gsub("{TEST_NAME}", test_name, gsub("\\n\\s+", " ", query), fixed = TRUE)
#> [1] "SELECT * WHERE test = '220301_xxxx' ORDER BY data"

I'm surprised you didn't run into any error when executing your code, so maybe I'm missing something. I suggest you try running just these lines with your actual query and test_name to see what is the actual SQL query produced, and correct it if needed.

SLP · September 25, 2023, 8:27pm

If I change the test name to just one test it works as expected. I am including my query and the remaining script script here. This works fine.

query =
"
select s.analysis_name, s.last_name, s.number,  s.line, s.spec_name,       
from   v,
s,
       e
where  v.env_id = e.env_id
and    s.ana_id = v.ana_id
and    s.sam_id = v.sam_id
and   s.analysis_name = '{TEST_NAME}'
order by 
"

test_name <- '220107'
#all_test_names = c('{220107 }','{220103 }')
cat("Starting for loop with", length(all_test_names),"entries.\n")
#for (test_name in all_test_names){
cat("--- Starting iteration for test name: ", test_name, " ---\n")  
res = dbGetQuery(sql1, gsub("\\{TEST_NAME\\}", test_name, gsub("\\n\\s+", " ", query)))

cat("Result of query: ", nrow(res), " rows\n")

res[['status']] = ifelse(res[['line']] == 'TRUE', 'N', 'T')
res[is.na(res[['spec_name']]),][['spec_name']] = res[is.na(res[['spec_name']]),][['number']]
res[['spec_desc']] = paste0(res[['last_name']], '__',  res[['spec_name']])
variants = unique(res[['hash']])
samples = unique(res[['spec_desc']])

dat = res[,c('hash','spec_desc', 'varfreq')] %>%
      group_by(hash,sample_desc) %>%
      summarize(mean_varfreq = mean(varfreq),.groups = "drop")

df = data.frame(variant = variants)
cat("Number of samples: ", length(samples), "\n")
for (sample in samples) {
  df = merge(df, dat[dat[['spec_desc']] == sample, c('hash','mean_varfreq')], by.x=c('variant'), by.y=c('hash'), all.x=TRUE)
  names(df)[names(df) == 'mean_varfreq'] = sample
}

rownames(df) = df[['variant']]
df_binary = df
df_binary[['variant']] = NULL
df_binary[is.na(df_binary)] = 0
df_binary = df_binary[apply(df_binary, 1, max, na.rm=TRUE) >= 40,]
df_binary[df_binary > 0] = 1
cormat = cor(df_binary)
cormat = cormat[order(rownames(cormat)), order(colnames(cormat))]

cat("Creating heatmap\n")
h_map = Heatmap(as.matrix(cormat), #dat_row_scaled, 
                name = "spec info",
                #col = colorRamp2(c(0, 10, 20), c("#EEEEEE", "cornflowerblue", "orange")),
                col = colorRamp2(c(0,0.5,1), c(colors[2], "white", colors[1])),
                cluster_columns=TRUE, 
                cluster_rows=TRUE,
                show_column_dend=FALSE,
                show_row_dend=FALSE,
                show_column_names = TRUE,
                show_row_names = TRUE,
                column_names_gp = gpar(fontsize = font_size * 0.8),
                row_names_gp = gpar(fontsize = font_size * 0.8),
                row_names_side = "left",
                column_title = "", 
                column_title_gp = gpar(fontsize = font_size * 0.9),
                heatmap_legend_param = list(direction = "vertical"))

ht_list = h_map
  filename = paste0(res[['analysis_name']], ".png")
  cat("Heatmap created for ", res[['test_name']], ", saving it as: ", filename, "\n")
  png(filename)
  draw(ht_list, heatmap_legend_side = "right", annotation_legend_side = "bottom")
  dev.off()
  cat("Saved ", res[['test_name']], "as ", res[['analysis_name']],"\n")
#}
cat("All done.\n")

But if I change the lines to activate the loop and include both tests it is erroring out

#test_name <- '220107'
all_test_names = c('{220107}','{220103}')
cat("Starting for loop with", length(all_test_names),"entries.\n")
for (test_name in all_test_names){

and here is the error I am getting

Error in [[<-.data.frame(*tmp*, "spec_desc", value = "__") :
replacement has 1 row, data has 0

SLP · September 26, 2023, 1:57pm

It's working now. I modified all test names as below and it is all good now
all_test_names = c("220107","220103")

Really appreciate all your help!!!

AlexisW · September 26, 2023, 6:50pm

Great! Sorry for my late answer, but apparently you didn't need it : )

SLP · September 26, 2023, 11:31pm

No worries. Your answers were so helpful! Appreciate your time!

SLP · September 27, 2023, 3:07pm

Have one more question to ask, if i need to create a new post I can do it.

here is my query for example
Query = "select name, test_name, test_date from table where test_name = '{TEST_NAME}' and test_date like '%2022%' "

all_test_names = c("220107", "220103")
for (test_name in all_test_name)
{}

The column has multiple test_name and some are duplicated for example '220107' listed many times. How can I pass the unique values presented in column test_name ito the loop, instead of typing test_name one by one? Hope my question is clear.

AlexisW · September 27, 2023, 7:26pm

With unique():

for (test_name in unique(all_test_name)){
...
}

or

all_test_names <- unique(some_df$some_column)

for( test_name in all_test_names){
...
}

system · October 18, 2023, 7:26pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.