Replicating a Hexagonal Grid Visual

AC3112 · April 13, 2024, 12:53pm

I've been using self-organising map (SOM) visualisation in R as a preliminary clustering process.

However, I've noticed that the SOM visualisation methods are built for 'SOM objects'. Is it possible to replicate a SOM grid in ggplot without a SOM object?

So, for example, is it possible to recreate a SOM-like 9-node hexagonal grid in ggplot with data from a single variable embedded inside, but without the SOM object provided below?

In other words, I am really just looking for a visual equivalent for summary stats, without the need for the SOM object.

I've provided some mock data that I'd like to put in a SOM-like 9-node visual format, but I've also provided the data that generated my original SOM.

All help would be appreciated.

####################################
#SOME DATA FOR A POTENTIAL GRID
####################################

#A SINGLE RANKED VARIABLE

RankVar = rep(0:5, times = 108) 
RankVar <- sample(RankVar)

# NODE ID TO BE ASSIGNED TO EACH OF THE 9 NODES

Node = rep(1:9, times = 72)
Node <- sample(Node)

DAT <- data.frame(RankVar, Node)
DAT <- data.frame(lapply(DAT, function(x) as.numeric(as.character(x))))


###################################
#MY SOM ROUTINE
###################################

install.packages("aweSOM")
install.packages("kohonen")
install.packages("RColorBrewer")

library(aweSOM)
library(kohonen)
library(RColorBrewer)

#DATA 

X1 <- rep(1:3, times = 100)
X1 <- sample(X1)
X2 <- rep(1:3, times = 100)
X2 <- sample(X2)
X3 <- rep(1:3, times = 100)
X3 <- sample(X3)
X4 <- rep(1:3, times = 100)
X4 <- sample(X4)
X5 <- rep(1:3, times = 100)
X5 <- sample(X5)

Dat <- data.frame(X1, X2, X3, X4, X5)
Dat <- data.frame(lapply(Dat, function(x) as.numeric(as.character(x))))
Dat <- as.matrix(Dat)

#SET SEED

set.seed(145)

#SOM INITIALISATION

init <- somInit(Dat, 3, 3)

#SOM OBJECT

SOM <- kohonen::som(Dat, grid = kohonen::somgrid(3, 3, "hexagonal"), 
                           rlen = 100, alpha = c(0.05, 0.01),
                           dist.fcts = "manhattan", init = init, keep.data = TRUE)

#CLUSTERING ON THE SOM

threeclusters3x3 <- cluster::pam(SOM$codes[[1]], 3)
clus3x3 <- threeclusters3x3$clustering

#PLOTTING THE SOM

plot <- aweSOMplot(som = SOM, type = "Barplot", data = Dat, 
                    variables = c("X1", "X2", "X3", "X4", "X5"),
                    superclass = clus3x3,
                    showAxes = FALSE,
                    values = "median", 
                    palsc = "Blues",
                    palvar = "Greys")

plot

####

dromano · April 13, 2024, 3:14pm

Could you say more about what you mean by "embedded"?

AC3112 · April 13, 2024, 3:32pm

Hi @dromano. Feel free to amend my terminology. By embedding I mean that within each cell you can place a summary of the data.

So for example, within each cell you can place a distribution of responses.

dromano · April 13, 2024, 4:06pm

From your code and picture, it's not clear what you mean by this: The plot within each hexagon is a bar chart of some kind of combination of parts of the five variables you created rather than a distribution of a single variable. Could you share a small table of data and describe how the data should be placed in a hexagon?

AC3112 · April 13, 2024, 5:09pm

Hi @dromano. I take your point. Overall, my objective is to have a hexagonal grid (like the SOM) and to use it to present, in each cell/node, a frequency distribution of the responses for that node.

dromano · April 13, 2024, 7:45pm

Without a table to refer to that links data to cells, it's difficult to understand what you mean by this. Could you share a table that does this, and then describe how you'd like the data displayed?

AC3112 · April 13, 2024, 8:06pm

Hi @dromano.

Thanks for responding. I am not at a laptop atm, so I am unable to tabulate.

However, the data is comprised of two variables: a Node ID (9 nodes) and a response variable (with 6 levels).

Each node can take on any response on a scale of 0:5. Therefore, I’d like each cell to be able to characterise the frequency distribution or median value of responses within each node.
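
The two summaries described here (a frequency distribution per node and a median per node) could be tabulated roughly as follows. This is only a minimal sketch using the mock data from the first post, where responses are generated as 0:5; the names `freq` and `node_median` are placeholders and the seed is arbitrary.

```r
# Mock data as in the first post: 648 responses (0:5) spread over 9 nodes
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
DAT <- data.frame(
  RankVar = sample(rep(0:5, times = 108)),
  Node    = sample(rep(1:9, times = 72))
)

# Frequency of each response level within each node (one row per Node x RankVar)
freq <- as.data.frame(
  table(Node = DAT$Node, RankVar = DAT$RankVar),
  responseName = "n"
)

# Median response per node (one row per node)
node_median <- aggregate(RankVar ~ Node, data = DAT, FUN = median)

head(freq)
node_median
```

Either table has a `Node` column, so it can be linked to a cell of the grid by node ID.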

dromano · April 13, 2024, 9:00pm

Do you mean like this?

library(tidyverse)
tibble(
  node = 1:2 |> rep(each = 3), 
  var = 1:6
) -> dat

dat
#> # A tibble: 6 × 2
#>    node   var
#>   <int> <int>
#> 1     1     1
#> 2     1     2
#> 3     1     3
#> 4     2     4
#> 5     2     5
#> 6     2     6

dat |> 
  group_by(node) |> 
  summarise(median = median(var))
#> # A tibble: 2 × 2
#>    node median
#>   <int>  <int>
#> 1     1      2
#> 2     2      5

Created on 2024-04-13 with reprex v2.0.2

So that hexagon 1 would display "2" and hexagon 2 would display "5"?
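
A display like that (one number per hexagon) might be sketched in ggplot2 along these lines, with no SOM object involved. Everything here is an assumption for illustration: `hex_centres()` and `hex_corners()` are hypothetical helpers, nodes are numbered row by row from the bottom-left, and odd rows are shifted half a cell, which may not match the node layout of a particular SOM package.

```r
library(ggplot2)

# Hypothetical helper: centres of an xdim-by-ydim hexagonal grid,
# numbered row by row starting at the bottom-left
hex_centres <- function(xdim = 3, ydim = 3) {
  g <- expand.grid(col = seq_len(xdim), row = seq_len(ydim))
  data.frame(
    Node = seq_len(nrow(g)),
    cx   = g$col + 0.5 * (g$row %% 2),  # shift odd rows by half a cell
    cy   = g$row * sqrt(3) / 2          # row spacing for touching hexagons
  )
}

# Hypothetical helper: six corners of a pointy-topped hexagon (width 1) per centre
hex_corners <- function(centres, r = 1 / sqrt(3)) {
  ang <- seq(30, 330, by = 60) * pi / 180
  data.frame(
    Node = rep(centres$Node, each = 6),
    x    = rep(centres$cx, each = 6) + r * cos(ang),
    y    = rep(centres$cy, each = 6) + r * sin(ang)
  )
}

# Mock data as in the first post, then the median response per node
set.seed(1)  # arbitrary seed
DAT <- data.frame(
  RankVar = sample(rep(0:5, times = 108)),
  Node    = sample(rep(1:9, times = 72))
)
node_median <- aggregate(RankVar ~ Node, data = DAT, FUN = median)

# Attach the medians to the centres (for labels) and corners (for fill)
centres <- hex_centres(3, 3)
centres$median <- node_median$RankVar[match(centres$Node, node_median$Node)]
polys <- hex_corners(centres)
polys$median <- node_median$RankVar[match(polys$Node, node_median$Node)]

p <- ggplot() +
  geom_polygon(data = polys,
               aes(x, y, group = Node, fill = median), colour = "white") +
  geom_text(data = centres, aes(cx, cy, label = median)) +
  scale_fill_distiller(palette = "Blues", direction = 1) +
  coord_equal() +
  theme_void()
p
```

`coord_equal()` keeps the hexagons regular; without it the cells stretch with the plot window.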

AC3112 · April 13, 2024, 9:17pm

Hi @dromano.

That’s what I mean, yeah. But each node would have a distribution of responses ranging 0:5

dromano · April 23, 2024, 12:12pm

So in terms of designing a function that does what you want, would the basic inputs be: the size of the grid and a distribution for each cell?
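
For what it's worth, a function with roughly those inputs could look like the sketch below: it takes the raw data plus the grid dimensions and draws a small bar chart of response counts inside each hexagon. `plot_hex_distributions()` is a hypothetical name, the data are assumed to have columns `Node` (1 to xdim * ydim) and `RankVar`, and the node numbering and bar sizes are arbitrary choices rather than anything taken from aweSOM or kohonen.

```r
library(ggplot2)

# Hypothetical function: hexagonal grid with a mini bar chart per node
plot_hex_distributions <- function(dat, xdim = 3, ydim = 3) {
  # Hexagon centres, numbered row by row; odd rows shifted half a cell
  g <- expand.grid(col = seq_len(xdim), row = seq_len(ydim))
  centres <- data.frame(
    Node = seq_len(nrow(g)),
    cx   = g$col + 0.5 * (g$row %% 2),
    cy   = g$row * sqrt(3) / 2
  )

  # Corners of a pointy-topped hexagon of width 1 around each centre
  r   <- 1 / sqrt(3)
  ang <- seq(30, 330, by = 60) * pi / 180
  polys <- data.frame(
    Node = rep(centres$Node, each = 6),
    x    = rep(centres$cx, each = 6) + r * cos(ang),
    y    = rep(centres$cy, each = 6) + r * sin(ang)
  )

  # Count every response level within every node (zero counts included)
  levels_all <- sort(unique(dat$RankVar))
  freq <- as.data.frame(
    table(Node    = factor(dat$Node, levels = centres$Node),
          RankVar = factor(dat$RankVar, levels = levels_all)),
    responseName = "n"
  )
  i <- as.integer(freq$Node)     # index of the hexagon each bar belongs to
  k <- as.integer(freq$RankVar)  # position of the bar within its hexagon

  # Bars sit in a 0.6 x 0.5 box inside the hexagon, scaled to the largest count
  bar_w <- 0.6 / length(levels_all)
  freq$xmin <- centres$cx[i] - 0.3 + (k - 1) * bar_w
  freq$xmax <- freq$xmin + 0.9 * bar_w
  freq$ymin <- centres$cy[i] - 0.25
  freq$ymax <- freq$ymin + 0.5 * freq$n / max(freq$n)

  ggplot() +
    geom_polygon(data = polys, aes(x, y, group = Node),
                 fill = "grey95", colour = "grey40") +
    geom_rect(data = freq,
              aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax,
                  fill = RankVar)) +
    geom_text(data = centres, aes(cx, cy + 0.42, label = Node), size = 3) +
    scale_fill_brewer(palette = "Blues", name = "Response") +
    coord_equal() +
    theme_void()
}

# Usage with the mock data from the first post
set.seed(1)  # arbitrary seed
DAT <- data.frame(
  RankVar = sample(rep(0:5, times = 108)),
  Node    = sample(rep(1:9, times = 72))
)
p <- plot_hex_distributions(DAT, xdim = 3, ydim = 3)
p
```

Bar heights are scaled against the largest count across all nodes, so cells are comparable with one another; scaling within each node instead would emphasise the shape of each distribution.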