Calculate All Combinations of Pairwise Mahalanobis Distance Between Multiple Variables and a Predictor Variable With Three Classes in R

Kaikash777 · June 10, 2022, 5:04am

I'm having some trouble calculating the Mahalanobis Distance between three classes of the independent variable Country .

My aim is to calculate the Mahalanobis distance among dolphin whistle acoustic parameters measured from a spectrogram taken from three populations in three different study regions . I want to calculate potential (dis)similarity amongst whistle types.

My dependent variable has three classes "France", "Spain" and "Italy"
My data frame has are multiple variables (features of whistle type aspects) to be taken into account.
There are multiple observations per group (obs = 367; indicating the data frame contains more than one row per Country).

Dataframe structure

'data.frame':   367 obs. of  10 variables:
 $ Country    : Factor w/ 3 levels "Holland","France",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Low.Freq   : num  -0.245 -1.846 -1.369 -2.022 2.045 ...
 $ High.Freq  : num  -0.684 -2.033 -2.128 -2.643 0.358 ...
 $ Peak.Freq  : num  -0.0879 -1.3636 -0.8557 -1.4291 0.2001 ...
 $ Delta.Freq : num  -0.384 -0.588 -0.922 -0.973 -0.664 ...
 $ Delta.Time : num  -1.48 -1.124 -0.861 -0.999 -1.696 ...
 $ Peak.Time  : num  0.0383 0.3698 0.3703 -0.7444 -0.3214 ...
 $ Center.Freq: num  -0.169 -1.157 -1.032 -1.556 0.16 ...
 $ Start.Freq : num  -0.741 -0.944 -1.149 -1.712 0.905 ...
 $ End.Freq   : num  -0.335 -1.495 -1.561 -2.242 0.704 ...

I found a StackOverflow question that covered exactly what I want to achieve here but the problem is they conduct a pairwise mahalanobis distance analysis using the function pairwise.mahalanobis() from the HDMD package . However, when I try and download the package, I get this message.

Warning Message

Warning in install.packages :
  package ‘HDMD’ is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

#Ive tried these different methods to find the package but nothing works

ap <- available.packages()
View(ap)
"HDMD" %in% rownames(ap)

install.packages("HDMD", dependencies = TRUE)

av <- available.packages(filters=list())
av[av[, "Package"] == HDMD, ]

install.packages('HDMD',repos='http://cran.us.r-project.org')

I can't seem to find another function that behaves in the same way as the function pairwise.mahalanobis() except for a loop on this StackOverflow question, here, which I also could not get to work.

Does anyone know how to achieve the type of analysis I'm attempting because I can't figure out how?

If anyone can lend a hand, I would be very grateful.

R-CODE

    library(HDMD) #pairwise.mahalanobis function
    library(cluster) #agnes function
    library(biotools)

    group = matrix(Cluster_Dummy_2$Country) #what is being compared
    group = t(group[,1]) #prepare for pairwise.mahalanobis function
    
    variables = c(variables = c("Low.Freq", "High.Freq", "Peak.Freq", "Delta.Freq", "Delta.Time", "Peak.Time", 
                                "Center.Freq", "Start.Freq","End.Freq")) #variables (what is being used for comparison)
    variables = as.matrix(Cluster_Dummy_2[,variables]) #prepare for pairwise.mahalanobis function

The original code has the function pairwise.mahalanobis() in the HDMD package:

mahala_sq = pairwise.mahalanobis(x=variables, grouping=group) #get squared mahalanobis distances (see mahala_sq$distance).

I decided to try D2.dist() using the biotools package

mahala_sq = D2.dist(variables, group) #get squared mahalanobis distances (see mahala_sq$distance).
names = rownames(mahala_sq$means) #capture labels

#Error 
Error in D2.dist(variables, group, inverted = FALSE) : 
  incompatible dimensions!

Rest of the code to produce a dendrogram

mahala = sqrt(mahala_sq$distance) #mahalanobis distance
rownames(mahala) = names #set rownames in the dissimilarity matrix
colnames(mahala) = names #set colnames in the dissimilarity matrix

#This is how I used the dissimilarity matrix to find clusters.
cluster = agnes(mahala,diss=TRUE,keep.diss=FALSE,method="complete") #hierarchical clustering
plot(cluster,which.plots=2) #plot dendrogram

Data frame

structure(list(Low.Freq = c(435L, 94103292L, 1L, 2688L, 8471L, 
    28818L, 654755585L, 468628164L, 342491L, 2288474L, 3915L, 411L, 
    267864894L, 3312618L, 5383L, 8989443L, 1894L, 534981L, 9544861L, 
    3437614L, 475386L, 7550764L, 48744L, 2317845L, 5126197L, 2445L, 
    8L, 557450L, 450259742L, 21006647L, 9L, 7234027L, 59L, 9L, 605L, 
    9199L, 3022L, 30218156L, 46423L, 38L, 88L, 396396244L, 28934316L, 
    7723L, 95688045L, 679354L, 716352L, 76289L, 332826763L, 6L, 90975L, 
    83103577L, 9529L, 229093L, 42810L, 5L, 18175302L, 1443751L, 5831L, 
    8303661L, 86L, 778L, 23947L, 8L, 9829740L, 2075838L, 7434328L, 
    82174987L, 2L, 94037071L, 9638653L, 5L, 3L, 65972L, 0L, 936779338L, 
    4885076L, 745L, 8L, 56456L, 125140L, 73043989L, 516476L, 7L, 
    4440739L, 612L, 3966L, 8L, 9255L, 84127L, 96218L, 5690L, 56L, 
    3561L, 78738L, 1803363L, 809369L, 7131L, 0L), High.Freq = c(6071L, 
    3210L, 6L, 7306092L, 6919054L, 666399L, 78L, 523880161L, 4700783L, 
    4173830L, 30L, 811L, 341014L, 780L, 44749L, 91L, 201620707L, 
    74L, 1L, 65422L, 595L, 89093186L, 946520L, 6940919L, 655350L, 
    4L, 6L, 618L, 2006697L, 889L, 1398L, 28769L, 90519642L, 984L, 
    0L, 296209525L, 487088392L, 5L, 894L, 529L, 5L, 99106L, 2L, 926017L, 
    9078L, 1L, 21L, 88601017L, 575770L, 48L, 8431L, 194L, 62324996L, 
    5L, 81L, 40634727L, 806901520L, 6818173L, 3501L, 91780L, 36106039L, 
    5834347L, 58388837L, 34L, 3280L, 6507606L, 19L, 402L, 584L, 76L, 
    4078684L, 199L, 6881L, 92251L, 81715L, 40L, 327L, 57764L, 97668898L, 
    2676483L, 76L, 4694L, 817120L, 51L, 116712L, 666L, 3L, 42841L, 
    9724L, 21L, 4L, 359L, 2604L, 22L, 30490L, 5640L, 34L, 51923625L, 
    35544L), Peak.Freq = c(87005561L, 9102L, 994839015L, 42745869L, 
    32840L, 62737133L, 2722L, 24L, 67404881L, 999242982L, 3048L, 
    85315406L, 703037627L, 331264L, 8403609L, 3934064L, 50578953L, 
    370110665L, 3414L, 12657L, 40L, 432L, 7707L, 214L, 68588962L, 
    69467L, 75L, 500297L, 704L, 1L, 102659072L, 60896923L, 4481230L, 
    94124925L, 60164619L, 447L, 580L, 8L, 172L, 9478521L, 20L, 53L, 
    3072127L, 2160L, 27301893L, 8L, 4263L, 508L, 712409L, 50677L, 
    522433683L, 112844L, 193385L, 458269L, 93578705L, 22093131L, 
    6L, 9L, 1690461L, 0L, 4L, 652847L, 44767L, 21408L, 5384L, 304L, 
    721L, 651147L, 2426L, 586L, 498289375L, 945L, 6L, 816L, 46207L, 
    39135L, 6621028L, 66905L, 26905085L, 4098L, 0L, 14L, 88L, 530L, 
    97809006L, 90L, 6L, 260792844L, 9L, 833205723L, 99467321L, 5L, 
    8455640L, 54090L, 2L, 309L, 299161148L, 4952L, 454824L), Delta.Freq = c(5L, 
    78L, 88553L, 794L, 5L, 3859122L, 782L, 36L, 8756801L, 243169338L, 
    817789L, 8792384L, 7431L, 626921743L, 9206L, 95789L, 7916L, 8143453L, 
    6L, 4L, 6363L, 181125L, 259618L, 6751L, 33L, 37960L, 0L, 2L, 
    599582228L, 565585L, 19L, 48L, 269450424L, 70676581L, 7830566L, 
    4L, 86484313L, 21L, 90899794L, 2L, 72356L, 574280L, 869544L, 
    73418L, 6468164L, 2259L, 5938505L, 31329L, 1249L, 354L, 8817L, 
    3L, 2568L, 82809L, 29836269L, 5230L, 37L, 33752014L, 79307L, 
    1736L, 8522076L, 40L, 2289135L, 862L, 801448L, 8026L, 5L, 15L, 
    4393771L, 405914L, 71098L, 950288L, 8319L, 1396973L, 832L, 70L, 
    1746L, 61907L, 8709547L, 300750537L, 45862L, 91417085L, 79892L, 
    47765L, 5477L, 18L, 4186L, 2860L, 754038591L, 375L, 53809223L, 
    72L, 136L, 509L, 232325L, 13128104L, 1692L, 8581L, 23L), Delta.Time = c(1361082L, 
    7926L, 499L, 5004L, 3494530L, 213L, 64551179L, 70L, 797L, 5L, 
    72588L, 86976L, 5163L, 635080L, 3L, 91L, 919806257L, 81443L, 
    3135427L, 4410972L, 5810L, 8L, 46603718L, 422L, 1083626L, 48L, 
    15699890L, 7L, 90167635L, 446459879L, 2332071L, 761660L, 49218442L, 
    381L, 46L, 493197L, 46L, 798597155L, 45342274L, 6265842L, 6L, 
    3445819L, 351L, 1761227L, 214L, 959L, 908996387L, 6L, 3855L, 
    9096604L, 152664L, 7970052L, 32366926L, 31L, 5201618L, 114L, 
    7806411L, 70L, 239L, 5065L, 2L, 1L, 14472831L, 122042249L, 8L, 
    495604L, 29L, 8965478L, 2875L, 959L, 39L, 9L, 690L, 933626665L, 
    85294L, 580093L, 95934L, 982058L, 65244056L, 137508L, 29L, 7621L, 
    7527L, 72L, 2L, 315L, 6L, 2413L, 8625150L, 51298109L, 851L, 890460L, 
    160736L, 6L, 850842734L, 2L, 7L, 76969113L, 190536L), Peak.Time = c(1465265L, 
    452894L, 545076172L, 8226275L, 5040875L, 700530L, 1L, 3639L, 
    20141L, 71712131L, 686L, 923L, 770569738L, 69961L, 737458636L, 
    122403L, 199502046L, 6108L, 907L, 108078263L, 7817L, 4L, 6L, 
    69L, 721L, 786353L, 87486L, 1563L, 876L, 47599535L, 79295722L, 
    53L, 7378L, 591L, 6607935L, 954L, 6295L, 75514344L, 5742050L, 
    25647276L, 449L, 328566184L, 4L, 2L, 2703L, 21367543L, 63429043L, 
    708L, 782L, 909820L, 478L, 50L, 922L, 579882L, 7850L, 534L, 2157492L, 
    96L, 6L, 716L, 5L, 653290336L, 447854237L, 2L, 31972263L, 645L, 
    7L, 609909L, 4054695L, 455631L, 4919894L, 9L, 72713L, 9997L, 
    84090765L, 89742L, 5L, 5028L, 4126L, 23091L, 81L, 239635020L, 
    3576L, 898597785L, 6822L, 3798L, 201999L, 19624L, 20432923L, 
    18944093L, 930720236L, 1492302L, 300122L, 143633L, 5152743L, 
    417344L, 813L, 55792L, 78L), Center_Freq = c(61907L, 8709547L, 
    300750537L, 45862L, 91417085L, 79892L, 47765L, 5477L, 18L, 4186L, 
    2860L, 754038591L, 375L, 53809223L, 72L, 136L, 4700783L, 4173830L, 
    30L, 811L, 341014L, 780L, 44749L, 91L, 201620707L, 74L, 1L, 65422L, 
    595L, 89093186L, 946520L, 6940919L, 48744L, 2317845L, 5126197L, 
    2445L, 8L, 557450L, 450259742L, 21006647L, 9L, 7234027L, 59L, 
    9L, 651547554L, 45554L, 38493L, 91055218L, 38L, 1116474L, 2295482L, 
    3001L, 9L, 3270L, 141L, 53644L, 667983L, 565598L, 84L, 971L, 
    555498297L, 60431L, 6597L, 856943893L, 607815536L, 4406L, 79L, 
    4885076L, 745L, 8L, 56456L, 125140L, 73043989L, 516476L, 7L, 
    4440739L, 754038591L, 375L, 53809223L, 72L, 136L, 509L, 232325L, 
    13128104L, 1692L, 8581L, 23L, 5874213L, 4550L, 644668065L, 3712371L, 
    5928L, 8833L, 7L, 2186023L, 61627221L, 37297L, 716427989L, 21387L
    ), Start.Freq = c(426355L, 22073538L, 680374L, 41771L, 54L, 6762844L, 
    599171L, 108L, 257451851L, 438814L, 343045L, 4702L, 967787L, 
    1937L, 18L, 89301735L, 366L, 90L, 954L, 7337732L, 70891703L, 
    4139L, 10397931L, 940000382L, 7L, 38376L, 878528819L, 6287L, 
    738366L, 31L, 47L, 5L, 6L, 77848L, 2366508L, 45L, 3665842L, 7252260L, 
    6L, 61L, 3247L, 448348L, 1L, 705132L, 144L, 7423637L, 2L, 497L, 
    844927639L, 78978L, 914L, 131L, 7089563L, 927L, 9595581L, 2774463L, 
    1651L, 73509280L, 7L, 35L, 18L, 96L, 1L, 92545512L, 27354947L, 
    7556L, 65019L, 7480L, 71835L, 8249L, 64792L, 71537L, 349389666L, 
    280244484L, 82L, 6L, 40L, 353872L, 0L, 103L, 1255L, 4752L, 29L, 
    76L, 81185L, 14L, 9L, 470775630L, 818361265L, 57947209L, 44L, 
    24L, 41295L, 4L, 261449L, 9931404L, 773556640L, 930717L, 65007421L
    ), End.Freq = c(71000996L, 11613579L, 71377155L, 1942738L, 8760748L, 
    79L, 455L, 374L, 8L, 5L, 2266932L, 597833L, 155488L, 3020L, 4L, 
    554L, 4L, 16472L, 1945649L, 668181101L, 649780L, 22394365L, 93060602L, 
    172146L, 20472L, 23558847L, 190513L, 22759044L, 44L, 78450L, 
    205621181L, 218L, 69916344L, 23884L, 66L, 312148L, 7710564L, 
    4L, 422L, 744572L, 651547554L, 45554L, 38493L, 91055218L, 38L, 
    1116474L, 2295482L, 3001L, 9L, 3270L, 141L, 55595L, 38451L, 8660867L, 
    14L, 96L, 345L, 6L, 44L, 8235824L, 910517L, 1424326L, 87102566L, 
    53644L, 667983L, 565598L, 84L, 971L, 555498297L, 60431L, 6597L, 
    856943893L, 607815536L, 4406L, 79L, 7L, 28978746L, 7537295L, 
    6L, 633L, 345860066L, 802L, 1035131L, 602L, 2740L, 8065L, 61370968L, 
    429953765L, 981507L, 8105L, 343787257L, 44782L, 64184L, 12981359L, 
    123367978L, 818775L, 123745614L, 25345654L, 3L), Country = c("Holland", 
    "Holland", "Holland", "Holland", "Holland", "Holland", "Spain", 
    "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", 
    "Spain", "Spain", "Spain", "Spain", "Holland", "Holland", "Holland", 
    "Holland", "Holland", "Holland", "France", "France", "France", 
    "France", "France", "France", "France", "France", "France", "France", 
    "France", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", 
    "Spain", "Spain", "France", "France", "France", "France", "Holland", 
    "Holland", "Holland", "Holland", "Holland", "Holland", "Holland", 
    "Holland", "Holland", "Holland", "Holland", "Holland", "Holland", 
    "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", 
    "Holland", "Holland", "Holland", "Holland", "France", "France", 
    "France", "France", "France", "France", "France", "Spain", "Spain", 
    "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", 
    "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", 
    "Spain", "Spain", "France", "France", "France")), row.names = c(NA, 
    99L), class = "data.frame")

FJCC · June 10, 2022, 6:05am

Here is a link to the CRAN page for the HDMD package.
https://cran.r-project.org/package=HDMD
There is a link to an archive there from which you can download the latest version of the source code, which is from 2013.

To compile the source code, you would have to install the correct RTools program for your version of R. If you are using one of the 4.x versions of R, the RTools page is here:
https://cran.r-project.org/bin/windows/Rtools/
You can find instructions for the download and installation there.

Kaikash777 · June 10, 2022, 7:44am

Thank you so much, it worked really well. I really did appreciate your help:

#install.packages("biotools")
#install.packages("cluster")

library(biotools) #pairwise.mahalanobis function
library(cluster) #agnes function

group = matrix(Cluster_Dummy_2$Country) #what is being compared
group = t(group[,1]) #prepare for pairwise.mahalanobis function

variables = c(variables = c("Low.Freq", "High.Freq", "Peak.Freq", "Delta.Freq", "Delta.Time", "Peak.Time", 
                            "Center.Freq", "Start.Freq","End.Freq")) #variables (what is being used for comparison)

variables = as.matrix(Cluster_Dummy_2[,variables]) #prepare for pairwise.mahalanobis function
variables

#function for calculating the Mahalanobis distance

#REF:https://cran.r-project.org/src/contrib/Archive/HDMD/

#HDMD Package download

pairwise.mahalanobis = 
  function(x, grouping=NULL, cov =NULL, inverted=FALSE, digits = 5, ...) 
  {
    x <- if (is.vector(x))                                  #standardize input data as matrix
      matrix(x, ncol = length(x))
    else as.matrix(x)
    
    if(!is.matrix(x))
      stop("x could not be forced into a matrix")
    
    if(length(grouping) ==0){                               #no group assigned, uses first col
      grouping = t(x[1])
      x = x[2:dim(x)[2]]    
      cat("assigning grouping\n")
      print(grouping)
    }
    
    n <- nrow(x)
    p <- ncol(x)
    
    if (n != length(grouping)){                                 #grouping and matrix do not correspond
      cat(paste("n: ", n, "and groups: ", length(grouping), "\n"))
      stop("nrow(x) and length(grouping) are different")
    }
    g <- as.factor(grouping)
    g
    lev <- lev1 <- levels(g)                                # Groups
    counts <- as.vector(table(g))                           # No. of elements in each group
    
    if (any(counts == 0)) {                             # Remove grouping if not represented in data
      empty <- lev[counts == 0]
      warning(sprintf(ngettext(length(empty), "group %s is empty", 
                               "groups %s are empty"), paste(empty, collapse = " ")), 
              domain = NA)
      lev1 <- lev[counts > 0]
      g <- factor(g, levels = lev1)
      counts <- as.vector(table(g))
    }
    
    ng = length(lev1)
    
    group.means <- tapply(x, list(rep(g, p), col(x)), mean)     # g x p matrix of group means from x
    
    if(missing(cov)){                                           #create covariance matrix
      inverted = FALSE
      cov = cor(x)                                          #standardize into correlation mtx
    }
    else{                                                       #check cov of correct dimension
      if(dim(cov) != c(p,p))
        stop("cov matrix not of dim = (p,p)\n")
    }
    
    Distance = matrix(nrow=ng, ncol=ng)                                 #initialize distance matrix 
    dimnames(Distance) = list(names(group.means), names(group.means))
    
    Means = round(group.means, digits)
    Cov = round(cov, digits)
    Distance = round(Distance, digits)
    
    for(i in 1:ng){
      Distance[i,]= mahalanobis(group.means, group.means[i,], cov, inverted)
    }
    
    result <- list(means = group.means, cov = cov, distance = Distance)
    result
  }

Calculate the pairwise Mahalanobis distance

#Calculate the Mahalanobis distance
mahala_sq = pairwise.mahalanobis(x=variables, grouping=group) #get squared mahalanobis distances (see mahala_sq$distance).
mahala_sq

names = rownames(mahala_sq$means) #capture labels

mahala = sqrt(mahala_sq$distance) #mahalanobis distance
rownames(mahala) = names #set rownames in the dissimilarity matrix
colnames(mahala) = names #set colnames in the dissimilarity matrix

mahala 

#This is how I used the dissimilarity matrix to find clusters.
cluster = agnes(mahala,diss=TRUE,keep.diss=FALSE,method="complete") #hierarchical clustering
plot(cluster,which.plots=2) #plot dendrogram

system · July 1, 2022, 7:44am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.