# Calculate All Combinations of Pairwise Mahalanobis Distance Between Multiple Variables and a Predictor Variable With Three Classes in R

I'm having some trouble calculating the Mahalanobis Distance between three classes of the independent variable `Country` .

My aim is to calculate the `Mahalanobis distance` among dolphin whistle acoustic parameters measured from a spectrogram taken from `three populations` in `three different study regions` . I want to calculate potential `(dis)similarity` amongst whistle types.

• My dependent variable has three classes `"France", "Spain" and "Italy"`
• My data frame has are multiple variables (features of whistle type aspects) to be taken into account.
• There are multiple observations per group (obs = 367; indicating the data frame contains more than one row per Country).

Dataframe structure

``````'data.frame':   367 obs. of  10 variables:
\$ Country    : Factor w/ 3 levels "Holland","France",..: 1 1 1 1 1 1 1 1 1 1 ...
\$ Low.Freq   : num  -0.245 -1.846 -1.369 -2.022 2.045 ...
\$ High.Freq  : num  -0.684 -2.033 -2.128 -2.643 0.358 ...
\$ Peak.Freq  : num  -0.0879 -1.3636 -0.8557 -1.4291 0.2001 ...
\$ Delta.Freq : num  -0.384 -0.588 -0.922 -0.973 -0.664 ...
\$ Delta.Time : num  -1.48 -1.124 -0.861 -0.999 -1.696 ...
\$ Peak.Time  : num  0.0383 0.3698 0.3703 -0.7444 -0.3214 ...
\$ Center.Freq: num  -0.169 -1.157 -1.032 -1.556 0.16 ...
\$ Start.Freq : num  -0.741 -0.944 -1.149 -1.712 0.905 ...
\$ End.Freq   : num  -0.335 -1.495 -1.561 -2.242 0.704 ...
``````

I found a StackOverflow question that covered exactly what I want to achieve here but the problem is they conduct a `pairwise mahalanobis distance analysis` using the function `pairwise.mahalanobis()` from the `HDMD package` . However, when I try and download the package, I get this message.

Warning Message

``````Warning in install.packages :
package ‘HDMD’ is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at

#Ive tried these different methods to find the package but nothing works

ap <- available.packages()
View(ap)
"HDMD" %in% rownames(ap)

install.packages("HDMD", dependencies = TRUE)

av <- available.packages(filters=list())
av[av[, "Package"] == HDMD, ]

install.packages('HDMD',repos='http://cran.us.r-project.org')
``````

I can't seem to find another function that behaves in the same way as the function `pairwise.mahalanobis()` except for a loop on this StackOverflow question, here, which I also could not get to work.

Does anyone know how to achieve the type of analysis I'm attempting because I can't figure out how?

If anyone can lend a hand, I would be very grateful.

R-CODE

``````    library(HDMD) #pairwise.mahalanobis function
library(cluster) #agnes function
library(biotools)

group = matrix(Cluster_Dummy_2\$Country) #what is being compared
group = t(group[,1]) #prepare for pairwise.mahalanobis function

variables = c(variables = c("Low.Freq", "High.Freq", "Peak.Freq", "Delta.Freq", "Delta.Time", "Peak.Time",
"Center.Freq", "Start.Freq","End.Freq")) #variables (what is being used for comparison)
variables = as.matrix(Cluster_Dummy_2[,variables]) #prepare for pairwise.mahalanobis function
``````

The original code has the function pairwise.mahalanobis() in the HDMD package:

``````mahala_sq = pairwise.mahalanobis(x=variables, grouping=group) #get squared mahalanobis distances (see mahala_sq\$distance).
``````

I decided to try D2.dist() using the biotools package

``````mahala_sq = D2.dist(variables, group) #get squared mahalanobis distances (see mahala_sq\$distance).
names = rownames(mahala_sq\$means) #capture labels

#Error
Error in D2.dist(variables, group, inverted = FALSE) :
incompatible dimensions!
``````

Rest of the code to produce a dendrogram

``````mahala = sqrt(mahala_sq\$distance) #mahalanobis distance
rownames(mahala) = names #set rownames in the dissimilarity matrix
colnames(mahala) = names #set colnames in the dissimilarity matrix

#This is how I used the dissimilarity matrix to find clusters.
cluster = agnes(mahala,diss=TRUE,keep.diss=FALSE,method="complete") #hierarchical clustering
plot(cluster,which.plots=2) #plot dendrogram
``````

Data frame

``````structure(list(Low.Freq = c(435L, 94103292L, 1L, 2688L, 8471L,
28818L, 654755585L, 468628164L, 342491L, 2288474L, 3915L, 411L,
267864894L, 3312618L, 5383L, 8989443L, 1894L, 534981L, 9544861L,
3437614L, 475386L, 7550764L, 48744L, 2317845L, 5126197L, 2445L,
8L, 557450L, 450259742L, 21006647L, 9L, 7234027L, 59L, 9L, 605L,
9199L, 3022L, 30218156L, 46423L, 38L, 88L, 396396244L, 28934316L,
7723L, 95688045L, 679354L, 716352L, 76289L, 332826763L, 6L, 90975L,
83103577L, 9529L, 229093L, 42810L, 5L, 18175302L, 1443751L, 5831L,
8303661L, 86L, 778L, 23947L, 8L, 9829740L, 2075838L, 7434328L,
82174987L, 2L, 94037071L, 9638653L, 5L, 3L, 65972L, 0L, 936779338L,
4885076L, 745L, 8L, 56456L, 125140L, 73043989L, 516476L, 7L,
4440739L, 612L, 3966L, 8L, 9255L, 84127L, 96218L, 5690L, 56L,
3561L, 78738L, 1803363L, 809369L, 7131L, 0L), High.Freq = c(6071L,
3210L, 6L, 7306092L, 6919054L, 666399L, 78L, 523880161L, 4700783L,
4173830L, 30L, 811L, 341014L, 780L, 44749L, 91L, 201620707L,
74L, 1L, 65422L, 595L, 89093186L, 946520L, 6940919L, 655350L,
4L, 6L, 618L, 2006697L, 889L, 1398L, 28769L, 90519642L, 984L,
0L, 296209525L, 487088392L, 5L, 894L, 529L, 5L, 99106L, 2L, 926017L,
9078L, 1L, 21L, 88601017L, 575770L, 48L, 8431L, 194L, 62324996L,
5L, 81L, 40634727L, 806901520L, 6818173L, 3501L, 91780L, 36106039L,
5834347L, 58388837L, 34L, 3280L, 6507606L, 19L, 402L, 584L, 76L,
4078684L, 199L, 6881L, 92251L, 81715L, 40L, 327L, 57764L, 97668898L,
2676483L, 76L, 4694L, 817120L, 51L, 116712L, 666L, 3L, 42841L,
9724L, 21L, 4L, 359L, 2604L, 22L, 30490L, 5640L, 34L, 51923625L,
35544L), Peak.Freq = c(87005561L, 9102L, 994839015L, 42745869L,
32840L, 62737133L, 2722L, 24L, 67404881L, 999242982L, 3048L,
85315406L, 703037627L, 331264L, 8403609L, 3934064L, 50578953L,
370110665L, 3414L, 12657L, 40L, 432L, 7707L, 214L, 68588962L,
69467L, 75L, 500297L, 704L, 1L, 102659072L, 60896923L, 4481230L,
94124925L, 60164619L, 447L, 580L, 8L, 172L, 9478521L, 20L, 53L,
3072127L, 2160L, 27301893L, 8L, 4263L, 508L, 712409L, 50677L,
522433683L, 112844L, 193385L, 458269L, 93578705L, 22093131L,
6L, 9L, 1690461L, 0L, 4L, 652847L, 44767L, 21408L, 5384L, 304L,
721L, 651147L, 2426L, 586L, 498289375L, 945L, 6L, 816L, 46207L,
39135L, 6621028L, 66905L, 26905085L, 4098L, 0L, 14L, 88L, 530L,
97809006L, 90L, 6L, 260792844L, 9L, 833205723L, 99467321L, 5L,
8455640L, 54090L, 2L, 309L, 299161148L, 4952L, 454824L), Delta.Freq = c(5L,
78L, 88553L, 794L, 5L, 3859122L, 782L, 36L, 8756801L, 243169338L,
817789L, 8792384L, 7431L, 626921743L, 9206L, 95789L, 7916L, 8143453L,
6L, 4L, 6363L, 181125L, 259618L, 6751L, 33L, 37960L, 0L, 2L,
599582228L, 565585L, 19L, 48L, 269450424L, 70676581L, 7830566L,
4L, 86484313L, 21L, 90899794L, 2L, 72356L, 574280L, 869544L,
73418L, 6468164L, 2259L, 5938505L, 31329L, 1249L, 354L, 8817L,
3L, 2568L, 82809L, 29836269L, 5230L, 37L, 33752014L, 79307L,
1736L, 8522076L, 40L, 2289135L, 862L, 801448L, 8026L, 5L, 15L,
4393771L, 405914L, 71098L, 950288L, 8319L, 1396973L, 832L, 70L,
1746L, 61907L, 8709547L, 300750537L, 45862L, 91417085L, 79892L,
47765L, 5477L, 18L, 4186L, 2860L, 754038591L, 375L, 53809223L,
72L, 136L, 509L, 232325L, 13128104L, 1692L, 8581L, 23L), Delta.Time = c(1361082L,
7926L, 499L, 5004L, 3494530L, 213L, 64551179L, 70L, 797L, 5L,
72588L, 86976L, 5163L, 635080L, 3L, 91L, 919806257L, 81443L,
3135427L, 4410972L, 5810L, 8L, 46603718L, 422L, 1083626L, 48L,
15699890L, 7L, 90167635L, 446459879L, 2332071L, 761660L, 49218442L,
381L, 46L, 493197L, 46L, 798597155L, 45342274L, 6265842L, 6L,
3445819L, 351L, 1761227L, 214L, 959L, 908996387L, 6L, 3855L,
9096604L, 152664L, 7970052L, 32366926L, 31L, 5201618L, 114L,
7806411L, 70L, 239L, 5065L, 2L, 1L, 14472831L, 122042249L, 8L,
495604L, 29L, 8965478L, 2875L, 959L, 39L, 9L, 690L, 933626665L,
85294L, 580093L, 95934L, 982058L, 65244056L, 137508L, 29L, 7621L,
7527L, 72L, 2L, 315L, 6L, 2413L, 8625150L, 51298109L, 851L, 890460L,
160736L, 6L, 850842734L, 2L, 7L, 76969113L, 190536L), Peak.Time = c(1465265L,
452894L, 545076172L, 8226275L, 5040875L, 700530L, 1L, 3639L,
20141L, 71712131L, 686L, 923L, 770569738L, 69961L, 737458636L,
122403L, 199502046L, 6108L, 907L, 108078263L, 7817L, 4L, 6L,
69L, 721L, 786353L, 87486L, 1563L, 876L, 47599535L, 79295722L,
53L, 7378L, 591L, 6607935L, 954L, 6295L, 75514344L, 5742050L,
25647276L, 449L, 328566184L, 4L, 2L, 2703L, 21367543L, 63429043L,
708L, 782L, 909820L, 478L, 50L, 922L, 579882L, 7850L, 534L, 2157492L,
96L, 6L, 716L, 5L, 653290336L, 447854237L, 2L, 31972263L, 645L,
7L, 609909L, 4054695L, 455631L, 4919894L, 9L, 72713L, 9997L,
84090765L, 89742L, 5L, 5028L, 4126L, 23091L, 81L, 239635020L,
3576L, 898597785L, 6822L, 3798L, 201999L, 19624L, 20432923L,
18944093L, 930720236L, 1492302L, 300122L, 143633L, 5152743L,
417344L, 813L, 55792L, 78L), Center_Freq = c(61907L, 8709547L,
300750537L, 45862L, 91417085L, 79892L, 47765L, 5477L, 18L, 4186L,
2860L, 754038591L, 375L, 53809223L, 72L, 136L, 4700783L, 4173830L,
30L, 811L, 341014L, 780L, 44749L, 91L, 201620707L, 74L, 1L, 65422L,
595L, 89093186L, 946520L, 6940919L, 48744L, 2317845L, 5126197L,
2445L, 8L, 557450L, 450259742L, 21006647L, 9L, 7234027L, 59L,
9L, 651547554L, 45554L, 38493L, 91055218L, 38L, 1116474L, 2295482L,
3001L, 9L, 3270L, 141L, 53644L, 667983L, 565598L, 84L, 971L,
555498297L, 60431L, 6597L, 856943893L, 607815536L, 4406L, 79L,
4885076L, 745L, 8L, 56456L, 125140L, 73043989L, 516476L, 7L,
4440739L, 754038591L, 375L, 53809223L, 72L, 136L, 509L, 232325L,
13128104L, 1692L, 8581L, 23L, 5874213L, 4550L, 644668065L, 3712371L,
5928L, 8833L, 7L, 2186023L, 61627221L, 37297L, 716427989L, 21387L
), Start.Freq = c(426355L, 22073538L, 680374L, 41771L, 54L, 6762844L,
599171L, 108L, 257451851L, 438814L, 343045L, 4702L, 967787L,
1937L, 18L, 89301735L, 366L, 90L, 954L, 7337732L, 70891703L,
4139L, 10397931L, 940000382L, 7L, 38376L, 878528819L, 6287L,
738366L, 31L, 47L, 5L, 6L, 77848L, 2366508L, 45L, 3665842L, 7252260L,
6L, 61L, 3247L, 448348L, 1L, 705132L, 144L, 7423637L, 2L, 497L,
844927639L, 78978L, 914L, 131L, 7089563L, 927L, 9595581L, 2774463L,
1651L, 73509280L, 7L, 35L, 18L, 96L, 1L, 92545512L, 27354947L,
7556L, 65019L, 7480L, 71835L, 8249L, 64792L, 71537L, 349389666L,
280244484L, 82L, 6L, 40L, 353872L, 0L, 103L, 1255L, 4752L, 29L,
76L, 81185L, 14L, 9L, 470775630L, 818361265L, 57947209L, 44L,
24L, 41295L, 4L, 261449L, 9931404L, 773556640L, 930717L, 65007421L
), End.Freq = c(71000996L, 11613579L, 71377155L, 1942738L, 8760748L,
79L, 455L, 374L, 8L, 5L, 2266932L, 597833L, 155488L, 3020L, 4L,
554L, 4L, 16472L, 1945649L, 668181101L, 649780L, 22394365L, 93060602L,
172146L, 20472L, 23558847L, 190513L, 22759044L, 44L, 78450L,
205621181L, 218L, 69916344L, 23884L, 66L, 312148L, 7710564L,
4L, 422L, 744572L, 651547554L, 45554L, 38493L, 91055218L, 38L,
1116474L, 2295482L, 3001L, 9L, 3270L, 141L, 55595L, 38451L, 8660867L,
14L, 96L, 345L, 6L, 44L, 8235824L, 910517L, 1424326L, 87102566L,
53644L, 667983L, 565598L, 84L, 971L, 555498297L, 60431L, 6597L,
856943893L, 607815536L, 4406L, 79L, 7L, 28978746L, 7537295L,
6L, 633L, 345860066L, 802L, 1035131L, 602L, 2740L, 8065L, 61370968L,
429953765L, 981507L, 8105L, 343787257L, 44782L, 64184L, 12981359L,
123367978L, 818775L, 123745614L, 25345654L, 3L), Country = c("Holland",
"Holland", "Holland", "Holland", "Holland", "Holland", "Spain",
"Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain",
"Spain", "Spain", "Spain", "Spain", "Holland", "Holland", "Holland",
"Holland", "Holland", "Holland", "France", "France", "France",
"France", "France", "France", "France", "France", "France", "France",
"France", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain",
"Spain", "Spain", "France", "France", "France", "France", "Holland",
"Holland", "Holland", "Holland", "Holland", "Holland", "Holland",
"Holland", "Holland", "Holland", "Holland", "Holland", "Holland",
"Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain",
"Holland", "Holland", "Holland", "Holland", "France", "France",
"France", "France", "France", "France", "France", "Spain", "Spain",
"Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain",
"Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain",
"Spain", "Spain", "France", "France", "France")), row.names = c(NA,
99L), class = "data.frame")
``````

Here is a link to the CRAN page for the HDMD package.
https://cran.r-project.org/package=HDMD

To compile the source code, you would have to install the correct RTools program for your version of R. If you are using one of the 4.x versions of R, the RTools page is here:
https://cran.r-project.org/bin/windows/Rtools/

Thank you so much, it worked really well. I really did appreciate your help:

``````#install.packages("biotools")
#install.packages("cluster")

library(biotools) #pairwise.mahalanobis function
library(cluster) #agnes function

group = matrix(Cluster_Dummy_2\$Country) #what is being compared
group = t(group[,1]) #prepare for pairwise.mahalanobis function

variables = c(variables = c("Low.Freq", "High.Freq", "Peak.Freq", "Delta.Freq", "Delta.Time", "Peak.Time",
"Center.Freq", "Start.Freq","End.Freq")) #variables (what is being used for comparison)

variables = as.matrix(Cluster_Dummy_2[,variables]) #prepare for pairwise.mahalanobis function
variables

#function for calculating the Mahalanobis distance

#REF:https://cran.r-project.org/src/contrib/Archive/HDMD/

pairwise.mahalanobis =
function(x, grouping=NULL, cov =NULL, inverted=FALSE, digits = 5, ...)
{
x <- if (is.vector(x))                                  #standardize input data as matrix
matrix(x, ncol = length(x))
else as.matrix(x)

if(!is.matrix(x))
stop("x could not be forced into a matrix")

if(length(grouping) ==0){                               #no group assigned, uses first col
grouping = t(x[1])
x = x[2:dim(x)[2]]
cat("assigning grouping\n")
print(grouping)
}

n <- nrow(x)
p <- ncol(x)

if (n != length(grouping)){                                 #grouping and matrix do not correspond
cat(paste("n: ", n, "and groups: ", length(grouping), "\n"))
stop("nrow(x) and length(grouping) are different")
}
g <- as.factor(grouping)
g
lev <- lev1 <- levels(g)                                # Groups
counts <- as.vector(table(g))                           # No. of elements in each group

if (any(counts == 0)) {                             # Remove grouping if not represented in data
empty <- lev[counts == 0]
warning(sprintf(ngettext(length(empty), "group %s is empty",
"groups %s are empty"), paste(empty, collapse = " ")),
domain = NA)
lev1 <- lev[counts > 0]
g <- factor(g, levels = lev1)
counts <- as.vector(table(g))
}

ng = length(lev1)

group.means <- tapply(x, list(rep(g, p), col(x)), mean)     # g x p matrix of group means from x

if(missing(cov)){                                           #create covariance matrix
inverted = FALSE
cov = cor(x)                                          #standardize into correlation mtx
}
else{                                                       #check cov of correct dimension
if(dim(cov) != c(p,p))
stop("cov matrix not of dim = (p,p)\n")
}

Distance = matrix(nrow=ng, ncol=ng)                                 #initialize distance matrix
dimnames(Distance) = list(names(group.means), names(group.means))

Means = round(group.means, digits)
Cov = round(cov, digits)
Distance = round(Distance, digits)

for(i in 1:ng){
Distance[i,]= mahalanobis(group.means, group.means[i,], cov, inverted)
}

result <- list(means = group.means, cov = cov, distance = Distance)
result
}

Calculate the pairwise Mahalanobis distance

#Calculate the Mahalanobis distance
mahala_sq = pairwise.mahalanobis(x=variables, grouping=group) #get squared mahalanobis distances (see mahala_sq\$distance).
mahala_sq

names = rownames(mahala_sq\$means) #capture labels

mahala = sqrt(mahala_sq\$distance) #mahalanobis distance
rownames(mahala) = names #set rownames in the dissimilarity matrix
colnames(mahala) = names #set colnames in the dissimilarity matrix

mahala

#This is how I used the dissimilarity matrix to find clusters.
cluster = agnes(mahala,diss=TRUE,keep.diss=FALSE,method="complete") #hierarchical clustering
plot(cluster,which.plots=2) #plot dendrogram
``````

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.