Monothetic cluster analysis

Hello everybody! I need your help!
I'm trying to do a cluster analysis on a binary data frame with 37 rows (companies) and 20 columns (features such as POS, instant payment, etc.). I would like to cluster the companies based on which features they have (1) or don't have (0).
Could you help me? Thanks in advance.

You have 37 points (companies) in 20 dimensions (740 values in all), so to see any clustering the dimensions must be reduced. One way to do this is with principal component analysis adapted to binary data. Here's an example from the {logisticPCA} package, using its built-in house_votes84 data.

# install.packages("rARPACK")
# devtools::install_github("andland/logisticPCA")
library(logisticPCA)
library(ggplot2)
data("house_votes84")
logsvd_model = logisticSVD(house_votes84, k = 2)
logsvd_model
#> 435 rows and 16 columns
#> Rank 2 solution
#> 
#> 63.6% of deviance explained
#> 549 iterations to converge
logpca_cv = cv.lpca(house_votes84, ks = 2, ms = 1:10)
plot(logpca_cv) + theme_minimal()
#> Warning in type.convert.default(colnames(x)): 'as.is' should be specified by
#> the caller; using TRUE
#> Warning in type.convert.default(rownames(x)): 'as.is' should be specified by
#> the caller; using TRUE

# ms = 1:10 above, so the position of the lowest cross-validation value is also the value of m
logpca_model = logisticPCA(house_votes84, k = 2, m = which.min(logpca_cv))
clogpca_model = convexLogisticPCA(house_votes84, k = 2, m = which.min(logpca_cv))
plot(clogpca_model, type = "trace") + theme_minimal()

plot(logsvd_model, type = "trace") + theme_minimal()

party = rownames(house_votes84)
plot(logsvd_model, type = "scores") + 
  geom_point(aes(colour = party)) + 
  ggtitle("Exponential Family PCA") + 
  scale_colour_manual(values = c("blue", "red")) +
  theme_minimal()

Created on 2023-06-29 with reprex v2.0.2
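
The reprex above runs on the package's example data. As a rough sketch of how the same idea (reduce to two dimensions, then look for groups) might be tried on a table shaped like the one in the question, the code below uses simulated data and base R only. Note the assumptions: `features` is a made-up 37 x 20 binary matrix, ordinary PCA via `prcomp()` stands in for logistic PCA, and the choice of 3 clusters for `kmeans()` is arbitrary.

```r
# Simulated stand-in for the real data: 37 companies x 20 binary features.
set.seed(42)
features <- matrix(
  rbinom(37 * 20, size = 1, prob = 0.5),
  nrow = 37, ncol = 20,
  dimnames = list(paste0("company_", 1:37), paste0("feature_", 1:20))
)

# Ordinary PCA (a stand-in for logisticPCA, assumed here to keep to base R).
pca <- prcomp(features, center = TRUE, scale. = FALSE)

# Keep the first two components as coordinates for each company.
scores <- pca$x[, 1:2]

# Group the companies on those two coordinates; 3 clusters is an arbitrary choice.
km <- kmeans(scores, centers = 3, nstart = 25)
cluster_id <- km$cluster

# How many companies landed in each group.
table(cluster_id)
```

Calling `plot(scores, col = cluster_id, pch = 19)` afterwards would draw the companies in the reduced space, coloured by group. With real data, replace `features` with the actual 0/1 table and compare several values of `centers` before settling on one.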

Thank you so much for your reply, but I need all 20 features, and with PCA I will lose some of the data. Can I do it another way?

The data is still all there; it has just been transformed. The axes of the plot are not individual variables. Each axis is a combination of all 20 features, so distances between points in the plot approximate distances between companies in the full 20-dimensional space. We don't have good intuition for high-dimensional spaces. Almost everyone can handle two dimensions, walking around looking at the floor, and also three dimensions, walking around looking at the pavement and the buildings. Four dimensions is harder: we need something like a stop-motion hologram. Five dimensions leave our visual imaginations behind, and we're into pure mathematical relationships. You'll want to do some reading on the theory behind PCA.
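
To make the "transformed, not lost" point concrete, here is a minimal sketch using ordinary PCA from base R (`prcomp()`) on simulated data. It is not the logistic PCA used in the reprex, and the matrix `x` is invented for illustration. It shows that every component draws on all 20 columns, and that keeping all components lets the original table be rebuilt exactly; only the two-dimensional view is an approximation.

```r
# Simulated 37 x 20 binary table (an assumption standing in for the real data).
set.seed(1)
x <- matrix(rbinom(37 * 20, size = 1, prob = 0.5), nrow = 37, ncol = 20)

pca <- prcomp(x, center = TRUE, scale. = FALSE)

# The rotation matrix has one row per original feature and one column per
# component, so each component is a weighted mix of all 20 features.
dim(pca$rotation)

# With every component kept, undoing the rotation and the centering
# gives back the original table.
rebuilt <- pca$x %*% t(pca$rotation)
rebuilt <- sweep(rebuilt, 2, pca$center, "+")
all.equal(rebuilt, x, check.attributes = FALSE)

# With only the first two components, the rebuilt table is an approximation.
approx2 <- pca$x[, 1:2] %*% t(pca$rotation[, 1:2])
approx2 <- sweep(approx2, 2, pca$center, "+")
max(abs(approx2 - x))
```

The two-component plot is therefore a summary view of the 20 features rather than a selection of two of them; the remaining components are still available in `pca$x` if more detail is needed.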
