RStudio Error with fviz_nbclust() for NbClust() results

I was teaching a class today and ran into an error when trying to visualize the results of the NbClust() function.

Here's my code and resulting error. I'm using R version 4.3.1, factoextra version 1.0.7, NbClust version 3.0.1, and the R dataset "USArrests".
I saw a post on another site that indicates that there is a problem with the resulting data in Best.nc, but I haven't been able to determine a solution for this yet. Any ideas?

#> load data
data("USArrests")
View(USArrests)

#> scale
data <- scale(USArrests)
View(data)

#> choose optimal value for k with NbClust()
library(NbClust)
nb <- NbClust(data,
              distance = 'euclidean',
              method = 'kmeans',
              min.nc = 2,
              max.nc = 15,
              index = "all"
              )

#> visualize
library(factoextra)
fviz_nbclust(nb)

#>Error in if (class(best_nc) == "numeric") print(best_nc) else if (class(best_nc) ==  : 
#>  the condition has length > 1

Indeed, in the source code of fviz_nbclust() you can see this:

if (inherits(x, "list") & "Best.nc" %in% names(x)) {
    best_nc <- x$Best.nc
    if (class(best_nc) == "numeric") 
      print(best_nc)

But in your data:

class(best_nc)
#> [1] "matrix" "array"

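so class(best_nc) == "numeric" returns a logical vector of length 2 rather than a single TRUE or FALSE, and if() rejects it. This appears to be a bug in {factoextra}: a matrix has had the two-element class c("matrix", "array") since R 4.0.0, which at first only triggered a warning in if(), but a condition of length > 1 has been an error since R 4.2.0. I think this has already been reported, and a fix offered, but the authors haven't followed up (they don't seem to have been active on this package since 2020).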
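
To see the length-2 condition in isolation, here is a minimal base-R sketch. The matrix m is a made-up stand-in for nb$Best.nc, and the length-one checks at the end are just common alternatives, not necessarily what a fix in {factoextra} would use:

```r
# A small numeric matrix standing in for nb$Best.nc (values are made up)
m <- matrix(1:4, nrow = 2)

class(m)
#> [1] "matrix" "array"        (on R >= 4.0.0)

# `==` is vectorised, so the comparison has one element per class name
length(class(m) == "numeric")
#> [1] 2

# Length-one tests that are safe to use inside if()
inherits(m, "matrix")
#> [1] TRUE
is.matrix(m)
#> [1] TRUE
```

Any of the last two forms gives if() a single TRUE or FALSE regardless of how many class names the object carries.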

In your case, you can look at the source code and notice that the part of interest is relatively small, so you can define your own function with a copy-paste:

my_fviz_nbclust <- function(x, print.summary = TRUE, barfill = "steelblue", barcolor = "steelblue"){
  best_nc <- x$Best.nc
  best_nc <- as.data.frame(t(best_nc), stringsAsFactors = TRUE)
  best_nc$Number_clusters <- as.factor(best_nc$Number_clusters)
  
  # Count how many indices voted for each number of clusters
  ss <- summary(best_nc$Number_clusters)
  if (print.summary) {
    cat("Among all indices: \n===================\n")
    for (i in seq_along(ss)) {
      cat("*", ss[i], "proposed ", names(ss)[i], "as the best number of clusters\n")
    }
    cat("\nConclusion\n=========================\n")
    cat("* According to the majority rule, the best number of clusters is ", 
        names(which.max(ss)), ".\n\n")
  }
  
  df <- data.frame(Number_clusters = names(ss), freq = ss, 
                   stringsAsFactors = TRUE)
  p <- ggpubr::ggbarplot(df, x = "Number_clusters", y = "freq", 
                         fill = barfill, color = barcolor) +
    ggplot2::labs(x = "Number of clusters k", 
                 y = "Frequency among all indices",
                 title = paste0("Optimal number of clusters - k = ", 
                                names(which.max(ss))))
  p
}

which should be equivalent to what the package did on R < 4.2.
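
If it helps to see what the tally inside that function does without running NbClust(), here is a small base-R sketch. The best_nc matrix is a toy with invented values and hypothetical index names (idx_a and so on); I'm assuming the real nb$Best.nc has the same two-row layout, with one column per index:

```r
# Toy stand-in for nb$Best.nc: one column per index, two rows.
# Values and column names are invented for illustration only.
best_nc <- matrix(
  c(2, 0.5, 3, 1.2, 2, 0.8, 4, 2.1),
  nrow = 2,
  dimnames = list(c("Number_clusters", "Value_Index"),
                  c("idx_a", "idx_b", "idx_c", "idx_d"))
)

# Same reshaping as in my_fviz_nbclust(): one row per index
votes <- as.data.frame(t(best_nc), stringsAsFactors = TRUE)
votes$Number_clusters <- as.factor(votes$Number_clusters)

# summary() on a factor counts how often each level occurs
ss <- summary(votes$Number_clusters)
ss
#> 2 3 4 
#> 2 1 1 

# Majority rule: the level with the most votes
names(which.max(ss))
#> [1] "2"
```

With the real object, my_fviz_nbclust(nb) runs the same steps and then draws the bar chart.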
