# Clustering based on one Column alone

Hey guys!

Firstly, thank you all for the space to post questions regarding R. I just started using R regularly at work, and so the questions are now coming!

My main question here is: is it possible to cluster based on a single column?

I have customer purchase amounts, and would like to cluster them either with Kmeans, or with any other function available.

The examples I have found always use 2 or more columns to cluster the data together. Is clustering possible with just one?

The column really only has the name "Purchase Amount", and all the purchase amounts in that column. I am looking for the best way to group them together, but can't seem to figure out how to do it.

Cheers!
Victor

To plot a vector `v` in a way that shows how its values fall into different categories, the classification produced by some algorithm must be passed to the plot method either explicitly, as a separate variable (such as a vector of kmeans cluster labels), or implicitly, by the algorithm itself. The binning performed by `hist()` is an example of the latter.
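To make the explicit/implicit distinction concrete, here is a minimal base R sketch. The choice of 3 clusters, the seed, and the variable names `explicit_groups` and `implicit_groups` are only assumptions for illustration.

```r
# Sketch: two ways a classification of a numeric vector can be obtained
v <- mtcars$mpg

# Explicit: kmeans() accepts a plain numeric vector (coerced to a one-column matrix)
set.seed(1)                                  # kmeans starts from random centers
km <- kmeans(v, centers = 3, nstart = 25)
explicit_groups <- km$cluster                # one integer label per value of v

# Implicit: hist() chooses its own bins; plot = FALSE returns them without drawing
h <- hist(v, plot = FALSE)
implicit_groups <- cut(v, breaks = h$breaks, include.lowest = TRUE, labels = FALSE)

table(explicit_groups)
table(implicit_groups)
```

Either vector of labels can then be mapped to colour, facets, or an axis in a plot.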

Here's an example that uses kmeans clustering of a single continuous variable to plot `mtcars$mpg` according to membership in a kmeans cluster. Other interval algorithms can also be specified through the `style` argument of `classIntervals()`.

```r
# Load required packages
library(classInt)
library(ggplot2)

data(mtcars)

# kmeans starts from random centers, so fix the seed for reproducible breaks
set.seed(123)

# Calculate kmeans intervals
num_intervals <- 5
kmeans_intervals <- classIntervals(mtcars$mpg, num_intervals, style = "kmeans")

# Assign each mpg value to its corresponding interval
mtcars$kmeans_group <- cut(mtcars$mpg, breaks = kmeans_intervals$brks,
                           include.lowest = TRUE, labels = FALSE)

# Create the dotplot using ggplot2
ggplot(mtcars, aes(x = factor(kmeans_group), y = mpg)) +
  geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.5) +
  labs(title = "Dotplot of mtcars$mpg Divided into kmeans Intervals",
       x = "kmeans Group",
       y = "Miles per Gallon (mpg)") +
  coord_flip() +
  theme_minimal()
#> Bin width defaults to 1/30 of the range of the data. Pick better value with
#> `binwidth`.
```

Created on 2023-10-02 with reprex v2.0.2
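For the original question, base R's `kmeans()` can also be run directly on a single column. The sketch below uses simulated data, since the real purchase amounts were not posted. The `purchases` data frame, the choice of 3 clusters, and the "low"/"mid"/"high" labels are all assumptions for illustration.

```r
# Hypothetical data: one column of simulated purchase amounts
set.seed(42)
purchases <- data.frame(
  `Purchase Amount` = round(c(rlnorm(60, 3, 0.3),
                              rlnorm(30, 5, 0.3),
                              rlnorm(10, 7, 0.3)), 2),
  check.names = FALSE   # keep the space in the column name
)

amount <- purchases[["Purchase Amount"]]

# kmeans() works on a single numeric column; nstart reruns it from several random starts
km <- kmeans(amount, centers = 3, nstart = 25)

# Cluster numbers are arbitrary, so relabel them by increasing cluster center
rank_of_center <- rank(km$centers[, 1])
purchases$group <- factor(rank_of_center[km$cluster],
                          labels = c("low", "mid", "high"))

# Size and range of each group
table(purchases$group)
tapply(amount, purchases$group, range)
```

With one variable, each cluster ends up being a contiguous range of values, so the result can be read as a set of cut points, much like the breaks returned by `classIntervals()`.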
