Dear all,
I'm working on a project with the following tasks:
Use the following clustering algorithms:
o K-means
o Hierarchical
• Identify the right number of customer segments
• Provide the number of customers who are highly valued
• Identify the clustering algorithm that gives the highest accuracy and the most robust clusters.
• If most of the observations end up in one of the clusters, break that cluster down further
using a clustering algorithm.
I've tried to solve this, but I'm not able to get the right number of customer segments.
Can anyone help me out with this?
My code is attached below.
setwd(choose.dir())
getwd()
Ecomm_data = read.csv("Ecommerce.csv")
del_var <- sapply(Ecomm_data, is.numeric)
names(which(del_var==TRUE))
Ecomm_data_num = Ecomm_data[names(which(del_var==TRUE))]
Ecomm_data_num = na.omit(Ecomm_data_num)
Ecomm_data_num
summary(Ecomm_data_num)
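# Upper outlier fences (Q3 + 1.5 * IQR), using the quartiles reported by summary() above
IQR_Quantity = 12-2
Up_quantity <- 12 + 1.5*IQR_Quantity
Up_quantity
IQR_UnitPrice = 3.75-1.25
Up_UnitPrice <- 3.75 + 1.5*IQR_UnitPrice
Up_UnitPrice
IQR_Cust = 16791-13953
Up_Cust <- 16791 + 1.5*IQR_Cust
Up_Cust
# Keep rows at or above the Quantity lower bound and at or below each upper fence
Clean_data <- subset(Ecomm_data_num, Quantity>=2 & Quantity<=Up_quantity & UnitPrice<=Up_UnitPrice & CustomerID<=Up_Cust)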
boxplot(Clean_data)
Clean_data = scale(Clean_data)
Clean_data
Ecomm_data_num1 = scale(Clean_data[1:1000,])
Ecomm_data_num1
segment <- dist(Ecomm_data_num1, method = "euclidean")
fit <- hclust(segment, method = "ward.D2")
plot(fit)
groups <- cutree(fit,k=6)
groups
c1<- subset(Ecomm_data_num1,groups == 1)
c1
# scale() returns a matrix, so index the column by name instead of using $
mean(c1[, "Quantity"])
c2<- subset(Ecomm_data_num1,groups == 2)
c2
c3<- subset(Ecomm_data_num1,groups == 3)
c3
c4<- subset(Ecomm_data_num1,groups == 4)
c4
c5<- subset(Ecomm_data_num1,groups == 5)
c5
c6<- subset(Ecomm_data_num1,groups == 6)
c6
set.seed(123)  # make the random starting centers reproducible
cluster = kmeans(Ecomm_data_num1, centers=6, iter.max=10)
str(cluster)
Ecomm_data_num1 = cbind(Ecomm_data_num1, num_cluster=cluster$cluster)
View(Ecomm_data_num1)
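
The code above fixes k = 6 for both algorithms without checking whether 6 is a reasonable number of segments. One common way to check is the elbow method: run k-means for a range of k and see where the total within-cluster sum of squares stops dropping sharply. The sketch below is only an illustration under stated assumptions. It uses synthetic data in place of Ecommerce.csv, and the names `toy` and `wss_by_k` are made up for this example. It also shows one possible way to re-cluster the largest cluster when most observations land in it.

```r
# Hypothetical stand-in for the scaled numeric data (e.g. Ecomm_data_num1):
# three synthetic groups, so the example runs without Ecommerce.csv.
set.seed(42)
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)
colnames(toy) <- c("Quantity", "UnitPrice")
toy <- scale(toy)

# Total within-cluster sum of squares for k = 1..k_max (elbow method).
# wss_by_k is a helper name invented for this sketch.
wss_by_k <- function(x, k_max = 10, nstart = 25) {
  sapply(1:k_max, function(k) {
    kmeans(x, centers = k, nstart = nstart, iter.max = 50)$tot.withinss
  })
}

wss <- wss_by_k(toy, k_max = 8)

# Look for the "elbow": the k after which extra clusters barely lower WSS.
if (interactive()) {
  plot(1:8, wss, type = "b",
       xlab = "Number of clusters k",
       ylab = "Total within-cluster sum of squares")
}

# Cluster sizes for a chosen k. A cluster holding most of the rows is a
# candidate for a second round of clustering on just its members.
fit_k <- kmeans(toy, centers = 3, nstart = 25)
sizes <- table(fit_k$cluster)
biggest <- as.integer(names(which.max(sizes)))
sub_fit <- kmeans(toy[fit_k$cluster == biggest, , drop = FALSE],
                  centers = 2, nstart = 25)
```

On the real data, `toy` would be replaced by the scaled matrix. It may also be worth checking whether CustomerID should be a clustering feature at all, since it is an identifier and not a measurement.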