I have the following sample dataset, which describes products and their features (one feature per column). It contains 12 features (columns) for 21 products (records).
Data.df <- structure(list(Product.group = c("A Box", "A Box", "A Box", "A Box",
"A Box", "A Box", "A Box", "A Box", "B Box", "B Box", "B Box",
"B Box", "B Box", "B Box", "B Box", "B Box", "C Box", "C Box",
"C Box", "C Box", "C Box"), Performance = c("High", "High", "High",
"Medium", "Medium", "Low", "Low", "Low", "High", "High", "High",
"High", "Low", "Low", "Low", "Medium", "High", "High", "Low",
"Low", "Low"), Family = c("A1", "A1", "A1", "A1", "A1", "A2",
"A2", "A2", "B1", "B1", "B1", "B1", "B1", "B2", "B2", "B1", "C1",
"C2", "C1", "C1", "C2"), Product.ID = c("A111", "A112", "A113",
"A114", "A118", "A211", "A222", "A347", "AX12", "AX14", "AX16",
"AX18", "AY78", "AY89", "AY91", "B122", "AA11", "AA32", "AA43",
"AC21", "AC43"), Function = c("ELEC", "ELEC", "ELEC", "ELEC",
"ELEC", "ELEC", "ELEC", "ELEC", "ELEC", "ELEC", "GAS", "GAS",
"ELEC", "ELEC", "ELEC", "ELEC", "GAS", "GAS", "GAS", "GAS", "GAS"
), Voltage = c("G", "G", "G", "G", "G", "G", "G", "G", "G", "G",
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G"), Gas.stage = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, "A112", "A112", NA, NA, NA,
NA, "A150", "A150", "A240", "A240", "A240"), Electric = c(NA,
NA, "22KW", NA, NA, NA, NA, NA, "22KW", NA, NA, NA, NA, NA, "7.5KW",
NA, NA, NA, NA, NA, NA), Drive = c("Direct", "Direct", "Direct",
"Direct", "Direct", "Direct", "Direct", "Direct", "Direct", "Direct",
"Direct", "Direct", "Direct", "Direct", "Direct", "Direct", "Direct",
"Direct", "Direct", "Direct", "Direct"), Exhaust = c(NA, NA,
NA, NA, NA, "Single", NA, NA, NA, NA, "Single", NA, NA, NA, NA,
NA, "Single", NA, NA, "Double", "Double"), Fuse = c("15A", "15A",
"15A", "15A", "15A", "15A", "15A", "15A", "15A", "15A", "15A",
"15A", "15A", "15A", "15A", "15A", "15A", "15A", "20A", NA, "15A"
), Accessory = c(NA, NA, NA, NA, NA, "Installed", NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Installed", NA)), row.names = c(NA,
21L), class = "data.frame")
Is there a way to subset this dataset into multiple groups based on the unique values of some of the columns? The columns used for grouping are:
- Product group
- Performance
- Family
- Function
- Voltage
I need to split the entire dataset by the unique combinations of values in these columns, taken in the order listed above, and give each resulting subset a unique group ID.
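To make this concrete, here is a minimal sketch of the kind of result I am after, using a five-row excerpt of the data in base R. The names `toy`, `group_cols`, `key` and `Group.ID` are placeholders I made up, and numbering the groups by first appearance is only my assumption about what "in the same order" should mean.

```r
# Five rows taken from Data.df, keeping only the grouping columns and Product.ID
toy <- data.frame(
  Product.group = c("A Box", "A Box", "A Box", "B Box", "B Box"),
  Performance   = c("High", "High", "Medium", "High", "High"),
  Family        = c("A1", "A1", "A1", "B1", "B1"),
  Product.ID    = c("A111", "A112", "A114", "AX12", "AX16"),
  Function      = c("ELEC", "ELEC", "ELEC", "ELEC", "GAS"),
  Voltage       = c("G", "G", "G", "G", "G"),
  stringsAsFactors = FALSE
)

# Grouping columns, in the order they should be applied
group_cols <- c("Product.group", "Performance", "Family", "Function", "Voltage")

# Build one key per row by pasting the grouping columns together
key <- do.call(paste, c(toy[group_cols], sep = "|"))

# Number each distinct key in order of first appearance (assumed ordering)
toy$Group.ID <- match(key, unique(key))

# Split into a list holding one data frame per group
subsets <- split(toy, toy$Group.ID)

print(toy[c("Product.ID", "Group.ID")])
# Group.ID comes out as 1, 1, 2, 3, 4 for these five rows
```

On the full data the same lines would be run with `Data.df` in place of `toy`; the `"|"` separator assumes that character never appears in the column values.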
I have just started learning about clustering, so any advice on this would be a great help.
Thank you