Hello,
I am trying to work out whether R can help me with my work. I should say up front that I am new to this field: I have only recently started writing commands and using these tools.
My starting file contains the following information:
NAME NAME_two column1 column2 column3 up to column10 and finally CLASS
Aa aae 5 3 4 3 3 0 5 1 2 4 YES
Ab and 11 3 5 6 4 5 5 2 3 2 NOT
Ac acd 9 4 4 2 7 5 5 3 6 1 NOT
Ad aaqff 0 2 0 1 0 2 1 1 0 YES
Ae ewg 1 0 2 1 1 0 4 1 0 0 NOT
Af wegv 10 5 9 5 6 0 3 2 3 7 NOT
Ag rwg 10 5 10 6 5 0 3 1 4 4 NOT
Ah wfq 1 0 2 0 1 0 2 1 1 0 NOT
Ai he 1 0 2 2 2 0 4 1 0 0 NOT
Al efgwa 0 0 1 0 1 0 1 0 1 0 NOT
Am h4h 0 0 3 1 1 0 1 0 1 0 NOT
So there are 10 columns of numeric values (from 0 upwards), followed by a class label (there are two classes: YES or NOT). There are around 17,000 rows like these.
I have clustered these data with SOMbrero.
At this point, I would like to check whether the resulting clusters make sense: whether they are well formed, or whether equally good clusters could have arisen by chance.
So my question is: can I do this type of analysis with R? Is there a way to assess the quality of these clusters, and to tell which ones are better and which are worse?
I saw that there is a clValid package that could be useful to me, in particular its BSI function. I have also come across the Davies-Bouldin index. But I don't understand how to use them in my case, and I don't know how to write this analysis in R. Above all, I am not sure whether these analyses really do what I want.
Thank you for your attention, and thanks to anyone who can help me.
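
To make the question more concrete, here is a minimal sketch of the kind of check described above, written in base R only. It is not the clValid or SOMbrero API: the function name davies_bouldin, the toy matrix x and the label vector cl are all made up for illustration, and the toy data merely stand in for the real 17,000-row file and its cluster assignments. It computes a Davies-Bouldin index (lower means more compact, better separated clusters) and compares it with the values obtained after randomly shuffling the cluster labels.

```r
# Davies-Bouldin index in base R (hypothetical helper, lower is better).
# x:  numeric matrix, one row per element, one column per variable
# cl: cluster label assigned to each row of x
davies_bouldin <- function(x, cl) {
  x  <- as.matrix(x)
  cl <- as.integer(factor(cl))
  k  <- max(cl)
  # Centroid of each cluster, one row per cluster
  centers <- do.call(rbind, lapply(seq_len(k), function(i) {
    colMeans(x[cl == i, , drop = FALSE])
  }))
  # Mean Euclidean distance of each cluster's points to its own centroid
  scatter <- sapply(seq_len(k), function(i) {
    xi <- x[cl == i, , drop = FALSE]
    mean(sqrt(rowSums(sweep(xi, 2, centers[i, ])^2)))
  })
  # Pairwise distances between centroids
  d <- as.matrix(dist(centers))
  ratio <- outer(scatter, scatter, "+") / d
  diag(ratio) <- -Inf  # a cluster is never compared with itself
  mean(apply(ratio, 1, max))
}

set.seed(1)
# Toy stand-in data (NOT the real file): two groups of counts in 10 columns
x  <- rbind(matrix(rpois(500, lambda = 1), ncol = 10),
            matrix(rpois(500, lambda = 6), ncol = 10))
cl <- rep(1:2, each = 50)  # stand-in for the cluster assigned to each row

# Index for the actual clustering
observed <- davies_bouldin(x, cl)

# Reference distribution: same data, cluster labels shuffled at random
null_db <- replicate(200, davies_bouldin(x, sample(cl)))

# Share of random labelings that score at least as well as the real one
p_value <- (sum(null_db <= observed) + 1) / (length(null_db) + 1)
print(observed)
print(p_value)
```

If p_value is small, random relabelings rarely score as well as the real clustering, which would suggest the clusters are not just a product of chance. Shuffling labels is only one possible notion of "by chance", and the choice of 200 shuffles is arbitrary.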
Best regards
Francesco Coppola