Cluster evaluation

Java-ML provides a large number of cluster evaluation measures in the package net.sf.javaml.clustering.evaluation. Each measure scores the quality of a clustering, i.e. how well it reflects the properties of the data. Most of them try to quantify how well the clustering algorithm separates the data into logical units.

All measures implement the method double score(Dataset[] clusters), which returns a score for the array of datasets produced by a clustering algorithm. Typical usage is illustrated in the code snippet below.

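  import java.io.File;
  import net.sf.javaml.clustering.Clusterer;
  import net.sf.javaml.clustering.KMeans;
  import net.sf.javaml.clustering.evaluation.ClusterEvaluation;
  import net.sf.javaml.clustering.evaluation.SumOfSquaredErrors;
  import net.sf.javaml.core.Dataset;
  import net.sf.javaml.tools.data.FileHandler;

  /* We load some data: class label in column 4, comma-separated fields.
   * Note that loadDataset may throw an IOException. */
  Dataset data = FileHandler.loadDataset(new File("iris.data"), 4, ",");
  /* We create a clustering algorithm, in this case the k-means
   * algorithm with 4 clusters. */
  Clusterer km = new KMeans(4);
  /* We cluster the data */
  Dataset[] clusters = km.cluster(data);
  /* Create a measure for the cluster quality */
  ClusterEvaluation sse = new SumOfSquaredErrors();
  /* Measure the quality of the clustering */
  double score = sse.score(clusters);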
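
To give an intuition for what a measure such as the sum of squared errors quantifies, the following is a minimal sketch in plain Java that does not use Java-ML. The class name SseSketch and the use of double[][][] as a stand-in for Dataset[] are assumptions made for illustration only, and the library's own SumOfSquaredErrors may differ in details such as the distance measure it uses. The sketch sums, over all clusters, the squared Euclidean distance of each point to the centroid of its cluster, so tighter clusters yield a lower value.

```java
public class SseSketch {

    /* Per-dimension mean of one cluster; assumes the cluster is non-empty. */
    static double[] centroid(double[][] cluster) {
        double[] c = new double[cluster[0].length];
        for (double[] point : cluster) {
            for (int d = 0; d < c.length; d++) {
                c[d] += point[d];
            }
        }
        for (int d = 0; d < c.length; d++) {
            c[d] /= cluster.length;
        }
        return c;
    }

    /* Hypothetical analogue of score(Dataset[] clusters): sum of squared
     * Euclidean distances from each point to its cluster centroid. */
    static double score(double[][][] clusters) {
        double sum = 0.0;
        for (double[][] cluster : clusters) {
            if (cluster.length == 0) {
                continue; // an empty cluster contributes nothing
            }
            double[] c = centroid(cluster);
            for (double[] point : cluster) {
                for (int d = 0; d < c.length; d++) {
                    double diff = point[d] - c[d];
                    sum += diff * diff;
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        /* The same four points, grouped well and grouped badly. */
        double[][][] tight = { { {0, 0}, {0, 2} }, { {10, 10}, {10, 12} } };
        double[][][] loose = { { {0, 0}, {10, 10} }, { {0, 2}, {10, 12} } };
        System.out.println(score(tight)); // prints 4.0
        System.out.println(score(loose)); // prints 200.0
    }
}
```

The two groupings contain identical points, so the difference in score comes only from how the points are assigned to clusters.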
