Evaluate classifier on a dataset

This tutorial shows how to test the performance of a classifier on a data set. It introduces two classes: EvaluateDataset, which tests a classifier on a data set, and PerformanceMeasure, which stores information about the performance of a classifier.

Evaluate a classifier on a dataset

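  // Load the iris data; the class label is at index 4 and fields are comma-separated.
  Dataset data = FileHandler.loadDataset(new File("devtools/data/iris.data"), 4, ",");
  // Create a 5-nearest-neighbor classifier and train it on the loaded data.
  Classifier knn = new KNearestNeighbors(5);
  knn.buildClassifier(data);
  // Load the same file again to use as the evaluation set.
  Dataset dataForClassification = FileHandler.loadDataset(new File("devtools/data/iris.data"), 4, ",");

  // Evaluate the classifier; the map holds one PerformanceMeasure per class.
  Map<Object, PerformanceMeasure> pm = EvaluateDataset.testDataset(knn, dataForClassification);
  for (Object o : pm.keySet()) {
    System.out.println(o + ": " + pm.get(o).getAccuracy());
  }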

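This sample loads the iris data set, constructs a 5-nearest neighbor classifier, trains it on that data, and then loads the iris data a second time to serve as the evaluation set. Because the evaluation data is identical to the training data, the reported scores will tend to be optimistic.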

The testDataset method uses the trained classifier to predict the label of every instance in the supplied data set. The performance of the classifier is returned as a map that contains one performance measure for each class. A PerformanceMeasure is a wrapper around the counts of true positives, true negatives, false positives and false negatives for that class. The class also provides a number of convenience methods to calculate aggregate measures such as accuracy, f-score, recall, precision, sensitivity and specificity.
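To make the per-class bookkeeping concrete, the following is a minimal, library-independent sketch of how the four counts and a few aggregate measures can be derived from actual and predicted labels. It is not the Java-ML implementation: the class PerClassEvaluation, the nested Counts class and the evaluate method are hypothetical names chosen for illustration, and returning 0.0 when a denominator is zero is an assumption that may differ from what PerformanceMeasure does.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerClassEvaluation {

    /** Hypothetical stand-in for a per-class performance measure: four counts plus derived values. */
    static final class Counts {
        int tp, tn, fp, fn;

        double accuracy()    { return ratio(tp + tn, tp + tn + fp + fn); }
        double precision()   { return ratio(tp, tp + fp); }
        double recall()      { return ratio(tp, tp + fn); } // also known as sensitivity
        double specificity() { return ratio(tn, tn + fp); }

        double fMeasure() {
            double p = precision();
            double r = recall();
            return (p + r) == 0 ? 0.0 : 2 * p * r / (p + r);
        }

        // Assumption: an undefined ratio (zero denominator) is reported as 0.0.
        private static double ratio(int numerator, int denominator) {
            return denominator == 0 ? 0.0 : (double) numerator / denominator;
        }
    }

    /** Builds one Counts object per class by treating each class in turn as the "positive" class. */
    static Map<String, Counts> evaluate(List<String> actual, List<String> predicted) {
        if (actual.size() != predicted.size()) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        Map<String, Counts> result = new LinkedHashMap<>();
        for (String label : actual) {
            result.computeIfAbsent(label, k -> new Counts());
        }
        for (String label : predicted) {
            result.computeIfAbsent(label, k -> new Counts());
        }
        for (int i = 0; i < actual.size(); i++) {
            String a = actual.get(i);
            String p = predicted.get(i);
            for (Map.Entry<String, Counts> e : result.entrySet()) {
                boolean isActual = e.getKey().equals(a);
                boolean isPredicted = e.getKey().equals(p);
                Counts c = e.getValue();
                if (isActual && isPredicted) {
                    c.tp++;
                } else if (isActual) {
                    c.fn++;
                } else if (isPredicted) {
                    c.fp++;
                } else {
                    c.tn++;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up labels for illustration only; these are not results from the iris data.
        List<String> actual = Arrays.asList(
                "setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica");
        List<String> predicted = Arrays.asList(
                "setosa", "setosa", "versicolor", "virginica", "virginica", "virginica");

        for (Map.Entry<String, Counts> e : evaluate(actual, predicted).entrySet()) {
            Counts c = e.getValue();
            System.out.println(e.getKey() + ": accuracy=" + c.accuracy()
                    + ", precision=" + c.precision() + ", recall=" + c.recall());
        }
    }
}
```

Note that per-class accuracy counts true negatives as correct, so a class can score high accuracy even when its recall is poor. Looking at precision and recall alongside accuracy gives a more complete picture.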