Classification basics

This tutorial explains the basics of setting up a classifier, training it on data and evaluating its performance. First we initialize a classifier, next we train it with some data, and finally we use it to classify new instances.

Creating a classifier
The following sample loads the iris data set, then constructs a K-nearest neighbors classifier and trains it with the data.

  import java.io.File;
  import net.sf.javaml.classification.Classifier;
  import net.sf.javaml.classification.KNearestNeighbors;
  import net.sf.javaml.core.Dataset;
  import net.sf.javaml.tools.data.FileHandler;

  /* Load a data set */
  Dataset data = FileHandler.loadDataset(new File("devtools/data/iris.data"), 4, ",");
  /* Construct a KNN classifier that uses 5 neighbors to make a decision. */
  Classifier knn = new KNearestNeighbors(5);
  knn.buildClassifier(data);

[Documented source code]

Note that the buildClassifier method of a classifier may modify the Dataset that is provided as a parameter.
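
Once the classifier is trained, it can also label a single new Instance directly. The following is only a sketch: it assumes a DenseInstance (from net.sf.javaml.core) built from four made-up attribute values, which are purely illustrative.

  /* A hypothetical, unlabeled measurement: sepal length, sepal width,
   * petal length and petal width. */
  Instance unknown = new DenseInstance(new double[] { 5.1, 3.5, 1.4, 0.2 });
  /* classify returns one of the class labels seen during training. */
  Object predictedLabel = knn.classify(unknown);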

Evaluating the performance of a classifier
Now that we have constructed and trained a classifier, we can use it to classify new Instances. In this example we reload the iris data set and use the trained classifier to predict the class label for each Instance.

  /* Reload the iris data to obtain instances to classify. */
  Dataset dataForClassification = FileHandler.loadDataset(new File("devtools/data/iris.data"), 4, ",");
  /* Counters for correct and wrong predictions. */
  int correct = 0, wrong = 0;
  /* Classify all instances and check the predictions against the known class values. */
  for (Instance inst : dataForClassification) {
    Object predictedClassValue = knn.classify(inst);
    Object realClassValue = inst.classValue();
    if (predictedClassValue.equals(realClassValue))
      correct++;
    else
      wrong++;
  }

This example goes over all instances in the iris data set and tries to predict the class of each one by majority voting among its 5 nearest neighbors. For this data set, this results in 145 correct predictions and 5 wrong ones.
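
To inspect these counts yourself, a few print statements can be added after the loop; for example:

  /* Report the raw counts and the overall accuracy (correct / total). */
  System.out.println("Correct predictions: " + correct);
  System.out.println("Wrong predictions: " + wrong);
  System.out.println("Accuracy: " + (double) correct / (correct + wrong));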

Note that this is not the proper way to validate a classifier, because we are testing it on the same data it was trained on. For the proper technique, look at the cross validation tutorial.
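
As a preview, the sketch below assumes the CrossValidation and PerformanceMeasure classes from the net.sf.javaml.classification.evaluation package and java.util.Map; the fold count mentioned in the comment is an assumption based on the library defaults.

  /* Evaluate the classifier with cross-validation instead of testing
   * on the training data. */
  CrossValidation cv = new CrossValidation(knn);
  /* Runs a cross-validation (10 folds by default) on the data set and
   * returns a performance measure for each class label. */
  Map<Object, PerformanceMeasure> performance = cv.crossValidation(data);
  System.out.println(performance);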