Feature subset selection

Subset selection algorithms differ from the scoring and ranking methods in that they only return the set of selected features, without any information on the quality of each individual feature.
Subset selection algorithms provide the method

  public Set<Integer> selectedAttributes();

which will return a set of feature indices that have been selected by the algorithm.
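
As an illustration of how such an index set might be consumed, the sketch below keeps only the selected columns of a single row of values. It uses plain Java collections; the class `ProjectSelected` and the method `project` are hypothetical names made up for this example and are not part of the library.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class ProjectSelected {

    /** Keeps only the values of row whose indices appear in selected, in ascending index order. */
    public static double[] project(double[] row, Set<Integer> selected) {
        double[] out = new double[selected.size()];
        int k = 0;
        // A TreeSet iterates in ascending order, so the output order is deterministic.
        for (int index : new TreeSet<>(selected)) {
            out[k++] = row[index];
        }
        return out;
    }

    public static void main(String[] args) {
        // Pretend a subset selector returned the feature indices 0 and 2.
        Set<Integer> selected = new TreeSet<>(Arrays.asList(0, 2));
        double[] row = {5.1, 3.5, 1.4, 0.2};
        System.out.println(Arrays.toString(project(row, selected))); // prints [5.1, 1.4]
    }
}
```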

The basic use of a feature subset selection algorithm is depicted in the snippet below.

  /* Load the iris data set */
  Dataset data = FileHandler.loadDataset(new File("iris.data"), 4, ",");
  /* Construct a greedy forward subset selector */
  GreedyForwardSelection ga = new GreedyForwardSelection(1, new PearsonCorrelationCoefficient());
  /* Apply the algorithm to the data set */
  ga.build(data);
  /* Print out the attribute that has been selected */
  System.out.println(ga.selectedAttributes());
This example creates a greedy forward selection algorithm that selects a single ('the best') feature. The Pearson correlation is used to determine the quality of a feature in this example.
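
To make the idea of greedy forward selection more concrete, here is a minimal, library-independent sketch. It is not the implementation behind `GreedyForwardSelection`. It assumes numeric class labels and, as an illustrative choice, scores each candidate feature by its absolute Pearson correlation with the label minus its mean absolute correlation with the features already selected. The class `GreedyForwardSketch` and its methods are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class GreedyForwardSketch {

    /** Pearson correlation coefficient of two arrays of equal length. */
    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        double denom = Math.sqrt(varX * varY);
        return denom == 0 ? 0 : cov / denom; // a constant column counts as uncorrelated
    }

    /**
     * Greedily selects up to n feature indices.
     * columns[f] holds all values of feature f; target holds the numeric class labels.
     */
    public static Set<Integer> select(double[][] columns, double[] target, int n) {
        Set<Integer> selected = new LinkedHashSet<>();
        while (selected.size() < n && selected.size() < columns.length) {
            int best = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int f = 0; f < columns.length; f++) {
                if (selected.contains(f)) continue;
                // Relevance: how strongly the candidate correlates with the label.
                double relevance = Math.abs(pearson(columns[f], target));
                // Redundancy: mean correlation with the features chosen so far.
                double redundancy = 0;
                for (int s : selected) {
                    redundancy += Math.abs(pearson(columns[f], columns[s]));
                }
                if (!selected.isEmpty()) redundancy /= selected.size();
                double score = relevance - redundancy;
                if (score > bestScore) {
                    bestScore = score;
                    best = f;
                }
            }
            selected.add(best); // keep the best candidate of this round
        }
        return selected;
    }

    public static void main(String[] args) {
        double[][] columns = {
            {1, 2, 3, 4, 5}, // feature 0: correlation 1.0 with the target
            {2, 1, 4, 3, 5}, // feature 1: correlation 0.8
            {5, 1, 4, 2, 3}  // feature 2: correlation -0.3
        };
        double[] target = {1, 2, 3, 4, 5};
        System.out.println(select(columns, target, 1)); // prints [0]
    }
}
```

With `n = 1` the redundancy term plays no role, so the sketch simply returns the feature that correlates most strongly with the label, which mirrors the single-feature setup in the example above.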