Sampling and bootstrapping

Sampling can be done with the Sampling class.

The most useful methods are the three overloads of 'sample':

  public Pair<Dataset, Dataset> sample(Dataset data)
  public Pair<Dataset, Dataset> sample(Dataset data, int size)
  public Pair<Dataset, Dataset> sample(Dataset data, int size, long seed)

The data parameter is the data set to sample from. The size parameter is the number of items the sample should contain. The seed parameter initializes the random generator used for sampling, so passing the same seed makes a run repeatable.

Each method returns a pair of data sets. The first element, accessed with x(), is the sample itself. The second element, accessed with y(), is a data set containing the out-of-bag samples, that is, the items that were not drawn into the sample.
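To make the sample / out-of-bag split concrete, here is a minimal, self-contained sketch that works on item indices rather than on a Dataset. It is not Java-ML's implementation: the class BootstrapSketch and its methods drawIndices and outOfBag are hypothetical names, and it assumes bootstrap-style drawing with replacement, which may differ from what a given Sampling strategy does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Java-ML: splits item indices into a
// sample drawn with replacement and the out-of-bag remainder.
class BootstrapSketch {

    // Draws `size` indices from [0, n) with replacement, using a seeded generator.
    static List<Integer> drawIndices(int n, int size, long seed) {
        Random rng = new Random(seed);
        List<Integer> drawn = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            drawn.add(rng.nextInt(n));
        }
        return drawn;
    }

    // Returns every index in [0, n) that does not occur in `drawn`.
    static List<Integer> outOfBag(int n, List<Integer> drawn) {
        boolean[] seen = new boolean[n];
        for (int idx : drawn) {
            seen[idx] = true;
        }
        List<Integer> oob = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!seen[i]) {
                oob.add(i);
            }
        }
        return oob;
    }

    public static void main(String[] args) {
        // Sample 10 indices out of 10 items with a fixed seed.
        List<Integer> sample = drawIndices(10, 10, 42L);
        List<Integer> oob = outOfBag(10, sample);
        System.out.println("sample:     " + sample);
        System.out.println("out-of-bag: " + oob);
    }
}
```

Because the generator is seeded, calling drawIndices twice with the same arguments yields the same list, which is why a seed is useful when an experiment has to be repeated. The out-of-bag indices never overlap with the sample, so they can serve as a held-out test set.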

A working example:

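  // Load the data set; the class label is in column 4 and values are comma-separated.
  Dataset data = FileHandler.loadDataset(new File("devtools/data/"), 4, ",");
  Sampling s = Sampling.SubSampling;
  // Repeat the experiment five times, using the loop counter as the seed.
  for (int i = 0; i < 5; i++) {
      // Sample 80% of the data; x() is the sample, y() the out-of-bag remainder.
      Pair<Dataset, Dataset> datas = s.sample(data, (int) (data.size() * 0.8), i);
      Classifier c = new LibSVM();
      // Train on the sample and evaluate on the out-of-bag data.
      c.buildClassifier(datas.x());
      Map<Object, PerformanceMeasure> pms = EvaluateDataset.testDataset(c, datas.y());
      System.out.println(pms);
  }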