Basic terminology

This article explains the terms used throughout the rest of the tutorials and in the source-code documentation.

We assume that you are somewhat familiar with the concept of machine learning or data mining.

We also assume you have at least a basic knowledge of Java, the language in which the library is written.

In short, every data sample is stored in an Instance, and Instances are grouped together in a Dataset. Each Instance can have a number of attributes, each of which holds a real value. The terms features and attributes are synonymous in this context.

Each Instance can have a class label. Algorithms are functions that work on Datasets and Instances. Classification algorithms, for example, can be trained on a Dataset and later applied to classify Instances.

In more detail:

Instance: The representation of a real-world sample. An instance can have any number of attributes that define it and at most one class label.
Dataset: A collection of instances that belong together.
Algorithm: Any method or technique that can work with Datasets and/or Instances. This includes algorithms for classification, clustering, regression, filtering, etc.
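
To make the three terms concrete, here is a minimal, self-contained Java sketch. All names in it (TerminologySketch, SimpleInstance, SimpleDataset, SimpleClassifier, NearestNeighbour) are hypothetical and chosen only for illustration; they are not the library's own classes, and the toy nearest-neighbour rule is just one possible example of an algorithm. The sketch assumes every instance has the same number of attributes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the terminology; NOT the library's actual classes.
public class TerminologySketch {

    /** One data sample: real-valued attributes plus at most one class label (null if unlabeled). */
    static final class SimpleInstance {
        final double[] attributes;
        final Object classLabel;

        SimpleInstance(double[] attributes, Object classLabel) {
            this.attributes = attributes.clone();
            this.classLabel = classLabel;
        }
    }

    /** A collection of instances that belong together. */
    static final class SimpleDataset {
        final List<SimpleInstance> instances = new ArrayList<>();

        void add(SimpleInstance instance) {
            instances.add(instance);
        }
    }

    /** An algorithm: trained on a dataset, later applied to classify single instances. */
    interface SimpleClassifier {
        void train(SimpleDataset data);

        Object classify(SimpleInstance instance);
    }

    /** Toy 1-nearest-neighbour classifier using squared Euclidean distance. */
    static final class NearestNeighbour implements SimpleClassifier {
        private SimpleDataset training;

        @Override
        public void train(SimpleDataset data) {
            // "Training" here only remembers the labeled instances.
            this.training = data;
        }

        @Override
        public Object classify(SimpleInstance query) {
            if (training == null || training.instances.isEmpty()) {
                throw new IllegalStateException("train() must be called with a non-empty dataset first");
            }
            Object bestLabel = null;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (SimpleInstance candidate : training.instances) {
                // Assumes query and candidate have the same number of attributes.
                double distance = 0.0;
                for (int i = 0; i < query.attributes.length; i++) {
                    double diff = query.attributes[i] - candidate.attributes[i];
                    distance += diff * diff;
                }
                if (distance < bestDistance) {
                    bestDistance = distance;
                    bestLabel = candidate.classLabel;
                }
            }
            return bestLabel;
        }
    }

    public static void main(String[] args) {
        // Build a dataset from two labeled instances.
        SimpleDataset data = new SimpleDataset();
        data.add(new SimpleInstance(new double[] {1.0, 1.0}, "small"));
        data.add(new SimpleInstance(new double[] {9.0, 8.0}, "large"));

        // Train the algorithm on the dataset, then classify an unlabeled instance.
        SimpleClassifier classifier = new NearestNeighbour();
        classifier.train(data);
        SimpleInstance unlabeled = new SimpleInstance(new double[] {8.5, 7.0}, null);
        System.out.println(classifier.classify(unlabeled)); // prints "large"
    }
}
```

The split mirrors the definitions above: the instance carries the attributes and an optional label, the dataset only groups instances, and the algorithm is kept behind an interface so that other techniques (clustering, filtering, and so on) could be modelled the same way.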