Load data from file

This tutorial shows how to read data from several file types. At the moment Java-ML supports any field-delimited file, i.e. one sample per line with the attributes separated by a symbol, commonly a comma (,), a semicolon (;) or a tab (\t). Common formats include CSV and TSV; this tutorial refers to this type of file as a 'CSV-like file'. Java-ML also supports ARFF formatted files, with the limitation that only the class label can be non-numeric.

Loading data from CSV-like files

    Dataset data = FileHandler.loadDataset(new File("iris.data"), 4, ",");

The first parameter of loadDataset is the file to load the data from. The second parameter is the index of the class label (zero-based) and the final parameter is the separator used to split the attributes in the file.

Data for the above example.
Iris data set
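
To make the role of the class index and the separator concrete, here is a minimal sketch in plain Java of how one line of a CSV-like file could be split. It is not Java-ML's implementation: CsvLineSketch, Sample and parseLine are hypothetical names used only for illustration, and the sketch assumes that every field except the class label is numeric.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Java-ML.
class CsvLineSketch {

    // Holds the parsed content of one line.
    static final class Sample {
        final double[] attributes;
        final String classLabel;

        Sample(double[] attributes, String classLabel) {
            this.attributes = attributes;
            this.classLabel = classLabel;
        }
    }

    // Splits one line on the separator; the field at classIndex (zero-based)
    // becomes the class label, all other fields are parsed as doubles.
    static Sample parseLine(String line, int classIndex, String separator) {
        // Pattern.quote makes separators such as "|" or "." match literally.
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (classIndex < 0 || classIndex >= fields.length) {
            throw new IllegalArgumentException("class index out of range: " + classIndex);
        }
        double[] attributes = new double[fields.length - 1];
        String label = null;
        int next = 0;
        for (int i = 0; i < fields.length; i++) {
            if (i == classIndex) {
                label = fields[i].trim();
            } else {
                attributes[next++] = Double.parseDouble(fields[i].trim());
            }
        }
        return new Sample(attributes, label);
    }

    public static void main(String[] args) {
        // A line in the format of iris.data: four numeric attributes, class label at index 4.
        Sample s = parseLine("5.1,3.5,1.4,0.2,Iris-setosa", 4, ",");
        System.out.println(s.classLabel + " " + Arrays.toString(s.attributes));
    }
}
```

With a class index of 4 and "," as separator, the fifth field ends up as the label and the remaining four fields as attribute values, which mirrors the arguments passed to loadDataset above.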

Loading sparse data from CSV-like files

    Dataset data = FileHandler.loadSparseDataset(new File("sparse.tsv"), 0, " ", ":");

The first parameter of loadSparseDataset is the file to load the data from. The second parameter is the index of the class label (zero-based), the third parameter is the attribute separator and the final parameter is the separator used to split the index and value of the attribute.

Sample data file for sparse data
Sample sparse data
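
The two separators can be illustrated with a small sketch in plain Java that parses one sparse line. It is not Java-ML's implementation: SparseLineSketch, SparseSample and parseLine are hypothetical names, and the input line in main is made up for illustration rather than taken from the sample file.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Java-ML.
class SparseLineSketch {

    // Holds the parsed content of one sparse line.
    static final class SparseSample {
        final Map<Integer, Double> values; // attribute index -> attribute value
        final String classLabel;

        SparseSample(Map<Integer, Double> values, String classLabel) {
            this.values = values;
            this.classLabel = classLabel;
        }
    }

    // Splits the line on attributeSeparator; the field at classIndex is the
    // class label, every other field is split once on indexSeparator.
    static SparseSample parseLine(String line, int classIndex,
                                  String attributeSeparator, String indexSeparator) {
        String[] fields = line.trim().split(Pattern.quote(attributeSeparator));
        Map<Integer, Double> values = new TreeMap<>();
        String label = null;
        for (int i = 0; i < fields.length; i++) {
            if (i == classIndex) {
                label = fields[i];
                continue;
            }
            String[] pair = fields[i].split(Pattern.quote(indexSeparator), 2);
            if (pair.length != 2) {
                throw new IllegalArgumentException("expected index" + indexSeparator
                        + "value but got: " + fields[i]);
            }
            values.put(Integer.parseInt(pair[0]), Double.parseDouble(pair[1]));
        }
        return new SparseSample(values, label);
    }

    public static void main(String[] args) {
        // Made-up line: class label first, then index:value pairs separated by spaces.
        SparseSample s = parseLine("1 3:0.5 10:2.0 42:1.0", 0, " ", ":");
        System.out.println(s.classLabel + " " + s.values);
    }
}
```

Only the attributes that appear in the line are stored; any index that is absent is simply missing from the map, which is the point of a sparse representation.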

Loading data from an ARFF formatted file
Loading from an ARFF formatted file is much like loading from a CSV or TSV file.

    Dataset data = ARFFHandler.loadARFF(new File("iris.arff"), 4);

The loadARFF method of the ARFFHandler takes only two arguments: the file to load and the index of the class label. The ARFFHandler also has a method that takes only a File, for loading files that do not contain a class label.

There are some caveats with the ARFF loader: (i) the loader ignores all header information, and (ii) Java-ML only supports numeric attributes, so data sets with non-numeric attributes (other than the class label) will crash the loader.
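
Both caveats can be seen in a small sketch in plain Java that behaves along the same lines. It is not the ARFFHandler's actual code: ArffDataSketch, readDataRows and numericAttributes are hypothetical names, the treatment of a negative class index as "no class label" is an assumption of this sketch, and quoted values and sparse ARFF rows are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Java-ML and not its actual loader.
class ArffDataSketch {

    // Returns the comma-separated fields of every row after the @data marker.
    // Everything before the marker (the header) is skipped, as are blank
    // lines and % comment lines.
    static List<String[]> readDataRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            if (!inData) {
                inData = line.equalsIgnoreCase("@data");
                continue;
            }
            rows.add(line.split(","));
        }
        return rows;
    }

    // Parses every field except the class label as a double.
    // A negative classIndex means "no class label" (assumption of this sketch).
    // Throws NumberFormatException when an attribute is not numeric.
    static double[] numericAttributes(String[] row, int classIndex) {
        boolean hasClass = classIndex >= 0 && classIndex < row.length;
        double[] out = new double[hasClass ? row.length - 1 : row.length];
        int next = 0;
        for (int i = 0; i < row.length; i++) {
            if (i == classIndex) {
                continue;
            }
            out[next++] = Double.parseDouble(row[i].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "@relation iris",
                "@attribute sepallength numeric",
                "@attribute sepalwidth numeric",
                "@attribute petallength numeric",
                "@attribute petalwidth numeric",
                "@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}",
                "@data",
                "5.1,3.5,1.4,0.2,Iris-setosa");
        for (String[] row : readDataRows(lines)) {
            System.out.println(row[4] + " " + Arrays.toString(numericAttributes(row, 4)));
        }
    }
}
```

The lines could come from java.nio.file.Files.readAllLines. Because the header is never consulted, the declared attribute types play no role, and a non-numeric value in any attribute column surfaces as a NumberFormatException at parse time.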

Sample data file for ARFF loader
Sample Iris ARFF data

Note that the FileHandler works with both flat (uncompressed) files and GZIP compressed files, while the ARFFHandler only works with flat files.
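
One common way to accept both kinds of input is to check for the two GZIP magic bytes (0x1f 0x8b) before choosing a decoder. The sketch below shows that technique with the JDK only. How the FileHandler itself distinguishes the two cases is not described here, so this is an illustration under that assumption; PlainOrGzipSketch, firstLine and gzip are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Java-ML.
class PlainOrGzipSketch {

    // Reads the first line of a stream that is either plain text or GZIP compressed.
    static String firstLine(InputStream raw) {
        try {
            BufferedInputStream in = new BufferedInputStream(raw);
            // Peek at the first two bytes, then rewind so nothing is lost.
            in.mark(2);
            int b1 = in.read();
            int b2 = in.read();
            in.reset();
            boolean gzip = b1 == 0x1f && b2 == 0x8b;
            InputStream data = gzip ? new GZIPInputStream(in) : in;
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(data, StandardCharsets.UTF_8))) {
                return reader.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compresses text in memory so the demo needs no files on disk.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String line = "5.1,3.5,1.4,0.2,Iris-setosa\n";
        byte[] plain = line.getBytes(StandardCharsets.UTF_8);
        // Both calls print the same line, whether or not the input is compressed.
        System.out.println(firstLine(new ByteArrayInputStream(plain)));
        System.out.println(firstLine(new ByteArrayInputStream(gzip(line))));
    }
}
```

For a file on disk, the same method could be given a java.io.FileInputStream instead of the in-memory streams used in main.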