This tutorial shows how to read data from several file types. The library currently supports any field-delimited text file, i.e. one sample per line with the attributes separated by a symbol, commonly a comma (,), a semicolon (;) or a tab (\t). Common formats include CSV and TSV; files of this type are referred to below as 'CSV-like files'. Java-ML also supports ARFF formatted files, with the limitation that only the class label can be non-numeric.
Loading data from CSV-like files
[Documented source code]
Data for the above example.
Iris data set
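To make the CSV-like layout concrete, the sketch below parses a single line into numeric attributes and a class label using only the Java standard library. It is not Java-ML's implementation: the `Sample` and `CsvLikeParser` classes and the `parseLine` method are hypothetical names introduced here, and the sketch assumes every field other than the class label is a plain number.

```java
import java.util.regex.Pattern;

// Hypothetical holder for one parsed sample; not a Java-ML type.
final class Sample {
    final double[] attributes;
    final String classLabel;

    Sample(double[] attributes, String classLabel) {
        this.attributes = attributes;
        this.classLabel = classLabel;
    }
}

// Hypothetical helper illustrating the "one sample per line" layout.
final class CsvLikeParser {
    // Splits one line on the separator, keeps the field at classIndex as the
    // label and parses every other field as a double.
    static Sample parseLine(String line, String separator, int classIndex) {
        // Quote the separator so symbols such as "|" are not read as regex syntax.
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (classIndex < 0 || classIndex >= fields.length) {
            throw new IllegalArgumentException("classIndex out of range: " + classIndex);
        }
        double[] attributes = new double[fields.length - 1];
        String label = null;
        int next = 0;
        for (int i = 0; i < fields.length; i++) {
            if (i == classIndex) {
                label = fields[i].trim();
            } else {
                // Throws NumberFormatException for non-numeric attributes.
                attributes[next++] = Double.parseDouble(fields[i].trim());
            }
        }
        return new Sample(attributes, label);
    }
}
```

For a line from the Iris data set such as `5.1,3.5,1.4,0.2,Iris-setosa`, calling `CsvLikeParser.parseLine(line, ",", 4)` yields four numeric attributes and the label `Iris-setosa`. The same call with `"\t"` as the separator handles TSV input.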
Loading sparse data from CSV-like files
[Documented source code]
Sample data file for sparse data
Sample sparse data
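A sparse file stores only the non-zero attributes of each sample. As a rough illustration, the sketch below parses one such line with the standard library. The layout is an assumption made for this example, not something stated above: each attribute is taken to be written as `index:value`, with fields separated by a space and the class label in a field of its own. `SparseLineParser` and its parameters are hypothetical names and do not reflect how Java-ML reads sparse files internally.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper; the index:value layout is an assumption of this sketch.
final class SparseLineParser {
    // Returns index -> value for the attributes listed on one line.
    // The field at classIndex holds the class label and is skipped here.
    static Map<Integer, Double> parseLine(String line, String attSep,
                                          String indexSep, int classIndex) {
        String[] fields = line.trim().split(Pattern.quote(attSep));
        Map<Integer, Double> values = new TreeMap<>();
        for (int i = 0; i < fields.length; i++) {
            if (i == classIndex) {
                continue;
            }
            // Split into exactly two parts: attribute index and attribute value.
            String[] pair = fields[i].split(Pattern.quote(indexSep), 2);
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected index" + indexSep
                        + "value but found: " + fields[i]);
            }
            values.put(Integer.parseInt(pair[0].trim()),
                       Double.parseDouble(pair[1].trim()));
        }
        return values;
    }
}
```

Attributes that do not appear in the returned map are implicitly zero, which is what keeps very wide but mostly empty data sets compact on disk.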
Loading data from an ARFF formatted file
Loading from an ARFF formatted file is much like loading from a CSV or TSV file.
[Documented source code]
There are some caveats with the ARFF loader: (i) all header information is ignored by the loader, and (ii) Java-ML only supports numeric attributes (apart from the class label); data sets with non-numeric attributes will crash the loader.
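Because non-numeric attributes crash the loader, it can help to check a file before loading it. The sketch below is one possible pre-check written against the standard library only; `ArffDataSketch` and its methods are hypothetical names, not part of Java-ML. It relies on the standard ARFF conventions that `%` starts a comment and that data rows follow the `@data` line, and it assumes simple comma-separated rows without quoted values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check helper; not part of Java-ML.
final class ArffDataSketch {
    // Returns the raw data rows that follow the @data marker.
    // Header declarations, blank lines and % comments are ignored.
    static List<String[]> dataRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            if (!inData) {
                // Everything before @data is header information.
                if (line.equalsIgnoreCase("@data")) {
                    inData = true;
                }
                continue;
            }
            rows.add(line.split(","));
        }
        return rows;
    }

    // True when every field except the class label parses as a number.
    static boolean hasOnlyNumericAttributes(String[] row, int classIndex) {
        for (int i = 0; i < row.length; i++) {
            if (i == classIndex) {
                continue;
            }
            try {
                Double.parseDouble(row[i].trim());
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return true;
    }
}
```

Running `hasOnlyNumericAttributes` over every row returned by `dataRows` flags the rows that would need to be converted or removed before the file is handed to the ARFF loader.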
Sample data file for ARFF loader
Sample Iris ARFF data
Note that the FileHandler works with flat files or GZIP compressed files, while the ARFFHandler only works with flat files.
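Since the ARFFHandler needs a flat file, a GZIP compressed ARFF file has to be decompressed first. The sketch below shows one way to do that with `java.util.zip.GZIPInputStream`. `GzipToFlat` is a hypothetical helper, and detecting compression through the two-byte GZIP magic number is a choice made for this example; how the FileHandler itself recognises compressed files is not covered here.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical helper; not part of Java-ML.
final class GzipToFlat {
    // Returns true when the file starts with the GZIP magic bytes 0x1f 0x8b.
    static boolean isGzip(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }

    // Writes a decompressed copy to target when the input is GZIP compressed;
    // otherwise returns the input path unchanged and leaves target untouched.
    static Path toFlatFile(Path file, Path target) throws IOException {
        if (!isGzip(file)) {
            return file;
        }
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

The path returned by `toFlatFile` can be converted with `toFile()` and passed to whichever loader is in use, so the same calling code works for compressed and uncompressed inputs.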