Fri, 09/05/2008 - 09:56 — Thomas Abeel

Some interesting links for machine learning and machine learning libraries.

- Machine learning: The machine learning article from Wikipedia.
- Scholarpedia: an encyclopedia like Wikipedia, except that all the articles are in the fields of Computational Neuroscience, Dynamical Systems, or Computational Intelligence. The articles are written by experts in their field and peer-reviewed by other experts.
- Tutorials: Tom Dietterich's tutorials on machine learning.
- Introduction to Machine Learning: An introductory book on machine learning by Nils J. Nilsson. This book is freely available.
- Clustering and segmentation software: A collection of unsupervised data mining software packages, both free and commercial.

- **CougarSquared**: Cougar^2 is a new Java library for machine learning and data mining research. We extend the WEKA and YALE machine learning frameworks.
- **GNG**: DemoGNG, a Java applet, implements several methods related to competitive learning in neural networks. It is possible to experiment with the methods using various data distributions and observe the learning process. A common terminology is used to make it easy to compare one method to the other.
- **GUI Ant-Miner**: GUI Ant-Miner is a tool for extracting classification rules from data. It is an updated version of a data mining algorithm called Ant-Miner (Ant Colony-based Data Miner).
- **JAMA**: JAMA is a basic linear algebra package for Java. It provides user-level classes for constructing and manipulating real, dense matrices. It is meant to provide sufficient functionality for routine problems, packaged in a way that is natural and understandable to non-experts. It is intended to serve as the standard matrix class for Java, and will be proposed as such to the Java Grande Forum and then to Sun. A straightforward public-domain reference implementation has been developed by the MathWorks and NIST as a strawman for such a class. We are releasing this version in order to obtain public comment. There is no guarantee that future versions of JAMA will be compatible with this one.
- **JGAP**: JGAP (pronounced "jay-gap") is a Genetic Algorithms and Genetic Programming component provided as a Java framework. It provides basic genetic mechanisms that can be easily used to apply evolutionary principles to problem solutions. See the examples for a demonstration, or look at the graphical tree that JGAP can create for the solutions it finds when evolving programs.
- **JMathLib**: JMathLib is meant to be a clone of MATLAB, but written entirely in Java. It is a library of mathematical functions designed for evaluating complex expressions and displaying the results graphically. It can be used either interactively through a terminal-like window or to interpret script files. It is intended to be a Java version of programs such as MATLAB, Octave and Scilab.
- **JMatLink**: JMatLink connects Java to MATLAB using native methods. A local installation of MATLAB is required.
- **JOONE**: Joone is a free neural network framework to create, train and test artificial neural networks. The aim is to create a powerful environment for both enthusiastic and professional users, based on the newest Java technologies.
- **KNIME**: KNIME, pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models.
- **MTJ**: The Matrix Toolkits for Java (MTJ) is a comprehensive collection of matrix data structures, linear solvers (direct and iterative), least squares methods, and eigenvalue and singular value decompositions. MTJ is designed to be used as a library for developing numerical applications, both for small and large scale computations. The library is based on BLAS and LAPACK for its dense and structured sparse computations, and on the Templates project for unstructured sparse operations. As an option, MTJ can use machine-optimized BLAS libraries (such as ATLAS) for improved performance of dense matrix operations. Without such libraries, MTJ falls back to JLAPACK, a Java translation of BLAS and LAPACK. This keeps the library portable while allowing for improved performance in a production environment.
- **Weka**: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
- **Xelopes**: The prudsys XELOPES library (eXtEnded Library fOr Prudsys Embedded Solutions) is an open, platform-independent and data-source-independent library for Embedded Data Mining. XELOPES is CWM-compatible, supports the relevant Data Mining standards and can be combined with all prudsys products. The downloadable XELOPES library is licensed under the GNU General Public License.
- **MLC++**: MLC++ provides general machine learning algorithms that can be used by end users, analysts, professionals, and researchers. The main objective is to provide users with a wide variety of tools that can help mine data, accelerate development of new mining algorithms, increase software reliability, provide comparison tools, and display information visually. More than just a collection of existing algorithms, MLC++ is an attempt to extract commonalities of machine learning algorithms and decompose them for a unified view that is simple, coherent, and extensible.
- **SiMath**: SiMath is Silicos' open source library and C++ API to train and evaluate predictive and classification models from data matrices. SiMath is built on top of several open source libraries which all have their own interface and internal format. The idea of SiMath is to have a flexible and consistent interface to all tools needed in a typical data modeling procedure. These tools include preprocessing, feature selection, model training and evaluation. Currently, SiMath includes some basic classes to manipulate real-valued data matrices and vectors. Several clustering algorithms are available, and support vector machines for classification and regression are also provided.
- **RapidMiner**: RapidMiner (formerly YALE) is a flexible open-source tool for knowledge discovery, machine learning experiments, and data mining applications. Experiments can be made up of a large number of arbitrarily nestable operators and their setup is described by XML files which can easily be created with a graphical user interface. Applications of RapidMiner cover both research and real-world data mining tasks. The graphical user interface and the XML-based scripting language turn RapidMiner into an integrated development environment (IDE) for machine learning and data mining. Furthermore, this concept defines a standardized interchange format for data mining experiments.
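
The JGAP entry above mentions "basic genetic mechanisms" for applying evolutionary principles to a problem. As a rough illustration of what such mechanisms usually look like (selection, crossover, mutation), here is a minimal sketch in plain Java. It does not use the JGAP API or any other library from this list; the class name `OneMaxGa`, the OneMax toy problem (maximize the number of `true` bits), and all parameter values are assumptions made up for this example.

```java
import java.util.Random;

/** Illustrative genetic algorithm for the OneMax toy problem (hypothetical example, not JGAP). */
final class OneMaxGa {

    /** Fitness: the number of true genes in the chromosome. */
    static int fitness(boolean[] genes) {
        int count = 0;
        for (boolean g : genes) {
            if (g) count++;
        }
        return count;
    }

    /** Tournament selection of size 2: draw two random individuals, keep the fitter one. */
    static boolean[] select(boolean[][] pop, Random rnd) {
        boolean[] a = pop[rnd.nextInt(pop.length)];
        boolean[] b = pop[rnd.nextInt(pop.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    /** One-point crossover: genes before the cut come from p1, the rest from p2. */
    static boolean[] crossover(boolean[] p1, boolean[] p2, Random rnd) {
        int cut = rnd.nextInt(p1.length + 1);
        boolean[] child = new boolean[p1.length];
        for (int i = 0; i < child.length; i++) {
            child[i] = i < cut ? p1[i] : p2[i];
        }
        return child;
    }

    /** Bit-flip mutation, applied in place with the given per-gene probability. */
    static void mutate(boolean[] genes, double rate, Random rnd) {
        for (int i = 0; i < genes.length; i++) {
            if (rnd.nextDouble() < rate) genes[i] = !genes[i];
        }
    }

    /** Returns the fittest individual of the population (first one wins ties). */
    static boolean[] best(boolean[][] pop) {
        boolean[] winner = pop[0];
        for (boolean[] ind : pop) {
            if (fitness(ind) > fitness(winner)) winner = ind;
        }
        return winner;
    }

    /** Runs the GA for a fixed number of generations and returns the best individual found. */
    static boolean[] evolve(int popSize, int length, int generations, double mutationRate, long seed) {
        Random rnd = new Random(seed);
        boolean[][] pop = new boolean[popSize][length];
        for (boolean[] ind : pop) {
            for (int i = 0; i < length; i++) ind[i] = rnd.nextBoolean();
        }
        for (int g = 0; g < generations; g++) {
            boolean[][] next = new boolean[popSize][];
            next[0] = best(pop).clone(); // elitism: the current best survives unchanged
            for (int i = 1; i < popSize; i++) {
                boolean[] child = crossover(select(pop, rnd), select(pop, rnd), rnd);
                mutate(child, mutationRate, rnd);
                next[i] = child;
            }
            pop = next;
        }
        return best(pop);
    }

    public static void main(String[] args) {
        boolean[] result = evolve(30, 20, 50, 0.02, 42L);
        System.out.println("best fitness: " + fitness(result) + " / " + result.length);
    }
}
```

A framework like JGAP packages these steps behind configurable components, so you normally only supply the fitness function and the chromosome layout. The sketch above hard-codes everything just to show the moving parts.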