We provide mirrors for a number of well-known datasets, and we host several datasets that have been used in the scientific literature for validation.
One of the best-known repositories of machine learning datasets is the UCI Machine Learning Repository. It currently hosts over 170 datasets covering a number of machine learning fields, including classification, clustering, and regression.
We provide two packages: first, a collection of 111 'small' datasets that contain less than 10 MB of data, and second, a set of 7 larger datasets that contain more than 10 MB of data.
For full details on all datasets, please refer to the home page of the UCI repository. If you use these datasets in your research, please cite the original authors' work.