This repository serves as an index for open datasets which may prove useful to your machine learning projects.
Category | Description | Navigator |
---|---|---|
Commerce | Datasets on commerce, both physical and eCommerce. | Click |
Computer vision | Generic computer vision datasets. | Click |
Education | Datasets related to education, schools, universities, colleges, etc. | Click |
Healthcare | Datasets related to healthcare, medication/drugs, etc. | Click |
Human beings | Datasets related to human beings. | Click |
Infrastructure | Data about roads, bridges, railways, waterways, airports, and so on. | Click |
Language | Generic language processing datasets, e.g. for Natural Language Processing (NLP). | Click |
Nature | Datasets related to nature and natural phenomena. | Click |
Physics | Datasets related to phenomena from physics. | Click |
Real estate | Datasets that specifically focus on real estate. | Click |
Recommender engines | Datasets useful to recommender engine projects. | Click |
Self-driving vehicles | Data about autonomous vehicles such as cars, trains, airplanes and ships. | Click |
Sentiment | Generic sentiment analysis datasets. | Click |
Traffic | Traffic-related datasets. | Click |
Technology | Technology-related datasets, covering all forms of technology. | Click |
Travel | Travel-related datasets. | Click |
You are welcome, and in fact encouraged, to contribute to our repository. In return, you will be immortalized in the list of contributors below. There are two ways to make a contribution:
You may create an issue in our issue tracker. Please use the following template:
Dataset name:
Dataset repository / storage location:
Dataset description:
We will review the issue as quickly as we can and include you in the contributors list when we accept your contribution.
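If you collect candidate datasets in a script, the three-field template above can be filled in programmatically. The following is only a minimal sketch: the function name `format_dataset_issue` and the example values are hypothetical and not part of this repository, and the output simply mirrors the three template labels.

```python
def format_dataset_issue(name: str, location: str, description: str) -> str:
    """Fill in the three-field issue template (hypothetical helper)."""
    fields = [
        ("Dataset name", name),
        ("Dataset repository / storage location", location),
        ("Dataset description", description),
    ]
    # Refuse empty fields so that an incomplete issue is not submitted.
    for label, value in fields:
        if not value.strip():
            raise ValueError(f"'{label}' must not be empty")
    # One "label: value" line per template field, in template order.
    return "\n".join(f"{label}: {value.strip()}" for label, value in fields)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(format_dataset_issue(
        name="Example Dataset",
        location="https://example.org/dataset",
        description="A short summary of what the dataset contains.",
    ))
```

The resulting text can be pasted into a new issue as-is.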
You may also choose to make a pull request directly. Pick the `.md` file in the repository to which you think your dataset belongs. We will review the pull request as quickly as we can and include you in the contributors list when we approve your PR.
- Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
- Stanford, Stacy; https://towardsdatascience.com/the-50-best-public-datasets-for-machine-learning-d80e9f030279
- Versloot, Christian; https://github.com/christianversloot
We dedicate this work to the public domain through a CC0 1.0 Universal Public Domain Dedication, which means that you may share and edit it, even within a different project. More information here: https://creativecommons.org/publicdomain/zero/1.0. If you nevertheless wish to credit us for our work, you could write, for example: GSWRX Business Innovators, the Netherlands / www.gswrx.nl & www.degasfabriek.com.
Please note that this license applies to our collection of datasets rather than to the datasets themselves, which may be licensed differently by their respective owners. Please make sure you comply with the licenses of the individual datasets before using them.