Document image classification with neural networks on a subset of the RVL-CDIP dataset [1].
The classification problem is tackled with two different approaches:
- Visual approach over the image pixels, with dense-only (fully connected) and convolutional neural networks:
skeleton.ipynb
- Textual approach over the words recognized in the images (OCR), with bag-of-words and word embedding models:
skeleton_ocr.ipynb
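
As a rough illustration of the bag-of-words idea used in the textual approach, the sketch below turns recognized text into word-count vectors using only the Python standard library. It is a minimal sketch, not the code from skeleton_ocr.ipynb: the function names `build_vocabulary` and `bag_of_words`, the whitespace tokenization, and the sample strings are all assumptions made for this example.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Return a word-count vector for one document; unknown tokens are ignored."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical OCR output for two document images (not taken from RVL-CDIP).
ocr_texts = [
    "invoice total amount due",
    "dear sir thank you for your letter",
]

vocab = build_vocabulary(ocr_texts)
vectors = [bag_of_words(text, vocab) for text in ocr_texts]
```

Each vector has one entry per vocabulary word, so it can be fed directly to a dense classifier. Word order is discarded, which is the main limitation that word embedding models try to address.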
It is recommended to begin with the visual approach, as its notebook includes more details about the computing environment setup and the dataset.
For the best experience, run the notebooks in a Google Colab environment.
[1] A. W. Harley, A. Ufkes, K. G. Derpanis, "Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval," in ICDAR, 2015.