https://mp.weixin.qq.com/s/sigPeclXz12NUk_nX8TZ-A
https://github.com/NanoNets/nanonets-ocr-sample-python
DocUNet: Document Image Unwarping via A Stacked U-Net, Ke Ma, Zhixin Shu, Xue Bai, Jue Wang, Dimitris Samaras. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
https://github.com/wuleiaty/DocUNet
Research project on text detection and recognition, implemented in PyTorch 1.2 https://github.com/Megvii-CSG/MegReader
DB (Real-time Scene Text Detection with Differentiable Binarization) implementation in Keras and Tensorflow https://github.com/xuannianz/DifferentiableBinarization
A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization". https://github.com/MhLiao/DB
CharNet: Convolutional Character Networks https://github.com/MalongTech/research-charnet
https://github.com/JaidedAI/EasyOCR
https://github.com/xiangweizeng/mobile-lpr
【Transformer scene text recognition】'Transformer-OCR - Scene Text Recognition via Transformer' https://github.com/fengxinjie/Transformer-OCR
pytorch implementation of R2CNN, Rotational Faster RCNN for orientated object detection https://github.com/Xiangyu-CAS/R2CNN.pytorch
Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition [AAAI-2019] https://github.com/Pay20Y/SAR_TF
PyTorch re-implementation of "Real-time Scene Text Detection with Differentiable Binarization" (AAAI 2020) https://github.com/SURFZJY/Real-time-Text-Detection
Code for the paper "KISS: Keeping it Simple for Scene Text Recognition" https://github.com/Bartzi/kiss
Pytorch implementation for "Decoupled attention network for text recognition". https://github.com/Wang-Tianwei/Decoupled-attention-network
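Ultra-lightweight Chinese OCR with vertical text recognition and ncnn inference support: psenet (8.5M) + crnn (6.3M) + anglenet (1.5M), only 17M in total. Chinese natural-scene text detection and recognition built on chineseocr and psenet
https://github.com/ouyanghuiyu/chineseocr_lite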
https://github.com/brooklyn1900/CRAFT_pytorch
Unofficial PyTorch implementation of 2D Attentional Irregular Scene Text Recognizer https://github.com/chenjun2hao/Bert_OCR.pytorch
https://github.com/ahmetozlu/signature_extractor
https://github.com/xiaoyu258/DocProj
https://github.com/Mingtzge/2019-CCF-BDCI-OCR-MCZJ-OCR-IdentificationIDElement
https://github.com/youdao-ai/SRNet
Scene Text Detection with Learned Anchor https://github.com/xhzdeng/stela
EATEN: Entity-aware Attention for Single Shot Visual Text Extraction https://github.com/beacandler/EATEN
mxnet-Gluon implementation of PSENet text detector (Shape Robust Text Detection with Progressive Scale Expansion Network) https://github.com/saicoco/Gluon-PSENet
[ICCV 2019] CompenNet++: End-to-end Full Projector Compensation https://github.com/BingyaoHuang/CompenNet-plusplus
CRAFT-Pytorch: Character Region Awareness for Text Detection, reimplementation for Pytorch https://github.com/backtime92/CRAFT-Reimplementation
【OpenCV table recognition】'OTR - Optical table recognition - recognize tables in scan images using OpenCV' https://github.com/ulikoehler/OTR
【PyLaia: a deep learning toolkit for handwritten document analysis】 https://github.com/jpuigcerver/PyLaia
'齊伋體 (Qiji typeface) - typeface from Ming Dynasty woodblock printed books' https://github.com/LingDong-/qiji-font
【CORD: a (receipt) dataset for OCR post-processing and parsing】 https://github.com/clovaai/cord
【Receipt information extraction (OCR)】 https://github.com/zzzDavid/ICDAR-2019-SROIE
【Kaggle Open Day: Kuzushiji classical Japanese character recognition】 https://www.youtube.com/watch?v=cGpIVyV96Hg
【Table detection, information extraction, and structuring with deep learning】 https://nanonets.com/blog/table-extraction-deep-learning/
'Image2Katex - formula image OCR: takes an image and outputs the corresponding LaTeX expression' https://github.com/xiaofengShi/Image2Katex
【Automatic receipt digitization with deep learning/OCR】 https://nanonets.com/blog/receipt-ocr/
【Keras text detection/OCR package】 https://github.com/faustomorales/keras-ocr
《Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR)》 https://www.arxiv-vanity.com/papers/2001.00139/
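'Chinese character featurizer: extracts features of Chinese characters (pronunciation and glyph features) for use as deep learning features' https://github.com/howl-anderson/hanzi_char_featurizer
【CRNN (CNN+RNN) license plate recognition】 https://github.com/qjadud1994/CRNN-Keras
【Natural-scene text detection with CTPN and text recognition with CNN+CTC OCR, implemented on the darknet framework】 https://github.com/chineseocr/darknet-ocr
'Scanner - QR code/barcode recognition, ID card recognition, bank card recognition, license plate recognition, image text recognition, and pornographic image detection' https://github.com/shouzhong/Scanner
【Tutorial: building a custom OCR system with YOLO+Tesseract】 https://medium.com/saarthi-ai/how-to-build-your-own-ocr-a5bb91b622ba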
https://github.com/LinXueyuanStdio/LaTeX_OCR
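【Tutorial materials on deep learning for OCR/document analysis/text recognition/language modeling】 https://github.com/tmbdev/icdar2019-tutorial
Quickly convert formulas from books into LaTeX format https://mp.weixin.qq.com/s/vNNNJumpgobE-iAdharL9A https://github.com/blaisewang/img2latex-mathpix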
https://github.com/xuexingyu24/License_Plate_Detection_Pytorch
'Cracking click-the-character CAPTCHAs' https://github.com/cos120/captcha_crack
Seq2Seq+Attention Chinese OCR text recognition https://github.com/bai-shang/crnn_seq2seq_ocr_pytorch
【AttentionOCR natural-scene text recognition】 https://github.com/zhang0jhon/AttentionOCR
【OCR text detection】'Text Detector for OCR - Text detection model that combines Retinanet with textboxes++ for OCR' https://github.com/qjadud1994/Text_Detector
https://github.com/LinXueyuanStdio/LaTeX_OCR_PRO
树洞 OCR (Tree Hole OCR) text recognition https://github.com/AnyListen/tools-ocr
Comprehensive resource list for end-to-end scene text detection and recognition https://github.com/HCIILAB/Scene-Text-End2end
【CRAFT + CRNN text recognition tool】'Text recognition tool - Text recognition with Pytorch Using CRNN and CRAFT pretrained models.' https://github.com/s3nh/text-detector
First-place solution for the ICDAR 2019 arbitrary-shaped text recognition competition https://github.com/Jyouhou/ICDAR2019-ArT-Recognition-Alchemy
PDF table data extraction tool https://github.com/camelot-dev/camelot
'Convolutional Recurrent Neural Network + CTCLoss - Pytorch implementation of CRNN (CNN + RNN + CTCLoss) for all language OCR' https://github.com/Holmeyoung/crnn-pytorch
Second-place solution for the Kaggle Kuzushiji classical Japanese character recognition competition https://github.com/lopuhin/kaggle-kuzushiji-2019
Finding the best OCR tool https://github.com/factful/ocr_testing
https://github.com/lquirosd/P2PaLA
List of synthetic data/OCR datasets for text localization and recognition https://github.com/TianzhongSong/awesome-SynthText
https://github.com/masyagin1998/robin
https://github.com/ikivanc/Document-Classification-and-Post-OCR-Key-Value-Extraction
Comprehensive resource lists for scene text detection and recognition https://github.com/HCIILAB/Scene-Text-Detection https://github.com/HCIILAB/Scene-Text-Recognition
https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet
https://github.com/Roujack/mathAI
faster-CTPN - a very fast CTPN: the LSTM is replaced with a Conv1D layer, making it more than 5 times faster than the original version https://github.com/hsddlz/faster-CTPN
https://github.com/xiaomaxiao/PSENET
RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection https://github.com/MhLiao/RRD
Keras implementation of Character Region Awareness for Text Detection (CRAFT) https://github.com/RubanSeven/CRAFT_keras
Brno Mobile OCR dataset https://pero.fit.vutbr.cz/brno_mobile_ocr_dataset https://github.com/DCGM/B-MOD
Improving OCR text recognition accuracy with image preprocessing https://medium.com/cashify-engineering/improve-accuracy-of-ocr-using-image-preprocessing-8df29ec3a033
Open-source code for Glyce, deep learning with Chinese glyph features. Glyce: Glyph-vectors for Chinese Character Representations https://github.com/ShannonAI/glyce
https://github.com/Yuliang-Liu/Box_Discretization_Network
https://github.com/AirBernard/Scene-Text-Detection-with-SPCNET
https://github.com/ayumiymk/aster.pytorch
Comprehensive list of OCR-related literature https://github.com/ChanChiChoi/awesome-ocr
ocr_densenet - first place in the 1st Xi'an Jiaotong University AI Practice Competition (2018 AI Practice Competition, image text recognition); recognizes text in images using only densenet https://github.com/yinchangchang/ocr_densenet
Scene text removal dataset https://github.com/HCIILAB/Scene-Text-Removal
Comprehensive list of literature/code for scene text detection and recognition https://github.com/Jyouhou/SceneTextPapers
An implementation of the algorithm in the paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection https://github.com/xieyufei1993/InceptText-Tensorflow
ICDAR2015 scene text detection solution https://github.com/Vipermdl/OCR_detection_IC15
Extremely simple implementation of CRNN in Tensorflow https://zhuanlan.zhihu.com/p/43534801 https://github.com/bai-shang/crnn_ctc_ocr.Tensorflow
Pyramid Mask Text Detector designed by SenseTime Video Intelligence Research team. https://github.com/STVIR/PMTD
https://juejin.im/post/5ce21f99e51d4510686adf85
https://github.com/xellows1305/Document-Image-Dewarping
【OpenCV table recognition】'OTR - Optical table recognition - recognize tables in scan images using OpenCV' by Uli Köhler https://github.com/ulikoehler/OTR
Automatic detection and reconstruction of document tables using unet https://github.com/chineseocr/table-ocr
A PyTorch implementation of "ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network" (CVPR 2020 oral) https://github.com/Yuliang-Liu/bezier_curve_text_spotting
Handwritten text recognition implemented in TensorFlow https://github.com/githubharald/SimpleHTR
https://github.com/NPS-Cisco-2019/Backend-Fullstack
https://github.com/WenmuZhou/PSENet.pytorch https://github.com/whai362/PSENet https://arxiv.org/abs/1903.12473
Pyramid Mask Text Detector https://www.arxiv-vanity.com/papers/1903.11800/
Variable-length text detection and recognition based on CTPN (tensorflow) + CRNN (pytorch) + CTC https://github.com/ooooverflow/chinese-ocr
CRNN scene text recognition https://github.com/MaybeShewill-CV/CRNN_Tensorflow
FOTS: Fast Oriented Text Spotting with a Unified Network https://github.com/xieyufei1993/FOTS
This is a tensorflow re-implementation of PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network. My blog: https://blog.csdn.net/liuxiaoheng1992… https://github.com/liuheng92/tensorflow_PSENet
Deep Pyramid Convolutional Neural Networks for Text Categorization https://github.com/Cheneng/DPCNN
cnocr: a Python 3 package for Chinese OCR that ships with pretrained recognition models https://github.com/breezedeus/cnocr
Implementation for CVPR 2018 text recognition Paper by Tensorflow: "AON: Towards Arbitrarily-Oriented Text Recognition" https://github.com/huizhang0110/AON
PyTorch implementation of CRNN to do Image Text Recognition using torch.nn.CTCLoss https://github.com/zhiqwang/crnn.pytorch
License Plate Detection and Recognition in Unconstrained Scenarios https://github.com/sergiomsilva/alpr-unconstrained
https://github.com/lyl8213/Plate_Recognition-LPRnet
Tightness-aware Evaluation Protocol for Scene Text Detection (CVPR 2019) https://github.com/Yuliang-Liu/TIoU-metric
https://github.com/clovaai/deep-text-recognition-benchmark
TextField: Learning A Deep Direction Field for Irregular Scene Text Detection (TIP 2019) https://github.com/YukangWang/TextField
Convolutional Recurrent Neural Networks(CRNN) for Scene Text Recognition https://github.com/MaybeShewill-CV/CRNN_Tensorflow
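https://github.com/Yuanhang8605/faster-than-ctpn-a-novel-poster-text-detector https://github.com/Yuanhang8605/pixel-anchor-link-and-text-detector-experience Text detection, discussed from practical project experience
Before I knew it, a year had passed since I changed careers. Over this year I have been working on all kinds of detection and segmentation algorithms, deployed mainly for text detection. Since our OCR had previously relied on traditional algorithms built around MSER, I was the first here to try deep-learning-based text detection, and I really had to figure everything out on my own.
At the start, with no experience at all, I could only read papers and search GitHub for popular projects. The first ones I came across were Megvii's EAST and Textboxes++ from Xiang Bai's lab. I spent two months working through segmentation-based and anchor-based algorithms, practicing on the ICDAR15 dataset. Eventually I reproduced the papers' results, and my reproduced results were even better than those in the original papers. Just as I was feeling pleased with myself, I applied the algorithms to a real project scenario and found they were simply unusable, because EAST and Textboxes++ both target multi-oriented English datasets and cannot cope with long Chinese text. Topping a leaderboard and deploying in practice really are two different things. Papers published at CVPR mostly focus on the ICDAR leaderboards and are useful only as a source of ideas. Reading widely is still worthwhile, though: practicing with these methods builds your algorithmic skills, and only then can you develop a complete algorithm framework of your own. When solving real problems there is often no special trick, only the most basic principles and a deep understanding of the algorithms; the algorithm has to be designed to fit the situation.
Our real projects mainly target printed text and fall into two kinds: structured requirements and general-purpose requirements. The two call for different development routes.
Structured requirements mainly concern forms and vouchers, such as the most complex one we have developed for, the value-added tax (VAT) invoice, where every field name and field content must be detected separately, as well as bank deposit slips, cash cheques, and so on. A problem that often arises in these scenarios is that content fields get printed over name fields, so the two must be separated. Traditional link-based methods cannot solve this problem in principle: algorithms publicly available on GitHub, such as pixel-link and PSENET, all suffer from adjacent text being merged and do not reach industrial-grade robustness in practice. The only option is direct regression, overfitting to the specific scenario by fine-tuning. There are two ways to do direct regression: anchor-based regression, and corner regression with matching. We achieved good results with both and successfully deployed them for text detection on VAT invoices. Of course, all sorts of problems come up during actual development, and a satisfactory result comes only from getting hands-on and solving them as they appear. With anchor-based regression, for example, you will find that anchor design is very hard, and that the bounding boxes detected for long Chinese text are accurate one moment and inaccurate the next. With corner regression and matching, the main difficulty lies in post-processing, and matching robustly and accurately also takes repeated experimentation.
In general-purpose scenarios, the first problem is that the boundary of a Chinese text line cannot be defined precisely: how many characters of spacing should separate one text line from the next? So direct regression is not an option, and link-based methods should be the mainstream approach. But linking directly does not work either. Algorithms such as pixel-link and PSENet essentially miss small text unless the image is enlarged considerably, and enlarging it that much costs too much. The biggest problem with pixel-level linking is that it is easily disturbed by the background and is not robust enough; beyond that, it tends to merge closely spaced text, and anyone who has actually run these algorithms knows how painful that is. To be robust and to generalize well, it is best to simplify the problem. In most application scenarios the text has a single dominant orientation, so adding an orientation correction module reduces the problem to a one-dimensional one, similar to the CTPN approach. CTPN itself is just about usable, but it is too slow. I strongly recommend the pixel-anchor-link algorithm we developed ourselves: trained only on synthetic data, it achieves very strong generalization; see the demo images. Learning the simplest features gives good generalization. Pixel methods are easily disturbed by the background and anchor methods do not generalize well, but combining the two gives very strong generalization and resistance to interference, with no need for LSTM-based operations like those in CTPN, since LSTMs are slow. Ironically, a practical method like this cannot get published at a top conference: on the one hand, companies do not release their practical methods, and on the other, the academic community has the odd habit of requiring benchmark leaderboard results before a paper can be published. The mistake newcomers most often make is the loop of reading countless papers --> reproducing them --> finding them useless --> reading more papers. Do not put blind faith in top conferences such as CVPR; solving real problems and pursuing industrial-grade robustness is what matters.
Some details are not convenient to discuss, so this is as far as I can help. I hope you will not fall into as many pits as I did, although falling into pits does make you grow. Fundamentals matter most: keep a firm grasp of the two cores, classification and regression, and solve real problems for real scenarios. Sometimes simpler is better. May the algorithms be with you.
Recently I have been working on OCR for shopping mall receipts and have uploaded some result images.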
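The essay's point about reducing detection to a one-dimensional problem once the text orientation has been corrected can be illustrated with a small sketch. This is not the author's pixel-anchor-link algorithm, whose details are not published here; it is a generic CTPN-style linking step written only for illustration. The anchor tuple layout, the function names `vertical_overlap` and `link_anchors`, and every threshold are assumptions.

```python
from typing import List, Tuple

# Hypothetical anchor format: (x_left, y_top, y_bottom, score).
# All anchors are assumed to share one fixed width, as in CTPN-style detectors.
Anchor = Tuple[float, float, float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def vertical_overlap(a: Anchor, b: Anchor) -> float:
    """Ratio of the shared vertical extent to the combined vertical extent."""
    inter = max(0.0, min(a[2], b[2]) - max(a[1], b[1]))
    union = max(a[2], b[2]) - min(a[1], b[1])
    return inter / union if union > 0 else 0.0


def link_anchors(
    anchors: List[Anchor],
    anchor_width: float = 16.0,   # assumed fixed anchor width in pixels
    max_gap: float = 32.0,        # assumed largest horizontal gap to bridge
    min_overlap: float = 0.7,     # assumed minimum vertical overlap to link
    min_score: float = 0.5,       # assumed text/non-text score threshold
) -> List[Box]:
    """Greedily chain horizontally adjacent anchors into text-line boxes."""
    kept = sorted((a for a in anchors if a[3] >= min_score), key=lambda a: a[0])
    lines: List[List[Anchor]] = []
    for anchor in kept:
        for line in lines:
            last = line[-1]
            gap = anchor[0] - (last[0] + anchor_width)
            if gap <= max_gap and vertical_overlap(last, anchor) >= min_overlap:
                line.append(anchor)
                break
        else:
            # No existing line accepted this anchor, so it starts a new one.
            lines.append([anchor])
    return [
        (
            line[0][0],
            min(a[1] for a in line),
            line[-1][0] + anchor_width,
            max(a[2] for a in line),
        )
        for line in lines
    ]
```

Because linking only ever looks left to right along one axis, two text lines stacked vertically are never joined, and a wide horizontal gap splits a row into separate boxes. The thresholds would need tuning per scenario.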
(PyTorch) CRNN variable-length Chinese character recognition https://github.com/Sierkinhane/crnn_chinese_characters_rec
Single Shot Text Detector with Regional Attention https://github.com/HotaekHan/SSTDNet
This is a pytorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector. https://github.com/songdejia/EAST
Collection of resources for scene text recognition and understanding https://github.com/tangzhenyu/Scene-Text-Understanding
Detecting text via segmentation. PixelLink: Detecting Scene Text via Instance Segmentation https://github.com/ZJULearning/pixel_link
Comprehensive resource list for deep learning text detection/recognition https://github.com/hwalsuklee/awesome-deep-text-detection-recognition
ChineseAddress_OCR - Photographing Chinese-Address OCR implemented using CTPN+CTC+Address Correction. Chinese address text recognition in photographed documents. https://github.com/Walleclipse/ChineseAddress_OCR
Arbitrary-oriented scene text detection https://github.com/mjq11302010044/RRPN
AdvancedEAST: efficient scene text detection https://github.com/huoyijie/AdvancedEAST https://www.pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/
R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection https://github.com/beacandler/R2CNN
End-to-end variable-length Chinese character detection and recognition implemented in Tensorflow/Keras https://github.com/YCG09/chinese_ocr
Optical recognition of bank cards, ID cards, and house numbers https://github.com/evilgix/Evil
Synthetic data generator for text recognition (OCR) https://github.com/Belval/TextRecognitionDataGenerator
Training image generator for OCR text (Chinese character) recognition https://github.com/Sanster/text_renderer
Open-source license plate recognition with a reported accuracy of 99.8% https://github.com/zhubenfu/License-Plate-Detect-Recognition-via-Deep-Neural-Networks-accuracy-up-to-99.9
Talk the Walk: Navigating New York City through Grounded Dialogue (voice navigation) https://github.com/facebookresearch/talkthewalk https://arxiv.org/abs/1807.03367
CAPTCHA recognition https://github.com/ecthros/uncaptcha2
ARU-Net: A Neural Pixel Labeler for Layout Analysis of Historical Documents https://github.com/TobiasGruening/ARU-Net
A tensorflow re-implementation of RRPN: Arbitrary-Oriented Scene Text Detection via Rotation Proposals. https://github.com/DetectionTeamUCAS/RRPN_Faster-RCNN_Tensorflow
Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes https://arxiv.org/abs/1807.02242 https://github.com/lvpengyuan/masktextspotter.caffe2
Extract table data from PDF files through a web interface https://github.com/camelot-dev/excalibur
Deep-learning-based text recognition system https://github.com/AstarLight/CPS-OCR-Engine
Geometry-Aware Scene Text Detection with Instance Transformation Network https://github.com/zlmzju/itn
Calamari, open-source OCR text recognition software https://github.com/Calamari-OCR/calamari
DeepTextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework https://github.com/MichalBusta/DeepTextSpotter
yolo3+ocr https://github.com/chineseocr/chineseocr
Recognizing cropped text in natural images. https://github.com/bgshih/aster
Single Shot Scene Text Retrieval, ECCV 2018. https://github.com/lluisgomez/single-shot-str
Graph Convolutional Networks for Text Classification. AAAI 2019 https://github.com/yao8839836/text_gcn
A PyTorch implementation of the ECCV 2018 paper: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes https://github.com/princewang1994/TextSnake.pytorch
ocr, cnn+lstm+ctc, crnn, recognition model, tensorflow https://github.com/Li-Ming-Fan/OCR-CRNN-CTC
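CRNN (CNN+RNN+CTCLoss) Chinese handwritten character recognition https://github.com/chizhanyuefeng/Chinese_OCR_CNN-RNN-CTC
ICDAR 2017 scene text image dataset (with a list of the latest results) https://github.com/cs-chan/Total-Text-Dataset
'CHINESE-OCR - [python3.6] natural-scene text detection implemented in tf, with ctpn+crnn+ctc implemented in keras/pytorch for variable-length scene text OCR' by xiaofeng https://github.com/xiaofengShi/CHINESE-OCR
OCR_DataSet - collects and organizes OCR-related datasets and unifies their annotation format https://github.com/WenmuZhou/OCR_DataSet
nhyai: AI-based content review supporting pornography detection, violence and terrorism detection, language identification, sensitive text detection, and video detection, along with various OCR capabilities https://github.com/shuishang/nhyai
'PaddleOCR - an OCR toolkit based on PaddlePaddle, including an ultra-lightweight Chinese OCR with a total model size of only 8.6M; a single model supports recognition of mixed Chinese, English, and digits, vertical text, and long text. It also supports training algorithms for a variety of text detection and text recognition methods' https://github.com/PaddlePaddle/PaddleOCR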
https://github.com/WenmuZhou/PytorchOCR
【Vedastr: a PyTorch scene text recognition toolbox】 https://github.com/Media-Smart/vedastr
【Data generation tool for OCR/text detection/font recognition】 https://github.com/BboyHanat/TextGenerator
【TextShot: take a screenshot and copy the text in the image directly】 https://github.com/ianzhao05/textshot
【Detection tool that uses OCR to find sensitive information in screenshots】 https://github.com/utkusen/shotlooter
【Tesseract OCR text localization and detection】 https://www.pyimagesearch.com/2020/05/25/tesseract-ocr-text-localization-and-detection/
'TrWebOCR - open-source offline OCR' https://github.com/alisen39/TrWebOCR
【Extracting structured data from templated documents (forms/invoices/receipts, etc.)】 https://ai.googleblog.com/2020/06/extracting-structured-data-from.html https://research.google/pubs/pub49122/
【Transformer-based scene text recognition (PyTorch)】 https://github.com/opconty/Transformer_STR
https://github.com/guanshuicheng/invoice
'Companion code for the book 《深度实践OCR:基于深度学习的文字识别》 (OCR in Practice: Deep-Learning-Based Text Recognition)' https://github.com/ocrbook/ocrinaction
【Table detection dataset for documents/images】'table-detection-dataset - dataset for table detection in documents and images' https://github.com/sgrpanchal31/table-detection-dataset
【OCR module supporting 40+ languages】'Easy OCR - Ready-to-use OCR with 40+ languages supported including Chinese, Japanese, Korean and Thai.' https://github.com/JaidedAI/EasyOCR
【Keras CAPTCHA recognition】《OCR model for reading Captchas》 https://keras.io/examples/vision/captcha_ocr/
【TableRecognition: restores tables from images and saves them as Word documents; extracts per-character coordinates with crnn (to handle text that spans table cells)】 https://github.com/Rid7/Table-OCR
Deep relational reasoning graph network for arbitrary shape text detection; Accepted by CVPR 2020 (Oral). https://github.com/GXYM/DRRG
Synthetic Scene Text from 3D Engines https://github.com/Jyouhou/UnrealText
Code for generating the CurvedSynth dataset used in our paper: Alchemy: Techniques for Rectification Based Irregular Scene Text Recognition https://github.com/PkuDavidGuan/CurvedSynthText
STEFANN: Scene Text Editor using Font Adaptive Neural Network @ The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020. https://github.com/prasunroy/stefann
This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents" https://github.com/DevashishPrasad/CascadeTabNet
An implementation of "R-Net: A Relationship Network for Efficient and Accurate Scene Text Detection" https://github.com/wangyuxin87/R-Net
Unofficial PyTorch implementation of Towards Accurate Scene Text Recognition with Semantic Reasoning Networks https://github.com/chenjun2hao/SRN.pytorch
https://github.com/clovaai/CLEval
https://github.com/saicoco/SA-Text
MASTER: Multi-Aspect Non-local Network for Scene Text Recognition https://github.com/jiangxiluning/MASTER-TF https://arxiv.org/abs/1910.02562
AutoSTR: Efficient Backbone Search for Scene Text Recognition. https://github.com/AutoML-4Paradigm/AutoSTR
https://github.com/tiantian91091317/OCR-Corrector
"TextRay: Contour-based Geometric Modeling for Arbitrary-shaped Scene Text Detection" https://github.com/LianaWang/TextRay
Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector https://github.com/fcakyon/craft-text-detector
https://github.com/Layout-Parser/layout-parser
A PyTorch implementation of "ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection" https://github.com/wangyuxin87/ContourNet
Scanning Single Shot Detector for Math in Document Images https://github.com/MaliParag/ScanSSD https://arxiv.org/pdf/2003.08005.pdf
Data and implementation of ECCV2020 paper 'Adaptive Text Recognition through Visual Matching' https://github.com/Chuhanxx/FontAdaptor
This is the implementation of the paper "SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition" https://github.com/Pay20Y/SEED
The code of "Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting" https://github.com/MhLiao/MaskTextSpotterV3
https://cloud.tencent.com/developer/article/1484463
https://github.com/tommyMessi/tableImageParser_tx
https://github.com/tommyMessi/waveCorrection
Resource list of literature on scene text detection/recognition/synthesis https://github.com/yflv-yanxia/scene_text
captcha_trainer_pytorch - a pytorch training framework for variable-length image recognition based on MobileNetV2/EfficientNet-b0/... + LSTM + CTC https://github.com/sml2h3/captcha_trainer_pytorch
Im2Latex: a PyTorch implementation that generates LaTeX source from formula images with a Deep CNN Encoder + LSTM Decoder https://github.com/luopeixiang/im2latex
'BERT model correct error character with mask feature - Chinese text error correction based on bert' https://github.com/tongchangD/bert_for_corrector
TDA-ReCTS: a validation set for text detection disambiguation https://github.com/whai362/TDA-ReCTS
https://github.com/ciur/papermerge
China's first OCR white paper is released: deep-learning-based OCR has become mainstream https://mp.weixin.qq.com/s/xXDprbC94h-2a1JxIJB--g
pytorchOCR - a pytorch-based OCR algorithm library including psenet, pan, dbnet, sast, and crnn https://github.com/BADBADBADBOY/pytorchOCR
https://github.com/neuspell/neuspell
OCR Post Correction for Endangered Language Text https://github.com/neulab/ocr-post-correction https://www.aclweb.org/anthology/2020.emnlp-main.478/
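OpenScan: an open-source document scanning app for Android https://github.com/Ethereal-Developers-Inc/OpenScan
End-to-end-for-chinese-plate-recognition - software for Chinese license plate localization, rectification, and end-to-end recognition based on u-net, cv2, and cnn; unet and cv2 handle plate localization and rectification, cnn handles plate recognition, and both unet and cnn are implemented in tensorflow's keras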
https://github.com/duanshengliu/End-to-end-for-chinese-plate-recognition
https://github.com/poke1024/origami
Autocorrect: a Python spelling correction package https://github.com/fsondej/autocorrect
https://arxiv.org/abs/2101.10281 https://github.com/allenai/pawls
VisualMRC: Machine Reading Comprehension on Document Images https://github.com/nttmdlab-nlp/VisualMRC
Scene text recognition toolbox implemented (reproduced) in Pytorch https://github.com/chibohe/text_recognition_toolbox
https://github.com/phamquiluan/PubLayNet
On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention https://github.com/clovaai/SATRN
This is the official implementation of AE TextSpotter, which introduces linguistic information to eliminate the ambiguity in text detection. https://github.com/whai362/AE_TextSpotter
RRPN++: Guidance Towards More Accurate Scene Text Detection https://github.com/mjq11302010044/RRPN_plusplus
https://github.com/Siyuada7/TP-LSD
MASTER is a scene text recognition model which is based on self-attention mechanism. https://github.com/jiangxiluning/MASTER-TF
This repository is the implementation of EraseNet, a neural network for end-to-end scene text removal. https://github.com/lcy0604/EraseNet
Pytorch implementation of Training Generative Adversarial Networks by Solving Ordinary Differential Equations. https://github.com/nshepperd/ode-gan-pytorch
The EPHOIE Dataset for the research of optical character recognition (OCR) and visual information extraction (VIE) in educational documents
https://github.com/HCIILAB/EPHOIE
https://github.com/manujosephv/pytorch_tabular
OCR for documents, forms, and receipts with Tesseract and OpenCV https://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/
【Real-time scene text detection implemented in PyTorch】'DBNet-lite-pytorch - A pytorch re-implementation of Real-time Scene Text Detection with Differentiable Binarization'
https://github.com/BADBADBADBOY/DBnet-lite.pytorch
A suite of batches and tools for OCR tasks https://github.com/poke1024/origami
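Automatic recognition of non-English languages with Tesseract OCR https://www.pyimagesearch.com/2020/08/03/tesseract-ocr-for-non-english-languages/
This is very useful when researching historical events. For example, I easily found news reports on the assassination attempt on Li Hongzhang in Japan.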
github.com/hpanwar08/detectron2
Ready-to-use Python OCR library supporting many languages, including Chinese, Japanese, and Korean https://github.com/JaidedAI/EasyOCR
PaddleOCR2Pytorch: PaddleOCR ported to PyTorch github.com/frotms/PaddleOCR2Pytorch ncnn port: https://github.com/FeiGeChuanShu/ncnn_paddleocr
mmocr: OpenMMLab's open-source text detection/recognition toolbox github.com/open-mmlab/mmocr
layout-parser.github.io/
Deskew: a library for skew detection and correction of scanned document images github.com/sbrunner/deskew
《MASTER: Multi-Aspect Non-local Network for Scene Text Recognition》(PR 2021) github.com/wenwenyu/MASTER-pytorch
Scene Text Retrieval via Joint Text Detection and Similarity Learning (CVPR 2021) github.com/lanfeng4659/STR-TDSL
github.com/mindee/doctr
PyTorch Tabular: A Framework for Deep Learning with Tabular Data https://www.arxiv-vanity.com/papers/2104.13638 github.com/manujosephv/pytorch_tabular
TABBIE: Pretrained Representations of Tabular Data https://www.arxiv-vanity.com/papers/2105.02584
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text https://www.arxiv-vanity.com/papers/2105.05486
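macOCR: an on-screen text recognition (OCR) tool for Mac github.com/schappim/macOCR
Collection of literature/code on scene text detection/recognition github.com/Ykmoon/scene-text-detection-recognition
Microsoft Research Asia proposes LayoutXLM, a multilingual general-purpose pre-trained model for document understanding. Multimodal Pre-training for Multilingual Visually-rich Document Understanding
Paper: https://arxiv.org/abs/2104.08836 Code/models: https://aka.ms/layoutxlm Dataset: https://github.com/doc-analysis/XFUN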
《Vision Transformer for Fast and Efficient Scene Text Recognition》(ICDAR 2021) github.com/roatienza/deep-text-recognition-benchmark
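Paperless-ng: an enhanced version of Paperless for scanning, recognizing, and digitizing paper documents github.com/jonaswinkler/paperless-ng
cnstd - an MXNet scene text detection package github.com/breezedeus/cnstd
Automating document layout design with a variational Transformer network https://arxiv.org/abs/2104.02416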
《Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition》(CVPR 2021) github.com/FangShancheng/ABINet
https://www.arxiv-vanity.com/papers/2106.11539
RapidOCR: a cross-platform OCR library based on PaddleOCR & OnnxRuntime github.com/RapidOCR/RapidOCR
github.com/gwxie/Dewarping-Document-Image-By-Displacement-Flow-Estimation
SAP-HANA-AutoML: an automated machine learning library for tabular data github.com/dan0nchik/SAP-HANA-AutoML
Comprehensive list of resources on document understanding (DU) github.com/tstanislawek/awesome-document-understanding
《Primitive Representation Learning for Scene Text Recognition》(CVPR 2021) github.com/RuijieJ/pren
RewriteNet: Realistic Scene Text Image Generation via Editing Text in Real-world Image https://arxiv.org/abs/2107.11041
github.com/qurator-spk/eynollah
CTCResources: comprehensive list of resources on Chinese text correction github.com/destwang/CTCResources
github.com/JiaquanYe/TableMASTER-mmocr
Image to LaTeX: quickly converts images of LaTeX formulas into copyable LaTeX code. github.com/kingyiusuen/image-to-latex
First-place solution for ICDAR 2021 formula detection github.com/Yuxiang1995/ICDAR2021_MFD
Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents https://arxiv.org/abs/2108.02899
Awesome OCR: lists OCR-related development tools, open-source projects, technical implementations, datasets, and many other resources. github.com/zacharywhitley/awesome-ocr
MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding github.com/open-mmlab/mmocr
pix2tex - LaTeX OCR: converts formulas into LaTeX markup github.com/lukas-blecher/LaTeX-OCR
github.com/AgentMaker/AgentOCR
PearOCR: online image-to-text conversion, free OCR, online text extraction from images https://pearocr.com/
https://arxiv.org/abs/2108.11591
chineseocr_lite: an ultra-lightweight Chinese OCR with vertical text recognition; total model size only 4.7M. github.com/DayBreak-u/chineseocr_lite
github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv2
Skim-Attention: Learning to Focus via Document Layout
PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System github.com/PaddlePaddle/PaddleOCR
STRIVE: Scene Text Replacement In Videos https://arxiv.org/abs/2109.02762
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models github.com/microsoft/unilm/tree/master/trocr
extract-table:从图片中提取表格的API github.com/vegarsti/extract-table
TrOCR: a new generation of Transformer-based optical character recognition https://weibo.com/ttarticle/p/show?id=2309404691531549507776
LayoutReader: a reading order extraction model based on ReadingBank https://weibo.com/ttarticle/p/show?id=2309404699865254068509
《Synthetic Document Generator for Annotation-free Layout Recognition》 https://arxiv.org/abs/2111.06016
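Traditional Chinese OCR text recognition dataset github.com/GitYCC/traditional-chinese-text-recogn-dataset
Collection of resources on scene text recognition github.com/HCIILAB/Scene-Text-Recognition-Recommendations
chineseocr_lite - ultra-lightweight Chinese OCR with vertical text recognition and ncnn, mnn, and tnn inference support (dbnet (1.8M) + crnn (2.5M) + anglenet (378KB)); total model size only 4.7M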
github.com/DayBreak-u/chineseocr_lite
github.com/ocrmypdf/OCRmyPDF
Benchmarking-Chinese-Text-Recognition: a benchmark (datasets) for Chinese text recognition github.com/FudanVI/benchmarking-chinese-text-recognition
ddddocr - free open-source edition of the 带带弟弟 OCR general-purpose CAPTCHA recognition SDK github.com/sml2h3/ddddocr
swin-transformer-ocr: OCR implemented with swin-transformer github.com/YongWookHa/swin-transformer-ocr
github.com/bupt-ai-cz/Meta-SelfLearning
github.com/feramhq/Perspec
Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer https://arxiv.org/abs/2202.05508
Comprehensive list of resources on table recognition github.com/cv-small-snails/Awesome-Table-Recognition
https://arxiv.org/abs/2203.01017
https://arxiv.org/abs/2203.02378
github.com/katanaml/sparrow
Paperless-ngx: an enhanced version of paperless for scanning, indexing, and archiving paper documents github.com/paperless-ngx/paperless-ngx
github.com/Psarpei/Multi-Type-TD-TSR
Document-Dewarping-with-Control-Points: restoring images of creased paper based on control points github.com/gwxie/Document-Dewarping-with-Control-Points
pix2tex - LaTeX OCR: converts formula images into LaTeX code github.com/lukas-blecher/LaTeX-OCR
Manga OCR: Japanese OCR for original-language manga github.com/kha-white/manga-ocr
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking https://arxiv.org/abs/2204.08387 https://github.com/microsoft/unilm/tree/master/layoutlmv3
DocTR: an end-to-end deep-learning-based document text recognition (OCR) tool (TensorFlow 2 & PyTorch) github.com/mindee/doctr
Unified Pretraining Framework for Document Understanding https://arxiv.org/abs/2204.10939
github.com/HillZhang1999/MuCGEC
LayoutBERT: Masked Language Layout Model for Object Insertion https://arxiv.org/abs/2205.00347
Interactive Model Cards: A Human-Centered Approach to Model Documentation https://arxiv.org/abs/2205.02894
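Since 2019, researchers at Microsoft Research Asia have explored the field of Document AI extensively, completing a series of works including LayoutLM/LayoutLMv2/LayoutXLM, TrOCR, and MarkupLM, and achieving breakthroughs on landmark tasks such as object recognition, information extraction, and document classification. However, most of the vision models used in these works, such as ResNet and ViT, were trained on general-domain data rather than being vision models specialized for documents. This causes domain shift and mismatch in the image encoder. To address this, the Microsoft Research Asia researchers built the new DiT model on a state-of-the-art vision Transformer architecture.
https://weibo.com/ttarticle/p/show?id=2309404771607167239019 Paper: https://arxiv.org/abs/2203.02378 Code: https://aka.ms/msdit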
github.com/wangwen-whu/WTW-Dataset
【RapidOCR: a cross-platform OCR library based on PaddleOCR & OnnxRuntime】'RapidOCR (捷智OCR) - A cross platform OCR Library based on PaddleOCR & OnnxRuntime' GitHub: https://github.com/RapidOCR/RapidOCR
【Signature verification system based on Siamese networks】'Signature verification system using Siamese networks' by Sean Benhur GitHub: github.com/seanbenhur/siamese_net
【List of papers and resources on document image rectification】'Awesome Document Image Rectification - A comprehensive list of awesome document image rectification papers.' by Hao Feng GitHub: github.com/fh2019ustc/Awesome-Document-Image-Rectification
'OcrPy - OCR, Archive, Index and Search: Implementation agnostic OCR framework.' by maxent GitHub: github.com/maxent-ai/ocrpy
[CV]《Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions》S Cascianelli, M Cornia, L Baraldi, R Cucchiara [University of Modena and Reggio Emilia] (2022) https://arxiv.org/abs/2208.08109
【OCRmyPDF: adds an OCR text layer to scanned PDFs so that their text can be searched】'OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched' GitHub: github.com/ocrmypdf/OCRmyPDF
【text_normalization: a Chinese text normalization tool】'text_normalization' by pengzhendong GitHub: github.com/pengzhendong/text_normalization
【Text-Grab: a screenshot OCR text recognition tool for Win10】'Text-Grab - Use OCR in Windows 10 quickly and easily with Text Grab. With optional background process and popups.' by Joseph Finney GitHub: github.com/TheJoeFin/Text-Grab
'Umi-OCR batch image-to-text tool - OCR software for batch image-to-text recognition, with a GUI, running offline. It can exclude interference from watermark regions in images and extract clean text. Based on PaddleOCR.' by hiroi-sora GitHub: github.com/hiroi-sora/Umi-OCR
'Universal Information Extraction (UIE), PyTorch version' by heiheiyoyo GitHub: github.com/heiheiyoyo/uie_pytorch
[CV]《PreSTU: Pre-Training for Scene-Text Understanding》J Kil, S Changpinyo, X Chen, H Hu, S Goodman, W Chao, R Soricut [The Ohio State University & Google Research] (2022) https://arxiv.org/abs/2209.05534
Python library that simulates handwritten Chinese. A tool library built on PIL that outputs images of handwritten Chinese and supports custom background images https://github.com/Gsllchb/Handright
VAT invoice OCR recognition project. Includes trained models and a microservice that can be called directly through its API once started https://github.com/guanshuicheng/invoice
【List of literature on text correction】'Text Correction Papers - text correction papers' by HuYong GitHub: github.com/nghuyong/text-correction-papers
【Deslanting Algorithm: slant correction for handwriting】'Deslanting Algorithm - The deslanting algorithm sets text upright in images. Python, C++ and OpenCL implementations provided.' by Harald Scheidl GitHub: github.com/githubharald/DeslantImg
【AutoCorrect: a CLI tool written in Rust that "auto-corrects" or "checks and suggests" copy; for text mixing CJK (Chinese, Japanese, Korean) with English, it adds the correct spaces, corrects words, and tries to auto-correct punctuation in a safe way, among other things】'AutoCorrect - AutoCorrect is a linter and formatter to help you to improve copywriting, correct spaces, words, punctuations between CJK (Chinese, Japanese, Korean).' by Jason Lee GitHub: github.com/huacnlee/autocorrect
「GitHub多星项目 ✨」8K⭐️ DayBreak-u/chineseocr_lite: 「超轻量级中文ocr,支持竖排文字识别, 支持ncnn、mnn、tnn推理 ( dbnet(1.8M) + crnn(2.5M) + anglenet(378KB)) 总模型仅4.7M
【PyTorch-IE: state-of-the-art information extraction implemented in PyTorch】'PyTorch-IE: State-of-the-art Information Extraction in PyTorch - PyTorch-IE: State-of-the-art Information Extraction in PyTorch' by Christoph Alt GitHub: github.com/ChristophAlt/pytorch-ie
【OpenFind: an iPhone app for searching for text in images】'OpenFind - An app to find text in real life. Now open-source!' by Andrew Zheng GitHub: github.com/aheze/OpenFind
《Document AI: LiLT a better language agnostic LayoutLM model》 https://www.philschmid.de/fine-tuning-lilt
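【Accelerating Document AI (image classification, image-to-text, document question answering, table question answering, visual question answering, etc.)】《Accelerating Document AI》 https://huggingface.co/blog/document-ai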
'Bob - a translation and OCR app for macOS' by zongyi GitHub: github.com/ripperhe/Bob
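Japan's Center for Open Data in the Humanities (CODH) has developed "KuroNet", a recognition system for Japanese calligraphy. It automatically detects the text characters in a photo of calligraphy, recognizes them as the corresponding text, and displays the result in the camera view in real time using augmented reality. A very practical project; a Chinese version would be welcome http://codh.rois.ac.jp/miwo/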
[LG]《PLay: Parametrically Conditioned Layout Generation using Latent Diffusion》C Cheng, F Huang, G Li, Y Li [Google Research] (2023) https://arxiv.org/abs/2301.11529
【Pix2Text (P2T): a free, open-source Python alternative to Mathpix that recognizes mixed images containing both text and formulas, outputting the formulas in LaTeX format alongside plain text】'Pix2Text (P2T) - Pix In, Latex & Text Out. Recognize Chinese, English Texts, and Math Formulas from Images.' BreezeDeus GitHub: github.com/breezedeus/pix2text
【DocTr++: a toolkit for deep unrestricted document image rectification】'DocTr++ - Deep Unrestricted Document Image Rectification' Hao F GitHub: github.com/fh2019ustc/DocTr-Plus
FormNetV2 adopts a centralized multimodal graph contrastive learning strategy that unifies self-supervised pre-training for all modalities into a single loss, bringing new performance gains on a range of form document understanding tasks. https://arxiv.org/abs/2305.02549 [CL]《FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction》C Lee, C Li, H Zhang, T Dozat, V Perot, G Su, X Zhang, K Sohn, N Glushnev, R Wang, J Ainslie, S Long, S Qin, Y Fujii, N Hua, T Pfister [Google Cloud AI Research & Google Research] (2023)
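【Fine-tune and deploy a Donut-based model for document understanding/document parsing with Hugging Face Transformers and Amazon SageMaker. Donut is a new document understanding model that, compared with other models such as LayoutLMv2/LayoutLMv3, can be used for commercial purposes and achieves state-of-the-art performance. The tutorial covers setting up the development environment, loading the SROIE dataset, preprocessing and uploading the dataset for Donut, fine-tuning the Donut model on Amazon SageMaker, and deploying the Donut model on Amazon SageMaker】《Generative AI for Document Understanding with Hugging Face and Amazon SageMaker》 https://www.philschmid.de/sagemaker-donut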
【M⁶Doc Dataset: the M⁶Doc dataset for research on modern document layout analysis】'M⁶Doc_Dataset_Release' by HCIILAB GitHub: github.com/HCIILAB/M6Doc
【TabRecSet: a large-scale dataset for end-to-end table recognition in real-world scenes】'TabRecSet: A Large Scale Dataset for End-to-end Table Recognition in the Wild - A large scale camera-taken table detection and recognition dataset.' Fan Yang GitHub: github.com/MaxKinny/TabRecSet
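【MindOCR: an open-source OCR toolbox built on the MindSpore framework. It integrates a range of mainstream text detection and recognition algorithms and models and provides easy-to-use training and inference tools, helping users quickly develop and apply industry SoTA text detection and recognition models such as DBNet/DBNet++ and CRNN/SVTR for image and document understanding】'MindOCR - A toolbox of OCR models, algorithms, and pipelines based on MindSpore' MindSpore Lab GitHub: github.com/mindspore-lab/mindocr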
'Analysis of the Chinese character coverage of language models' Tao Wang GitHub: github.com/twang2218/vocab-coverage
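Proposes a new document layout analysis method that represents PDF pages as structured graphs and introduces GLAM, a lightweight graph neural network model. It achieves performance competitive with state-of-the-art models at a smaller model size and higher efficiency. https://arxiv.org/abs/2308.02051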
'WeSubtitle: extract hard-coded video subtitles with OCR' WeNet Community GitHub: github.com/wenet-e2e/wesubtitle
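Presents a method that significantly reduces recognition error rates on specialized-domain material by generating domain-specific language models and attaching them to an OCR system. [CL]《OCR Language Models with Custom Vocabularies》P Garst, R Ingle, Y Fujii [Google] (2023) https://arxiv.org/abs/2308.09671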
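Introduces Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs optical character recognition to convert scientific documents into a markup language, improving the accessibility of scientific knowledge. https://arxiv.org/abs/2308.13418 [LG]《Nougat: Neural Optical Understanding for Academic Documents》L Blecher, G Cucurull, T Scialom, R Stojnic [Meta AI] (2023)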
Addresses test-time adaptation of text line recognition models: self-training iterations are run on a single test image to correct the model's errors on handwritten documents. https://arxiv.org/abs/2308.15037 [CV]《Is it an i or an l: Test-time Adaptation of Text Line Recognition Models》D Tula, S Paul, G Madan, P Garst, R Ingle, G Aggarwal [Google Research] (2023)
Presents a Transformer-based approach to on-line handwritten character segmentation, in which learned character queries form each cluster inside a Transformer decoder block, achieving the best segmentation results. https://arxiv.org/abs/2309.03072 [CV]《Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation》M Jungo, B Wolf, A Maksai, C Musat, A Fischer [University of Applied Sciences and Arts Western Switzerland & Google Research] (2023)
【Demo of the Transformers implementation of Nougat, a PDF document recognition/conversion engine】《Nougat Transformers - a Hugging Face Space by hf-vision》 https://huggingface.co/spaces/hf-vision/nougat-transformers
KOSMOS-2.5: a multimodal large language model for reading "text-intensive images" https://t.cj.sina.com.cn/articles/view/5703921756/153faf05c019012g62
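A lint script for Chinese that checks punctuation and formatting in Chinese text, for example by automatically adding a space between Chinese and English. github.com/Jinjiang/zhlint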
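The spacing rule mentioned for zhlint (and for AutoCorrect earlier in this list) can be illustrated with a few lines of Python. This is only a minimal sketch of the general idea, not the implementation or API of either tool: the helper name `add_cjk_latin_spacing` is hypothetical, and it assumes "Chinese" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) and "English" means ASCII letters and digits.

```python
import re

# Assumption: CJK text is limited to the basic CJK Unified Ideographs block.
_CJK = r"\u4e00-\u9fff"
_CJK_THEN_LATIN = re.compile(rf"([{_CJK}])([A-Za-z0-9])")
_LATIN_THEN_CJK = re.compile(rf"([A-Za-z0-9])([{_CJK}])")


def add_cjk_latin_spacing(text: str) -> str:
    """Insert one space at each boundary between a CJK ideograph and an
    ASCII letter or digit. Hypothetical helper for illustration only."""
    text = _CJK_THEN_LATIN.sub(r"\1 \2", text)
    return _LATIN_THEN_CJK.sub(r"\1 \2", text)


print(add_cjk_latin_spacing("使用Python3开发"))
```

Text that is already spaced is left unchanged, so the function can be applied repeatedly. The real tools handle many more cases (punctuation, full-width characters, ignore rules) that this sketch leaves out.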
【RapidOCRPDF: built on the RapidOCR repository, quickly extracts text from PDFs, including scanned and encrypted PDFs】'RapidOCRPDF - Based on RapidOCR, extract the PDF content.' RapidAI GitHub: github.com/RapidAI/RapidOCRPDF
https://arxiv.org/abs/2310.17674 [CV]《Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis》S Long, S Qin, Y Fujii, A Bissacco, M Raptis [Google Research] (2023)
【BetterOCR: combines the results of multiple OCR engines with a large language model (LLM) to correct and reconstruct the output; currently supports EasyOCR and Tesseract】'BetterOCR - Better text detection by combining multiple OCR engines (EasyOCR, Tesseract) with LLM.' Junho Yeo GitHub: github.com/junhoyeo/BetterOCR
pix2tex - LaTeX OCR. URL: github.com/lukas-blecher/LaTeX-OCR. Converts formulas in screenshots into LaTeX code.
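pycorrector: a Chinese text correction tool. It supports correcting phonetically similar, visually similar, and grammatical errors in Chinese and is developed in Python 3. URL: github.com/shibing624/pycorrector. The project focuses on error types such as phonetic similarity, visual similarity, grammar, and proper nouns. The recently released v1.0.0 adds GPT-style models such as ChatGLM3/LLaMA2 for Chinese text correction, publishes shibing624/chatglm3-6b-csc-chinese-lora, a spelling and grammar correction model based on ChatGLM3-6B, and rewrites the implementations of DeepContext, ConvSeq2Seq, T5, and other models.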
【DocDiff: a document enhancement model for tasks such as document deblurring, denoising, binarization, and watermark and seal removal】'DocDiff - ACM Multimedia 2023: DocDiff: Document Enhancement via Residual Diffusion Models. Also contains 1597 red seals in Chinese scenes, along with their corresponding binary masks.' Zongyuan Yang GitHub: github.com/Royalvice/DocDiff
A tool that makes a PDF look "scanned" (GitHub: https://github.com/rwv/lookscanned.io), though I don't know what making a PDF look scanned is useful for.
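'Umi-OCR V2 text recognition tool - open-source, free, practical offline OCR software. Screenshot, paste, or batch-import images, with paragraph layout and watermark exclusion, plus QR code scanning and generation. Works entirely offline, with built-in recognition libraries for multiple languages.' hiroi-sora GitHub: github.com/hiroi-sora/Umi-OCR_v2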
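【M6Doc_Dataset_Release: a dataset for research on modern document layout analysis. It contains 9,080 modern document images across seven subsets: scientific articles, textbooks, test papers, magazines, newspapers, notes, and books. The subsets come from diverse sources, including arXiv, the official website of the People's Daily, and VKontakte. The annotation scheme defines 74 detailed document layout labels, using Wikipedia definitions to keep the labels both general and specific】'M6Doc_Dataset_Release' by Deep Learning and Vision Computing Lab, SCUT GitHub: github.com/HCIILAB/M6Doc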