PDF Text scanner missing line breaks and space #39
Comments
Spaces and line breaks may not (and most likely will not) be represented as characters in text objects. Instead, while the document is drawn, you are instructed to move the current point of focus (the "cursor") by something like 12 points to the right, which amounts to a space between two words. As I recall, the width of a space is not included in the font, so you would have to listen for the operators that change the text matrix and decide whether the horizontal translation is large enough to count as a space character. There are separate operators for newlines, so that part is easy to implement. Hope this helps.
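To make the idea above concrete, here is a minimal sketch of the gap heuristic, written in Python so it stays independent of the framework. It assumes the scanner callbacks have already been turned into a list of positioned runs; the `Glyph` type, the `rebuild_text` function, and the two threshold ratios are hypothetical names and values chosen for illustration, not part of the framework or of Core Graphics.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One decoded text run with its position (hypothetical structure)."""
    text: str     # decoded characters of the run
    x: float      # left edge of the run, in points
    y: float      # baseline position, in points
    width: float  # advance width of the run, in points


def rebuild_text(glyphs, font_size=12.0, space_ratio=0.25, line_ratio=0.5):
    """Join runs, inserting a space or a newline where the gaps suggest one.

    The ratios are assumptions; real documents may need tuning per font.
    """
    out = []
    prev = None
    for g in glyphs:
        if prev is not None:
            horizontal_gap = g.x - (prev.x + prev.width)
            if abs(g.y - prev.y) > line_ratio * font_size:
                # Baseline moved noticeably: treat it as a line break.
                out.append("\n")
            elif horizontal_gap > space_ratio * font_size:
                # Cursor jumped further than kerning would: treat it as a space.
                out.append(" ")
        out.append(g.text)
        prev = g
    return "".join(out)


runs = [
    Glyph("Hel", 72.0, 700.0, 15.0),
    Glyph("lo", 87.5, 700.0, 11.5),    # 0.5 pt gap: kerning, no space
    Glyph("world", 103.0, 700.0, 29.0),  # 4 pt gap: space
    Glyph("Next", 72.0, 686.0, 22.0),    # baseline dropped 14 pt: newline
]
rebuilt = rebuild_text(runs)
print(rebuilt)
```

Comparing the gap against a fraction of the font size, rather than a fixed number of points, is just one possible choice; using the font's actual space width, when it is available, would likely be more reliable.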
Thank you for the reply. Seems like this is going to be a tough job; I haven't looked into fonts yet. I'm going to look into it and will let you know if I succeed. Thank you.
Sure, working with PDFs gets complicated sometimes.
@omerabbas01 Have you resolved your problem? I am stuck on this issue and am using a temporary solution: split multi-word keywords and search for each word separately, then run some complex code to locate the right place for all the words in the keyword. After that, draw all the result frames!
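A different workaround from the word-by-word search described above is to compare the keyword and the extracted text with whitespace removed on both sides, then map each match back to offsets in the original string. The sketch below is in Python for brevity; `find_ignoring_spaces` is a hypothetical helper, not something the framework provides, and the half-open `(start, end)` ranges it returns would still need converting to whatever the highlighting code expects (for example an `NSRange`).

```python
def find_ignoring_spaces(text, keyword):
    """Return (start, end) ranges in text that match keyword when whitespace is ignored."""
    # Indices of the non-whitespace characters, so matches can be mapped back.
    positions = [i for i, ch in enumerate(text) if not ch.isspace()]
    squeezed = "".join(text[i] for i in positions)
    needle = "".join(keyword.split())

    matches = []
    if not needle:
        return matches

    start = squeezed.find(needle)
    while start != -1:
        last = start + len(needle) - 1
        # Translate squeezed offsets back to offsets in the original text.
        matches.append((positions[start], positions[last] + 1))
        start = squeezed.find(needle, start + 1)
    return matches


sample = "The textmatrix changes. The text matrix moves."
hits = find_ignoring_spaces(sample, "text matrix")
print(hits)
```

One caveat: because whitespace is ignored entirely, this can also match across unrelated word boundaries, so results may need a sanity check before frames are drawn.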
Hello,
Thank you for providing such a beautiful framework for handling PDFs. It has saved me a lot of time and helped me a lot. I have noticed a few things in the framework while building a custom text highlighting feature. The highlighting runs while text-to-speech reads the text aloud, and I am using NSRange to determine which part of the string to highlight. Everything works well so far and I am able to highlight. But there are some issues with the text scanned from the PDF: some spaces between words are missing, and so are line breaks.
I have never worked with PDF before and don't know much about it, but I am now looking into how things work. I found that you are using CGPDFScannerRef to scan text from the PDF, so there must be something I can do to get better text. Can you please guide me a bit on where I should look, and whether there is any tutorial about CGPDFScannerRef?
Thank you!