Regexes in preprocess
There are problems, described under the next header.
1-1. Remove chapter and category
1-2. Find the spaces that we don't want to clear out
1-3. Get footnote indices
1-4. Split into pages by page number
1-5. Find the starting point of the footnote when a footnote continues across two pages and you want to distinguish the content part from the footnote part of a page
1-6. Clear out footnote indices in content
1-7. Get and remove authors
1-8. Get and remove Alias, Birth, Death
2-1-1. Aggressively segments paragraphs
2-1-2. Falsely catches numbers that are not footnote indices
2-2-1. Incorrect start point of footnote part in a page
Haven't actually seen this case, but we can imagine it.
2-3-1. Broken English
Haven't come up with a good idea to deal with it...
I tried using a dictionary to check every possible concatenation of the broken English fragments, but it's not possible to figure out which word we actually want,
e.g. I want `Association` but I may get `As` first, because `As` is also in the dictionary, and the result becomes `As socia...`
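
To make the failure concrete, here is a minimal Python sketch of the dictionary idea. It is not the code used in this repository: the function names `greedy_join` and `longest_join`, the toy dictionaries, and the way the word is split into the fragments `["As", "socia", "tion"]` are all assumptions made for illustration.

```python
def greedy_join(fragments, dictionary):
    """Concatenate adjacent fragments and accept the FIRST dictionary hit."""
    words = []
    i = 0
    while i < len(fragments):
        candidate = ""
        matched = False
        for j in range(i, len(fragments)):
            candidate += fragments[j]
            if candidate.lower() in dictionary:
                # Shortest match wins, even if a longer word was intended.
                words.append(candidate)
                i = j + 1
                matched = True
                break
        if not matched:
            # No concatenation starting here is a word; keep the fragment.
            words.append(fragments[i])
            i += 1
    return words


def longest_join(fragments, dictionary):
    """Same scan, but keep the LONGEST concatenation that is a word."""
    words = []
    i = 0
    while i < len(fragments):
        candidate = ""
        best_end = None
        for j in range(i, len(fragments)):
            candidate += fragments[j]
            if candidate.lower() in dictionary:
                best_end = j
        if best_end is None:
            words.append(fragments[i])
            i += 1
        else:
            words.append("".join(fragments[i:best_end + 1]))
            i = best_end + 1
    return words


# Hypothetical toy data, not taken from the real corpus.
fragments = ["As", "socia", "tion"]
dictionary = {"as", "association"}

print(greedy_join(fragments, dictionary))   # ['As', 'socia', 'tion']
print(longest_join(fragments, dictionary))  # ['Association']

# Longest match is not a real fix: it over-merges words that were separate.
print(longest_join(["in", "to"], {"in", "to", "into"}))  # ['into']
```

Preferring the longest match recovers `Association` in this toy case, but it also glues together fragments that were meant to stay apart, so a dictionary lookup alone still cannot tell which reading is wanted.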